CesnetDatabase
cesnet_tszoo.datasets.cesnet_database.CesnetDatabase
Bases: ABC
Base class for cesnet databases. This class should not be used directly. Use it as a base class when adding new databases.
Derived databases are used by calling the class method get_dataset, which creates a new dataset instance of SeriesBasedCesnetDataset or TimeBasedCesnetDataset. See those classes for more information on how to use them.
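The pattern described above (a base class that is never used directly, with derived classes exposing a class-method factory) can be sketched as follows. This is only an illustration of the idea, not the library's implementation: DatabaseBase, ExampleDatabase, DatasetKind, and the stub dataset classes are all hypothetical names introduced here, and the string values of the enum are assumptions.

```python
from abc import ABC
from enum import Enum
from typing import Optional, Union


class DatasetKind(Enum):
    """Hypothetical stand-in for DatasetType; the string values are assumptions."""
    TIME_BASED = "time_based"
    SERIES_BASED = "series_based"


class StubDataset:
    """Hypothetical stand-in for the real dataset classes."""

    def __init__(self, database_name: str, data_root: str) -> None:
        self.database_name = database_name
        self.data_root = data_root


class TimeBasedStub(StubDataset):
    pass


class SeriesBasedStub(StubDataset):
    pass


class DatabaseBase(ABC):
    """Sketch of a base class that is only used through its subclasses."""

    name: Optional[str] = None  # overridden by each derived database

    @classmethod
    def get_dataset(cls, data_root: str,
                    dataset_type: Union[DatasetKind, str]) -> StubDataset:
        if cls.name is None:
            raise TypeError("Call get_dataset on a derived database, not on the base class.")
        # Accept either the enum member or its string value.
        kind = DatasetKind(dataset_type)
        dataset_cls = TimeBasedStub if kind is DatasetKind.TIME_BASED else SeriesBasedStub
        return dataset_cls(cls.name, data_root)


class ExampleDatabase(DatabaseBase):
    """Hypothetical derived database; it only overrides class attributes."""
    name = "EXAMPLE"
```

In this sketch a new database is added by subclassing and setting class attributes, while the factory logic stays in one place on the base class.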
Intended usage:
When using TimeBasedCesnetDataset (dataset_type = DatasetType.TIME_BASED):
- Create an instance of the dataset with the desired data root by calling get_dataset. This downloads the dataset if it has not been downloaded previously and returns the dataset instance.
- Create an instance of TimeBasedConfig and set it using set_dataset_config_and_initialize. This initializes the dataset, including data splitting (train/validation/test), fitting transformers (if needed), selecting features, and more. The result is cached for later use.
- Use get_train_dataloader / get_train_df / get_train_numpy to get training data for the chosen model.
- Validate the model and perform hyperparameter optimization on get_val_dataloader / get_val_df / get_val_numpy.
- Evaluate the model on get_test_dataloader / get_test_df / get_test_numpy.
When using SeriesBasedCesnetDataset (dataset_type = DatasetType.SERIES_BASED):
- Create an instance of the dataset with the desired data root by calling get_dataset. This downloads the dataset if it has not been downloaded previously and returns the dataset instance.
- Create an instance of SeriesBasedConfig and set it using set_dataset_config_and_initialize. This initializes the dataset, including data splitting (train/validation/test), fitting transformers (if needed), selecting features, and more. The result is cached for later use.
- Use get_train_dataloader / get_train_df / get_train_numpy to get training data for the chosen model.
- Validate the model and perform hyperparameter optimization on get_val_dataloader / get_val_df / get_val_numpy.
- Evaluate the model on get_test_dataloader / get_test_df / get_test_numpy.
When using DisjointTimeBasedCesnetDataset (dataset_type = DatasetType.DISJOINT_TIME_BASED):
- Create an instance of the dataset with the desired data root by calling get_dataset. This downloads the dataset if it has not been downloaded previously and returns the dataset instance.
- Create an instance of DisjointTimeBasedConfig and set it using set_dataset_config_and_initialize. This initializes the dataset, including data splitting (train/validation/test), fitting transformers (if needed), selecting features, and more. The result is cached for later use.
- Use get_train_dataloader / get_train_df / get_train_numpy to get training data for the chosen model.
- Validate the model and perform hyperparameter optimization on get_val_dataloader / get_val_df / get_val_numpy.
- Evaluate the model on get_test_dataloader / get_test_df / get_test_numpy.
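All three dataset types follow the same create, configure, then train/validate/test sequence, so the steps can be sketched as one duck-typed helper. The method names come from the text above; everything else is an assumption made for illustration: run_workflow is a hypothetical helper, the call set_dataset_config_and_initialize(config) with a single positional argument and the argument-free dataloader getters are guesses at the call shape, and _FakeDatabase / _FakeDataset are stand-ins so the sketch runs without downloading anything.

```python
from typing import Any, Dict


def run_workflow(database_cls: Any, config: Any, data_root: str,
                 source_type: str, aggregation: str,
                 dataset_type: str) -> Dict[str, Any]:
    """Hypothetical helper: create, configure, then fetch the three splits."""
    # Step 1: create the dataset (the real call may download data first).
    dataset = database_cls.get_dataset(
        data_root=data_root,
        source_type=source_type,
        aggregation=aggregation,
        dataset_type=dataset_type,
    )
    # Step 2: apply the config. ASSUMPTION: config is the only required argument.
    dataset.set_dataset_config_and_initialize(config)
    # Steps 3-5: training, validation, and test data. ASSUMPTION: no arguments needed.
    return {
        "train": dataset.get_train_dataloader(),
        "val": dataset.get_val_dataloader(),
        "test": dataset.get_test_dataloader(),
    }


class _FakeDataset:
    """Stand-in dataset so the sketch runs without the real library."""

    def __init__(self) -> None:
        self.config = None

    def set_dataset_config_and_initialize(self, config: Any) -> None:
        self.config = config

    def get_train_dataloader(self) -> list:
        return ["train batch"]

    def get_val_dataloader(self) -> list:
        return ["val batch"]

    def get_test_dataloader(self) -> list:
        return ["test batch"]


class _FakeDatabase:
    """Stand-in database exposing the documented get_dataset parameters."""

    @classmethod
    def get_dataset(cls, data_root, source_type, aggregation, dataset_type,
                    check_errors=False, display_details=False):
        return _FakeDataset()


# All argument values below are placeholders, not real library values.
splits = run_workflow(
    _FakeDatabase,
    config={"placeholder": True},
    data_root="/tmp/data",
    source_type="placeholder_source",
    aggregation="placeholder_aggregation",
    dataset_type="placeholder_type",
)
```

With the real library, a derived database class and a matching config instance (TimeBasedConfig, SeriesBasedConfig, or DisjointTimeBasedConfig) would take the place of the stand-ins, and the get_*_df / get_*_numpy variants could be used instead of the dataloaders.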
Used class attributes:
Attributes:
Name | Type | Description |
---|---|---|
name | str | Name of the database. |
bucket_url | str | URL of the bucket where the dataset is stored. |
tszoo_root | str | Path to folder where all databases are saved. Set after |
database_root | str | Path to the folder where datasets belonging to the database are saved. Set after |
configs_root | str | Path to the folder where configurations are saved. Set after |
benchmarks_root | str | Path to the folder where benchmarks are saved. Set after |
annotations_root | str | Path to the folder where annotations are saved. Set after |
id_names | dict | Names for time series IDs for each |
default_values | dict | Default values for each available feature. |
source_types | list[SourceType] | Available source types for the database. |
aggregations | list[AgreggationType] | Available aggregations for the database. |
additional_data | dict[str, tuple] | Available small datasets for each dataset. |
Source code in cesnet_tszoo\datasets\cesnet_database.py
get_dataset
classmethod
get_dataset(data_root: str, source_type: SourceType | str, aggregation: AgreggationType | str, dataset_type: DatasetType | str, check_errors: bool = False, display_details: bool = False) -> TimeBasedCesnetDataset | SeriesBasedCesnetDataset
Create new dataset instance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_root | str | Path to the folder where the dataset will be stored. Each database has its own subfolder | required |
source_type | SourceType \| str | The source type of the desired dataset. | required |
aggregation | AgreggationType \| str | The aggregation type for the selected source type. | required |
dataset_type | DatasetType \| str | Type of a dataset you want to create. Can be | required |
check_errors | bool | Whether to validate if the dataset is corrupted. | False |
display_details | bool | Whether to display details about the available data in chosen dataset. | False |
Returns:
Type | Description |
---|---|
TimeBasedCesnetDataset \| SeriesBasedCesnetDataset | |
Source code in cesnet_tszoo\datasets\cesnet_database.py
get_expected_paths
classmethod
get_expected_paths(data_root: str, database_name: str) -> dict
Returns the expected paths for the provided data_root and database_name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_root | str | Path to the folder where the dataset will be stored. Each database has its own subfolder | required |
database_name | str | Name of the expected database. | required |
Returns:
Name | Type | Description |
---|---|---|
str | dict | Dictionary of paths. |
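As a rough illustration of what a paths dictionary like this might contain, the sketch below derives one entry per root attribute listed for the class (tszoo_root, database_root, configs_root, benchmarks_root, annotations_root). The function name expected_paths_sketch, the dictionary keys, and the folder layout (a tszoo folder under data_root with databases, configs, benchmarks, and annotations subfolders) are all assumptions; the actual layout and keys used by get_expected_paths may differ.

```python
import os


def expected_paths_sketch(data_root: str, database_name: str) -> dict:
    """Hypothetical layout; folder names and keys are assumptions, not the library's."""
    # ASSUMPTION: everything lives under a single "tszoo" folder inside data_root.
    tszoo_root = os.path.join(data_root, "tszoo")
    return {
        "tszoo_root": tszoo_root,
        # ASSUMPTION: each database gets its own subfolder, keyed by its name.
        "database_root": os.path.join(tszoo_root, "databases", database_name),
        "configs_root": os.path.join(tszoo_root, "configs"),
        "benchmarks_root": os.path.join(tszoo_root, "benchmarks"),
        "annotations_root": os.path.join(tszoo_root, "annotations"),
    }


paths = expected_paths_sketch("data", "EXAMPLE")
```

Computing all paths in one pure function means they can be inspected before anything is downloaded or created on disk.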
Source code in cesnet_tszoo\datasets\cesnet_database.py