CesnetDatabase
cesnet_tszoo.datasets.cesnet_database.CesnetDatabase
Bases: ABC
Base class for CESNET databases. This class should not be used directly; use it as the base class when adding new databases.
Derived databases are used by calling the class method get_dataset, which creates a new dataset instance of SeriesBasedCesnetDataset or TimeBasedCesnetDataset. See those classes for more information on how to use them.
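The dispatch described above can be illustrated with a small, self-contained sketch. ToyDatabase, ExampleDatabase, ToyTimeBasedDataset and ToySeriesBasedDataset are hypothetical stand-ins, not classes from cesnet_tszoo. The sketch only mirrors the idea of an abstract base whose class method get_dataset returns one of two dataset types depending on is_series_based.

```python
from abc import ABC
from dataclasses import dataclass


@dataclass
class ToyTimeBasedDataset:
    """Hypothetical stand-in for TimeBasedCesnetDataset."""
    database_name: str
    data_root: str


@dataclass
class ToySeriesBasedDataset:
    """Hypothetical stand-in for SeriesBasedCesnetDataset."""
    database_name: str
    data_root: str


class ToyDatabase(ABC):
    """Hypothetical base class; derived classes only set class attributes."""
    name: str = ""

    @classmethod
    def get_dataset(cls, data_root: str, is_series_based: bool):
        # Mirror "this class should not be used directly".
        if cls is ToyDatabase:
            raise TypeError("use a derived database, not the base class")
        # Pick the dataset type from the flag; the derived class supplies `name`.
        dataset_cls = ToySeriesBasedDataset if is_series_based else ToyTimeBasedDataset
        return dataset_cls(database_name=cls.name, data_root=data_root)


class ExampleDatabase(ToyDatabase):
    """Hypothetical derived database."""
    name = "example"


ds = ExampleDatabase.get_dataset(data_root="/tmp/tszoo", is_series_based=False)
print(type(ds).__name__, ds.database_name)  # ToyTimeBasedDataset example
```

Keeping the factory on the base class means a new database only has to declare its class attributes. It inherits the creation logic unchanged.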
Intended usage:
When using TimeBasedCesnetDataset (is_series_based=False):
- Create an instance of the dataset with the desired data root by calling get_dataset. This downloads the dataset if it has not been downloaded before and returns an instance of the dataset.
- Create an instance of TimeBasedConfig and set it using set_dataset_config_and_initialize. This initializes the dataset, including data splitting (train/validation/test/test_other), fitting scalers (if needed), selecting features, and more. The result is cached for later use.
- Use get_train_dataloader / get_train_df / get_train_numpy to get training data for the chosen model.
- Validate the model and perform hyperparameter optimization on get_val_dataloader / get_val_df / get_val_numpy.
- Evaluate the model on get_test_dataloader / get_test_df / get_test_numpy.
- (Optional) Evaluate the model on get_test_other_dataloader / get_test_other_df / get_test_other_numpy.
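The sketch below illustrates what "data splitting (train/validation/test/test_other)" and "fitting scalers" can mean for a time-based dataset. It splits one series chronologically and fits a min-max scaler on the training part only. This is a hypothetical NumPy-only sketch, not the library's implementation. The split fractions, the helper names chronological_split and fit_min_max, and the choice of min-max scaling are all assumptions.

```python
import numpy as np


def chronological_split(values, fractions=(0.5, 0.2, 0.2, 0.1)):
    """Split along the time axis into train/val/test/test_other (hypothetical fractions)."""
    n = len(values)
    # Cumulative end index of each part; the last part takes any remainder.
    bounds = np.cumsum([int(f * n) for f in fractions])
    train, val, test, test_other = np.split(values, bounds[:-1])
    return train, val, test, test_other


def fit_min_max(train):
    """Fit a min-max scaler on training data only and return the transform."""
    lo, hi = float(train.min()), float(train.max())
    scale = hi - lo if hi > lo else 1.0  # avoid division by zero on constant data
    return lambda x: (x - lo) / scale


series = np.arange(20, dtype=float)  # 20 time steps of one hypothetical feature
train, val, test, test_other = chronological_split(series)
scale = fit_min_max(train)
print(len(train), len(val), len(test), len(test_other))  # 10 4 4 2
print(scale(val))
```

The scaler is fitted on the earliest slice only, so statistics from later periods never influence the transform applied to validation or test data.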
When using SeriesBasedCesnetDataset (is_series_based=True):
- Create an instance of the dataset with the desired data root by calling get_dataset. This downloads the dataset if it has not been downloaded before and returns an instance of the dataset.
- Create an instance of SeriesBasedConfig and set it using set_dataset_config_and_initialize. This initializes the dataset, including data splitting (train/validation/test), fitting scalers (if needed), selecting features, and more. The result is cached for later use.
- Use get_train_dataloader / get_train_df / get_train_numpy to get training data for the chosen model.
- Validate the model and perform hyperparameter optimization on get_val_dataloader / get_val_df / get_val_numpy.
- Evaluate the model on get_test_dataloader / get_test_df / get_test_numpy.
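For the series-based variant, the sketch below assumes that whole time series are assigned to train, validation, or test by their IDs. This is an assumption about what series-based splitting means here; check SeriesBasedConfig for the actual behaviour. The helper split_series_ids, the IDs, the seed and the proportions are all hypothetical.

```python
import numpy as np


def split_series_ids(ts_ids, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle series IDs and assign each whole series to exactly one split."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(ts_ids))
    n = len(shuffled)
    n_train = int(fractions[0] * n)
    n_val = int(fractions[1] * n)
    train_ids = shuffled[:n_train]
    val_ids = shuffled[n_train:n_train + n_val]
    test_ids = shuffled[n_train + n_val:]  # remainder goes to test
    return train_ids, val_ids, test_ids


train_ids, val_ids, test_ids = split_series_ids(range(10))
print(len(train_ids), len(val_ids), len(test_ids))  # 6 2 2
```

Because no series appears in more than one split, the model is evaluated on series it has never seen during training.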
Used class attributes:
Name | Type | Description |
---|---|---|
name | str | Name of the database. |
bucket_url | str | URL of the bucket where the dataset is stored. |
tszoo_root | str | Path to the folder where all databases are saved. Set after |
database_root | str | Path to the folder where datasets belonging to the database are saved. Set after |
configs_root | str | Path to the folder where configurations are saved. Set after |
benchmarks_root | str | Path to the folder where benchmarks are saved. Set after |
annotations_root | str | Path to the folder where annotations are saved. Set after |
id_names | dict | Names for time series IDs for each |
default_values | dict | Default values for each available feature. |
source_types | list[SourceType] | Available source types for the database. |
aggregations | list[AgreggationType] | Available aggregations for the database. |
additional_data | dict[str, tuple] | Available small datasets for each dataset. |
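The sketch below shows roughly how a derived database might declare some of these class attributes. Everything in it is a hypothetical placeholder: the enum members, the URL, the feature names and the default values. The real SourceType and AgreggationType enums live in cesnet_tszoo. The fill_missing helper shows one plausible use of default_values, replacing missing feature values. This use is an assumption, not documented behaviour of this class.

```python
import math
from abc import ABC
from enum import Enum


class ToySourceType(Enum):
    """Hypothetical stand-in for SourceType."""
    IP_ADDRESSES = "ip_addresses"


class ToyAggregation(Enum):
    """Hypothetical stand-in for AgreggationType."""
    AGG_1_HOUR = "1_hour"


class ToyDatabase(ABC):
    """Hypothetical base: only declares which class attributes a subclass sets."""
    name: str
    bucket_url: str
    source_types: list
    aggregations: list
    default_values: dict


class ExampleDatabase(ToyDatabase):
    name = "example_db"
    bucket_url = "https://example.org/bucket"  # placeholder URL
    source_types = [ToySourceType.IP_ADDRESSES]
    aggregations = [ToyAggregation.AGG_1_HOUR]
    default_values = {"n_flows": 0, "n_packets": 0}  # hypothetical features


def fill_missing(row: dict, database=ExampleDatabase) -> dict:
    """Replace missing (None or NaN) feature values with the database defaults."""
    filled = {}
    for feature, default in database.default_values.items():
        value = row.get(feature)
        missing = value is None or (isinstance(value, float) and math.isnan(value))
        filled[feature] = default if missing else value
    return filled


print(fill_missing({"n_flows": 12, "n_packets": float("nan")}))  # {'n_flows': 12, 'n_packets': 0}
```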
Source code in cesnet_tszoo\datasets\cesnet_database.py
get_dataset
classmethod
get_dataset(data_root: str, source_type: SourceType | str, aggregation: AgreggationType | str, is_series_based: bool, check_errors: bool = False, display_details: bool = False) -> TimeBasedCesnetDataset | SeriesBasedCesnetDataset
Create a new dataset instance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_root | str | Path to the folder where the dataset will be stored. Each database has its own subfolder. | required |
source_type | SourceType or str | The source type of the desired dataset. | required |
aggregation | AgreggationType or str | The aggregation type for the selected source type. | required |
is_series_based | bool | Whether to create a series-based dataset (True) or a time-based dataset (False). | required |
check_errors | bool | Whether to check the dataset for corruption. | False |
display_details | bool | Whether to display details about the available data in the chosen dataset. | False |
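The source_type and aggregation parameters accept either an enum member or a string. The sketch below shows one way such an argument could be normalized and checked against a database's available values. ToySourceType and the normalize helper are hypothetical. Whether get_dataset matches strings by enum value exactly like this is an assumption.

```python
from enum import Enum


class ToySourceType(Enum):
    """Hypothetical stand-in for SourceType."""
    IP_ADDRESSES = "ip_addresses"
    INSTITUTIONS = "institutions"


def normalize(value, enum_cls, available):
    """Convert a str to enum_cls (matched by value) and ensure it is available."""
    # enum_cls(value) raises ValueError for strings that match no member value.
    member = value if isinstance(value, enum_cls) else enum_cls(value)
    if member not in available:
        choices = [m.value for m in available]
        raise ValueError(f"{member.value!r} is not available; choose from {choices}")
    return member


available = [ToySourceType.IP_ADDRESSES]  # hypothetical per-database list
print(normalize("ip_addresses", ToySourceType, available))  # ToySourceType.IP_ADDRESSES
```

Normalizing early lets the rest of the code work only with enum members. It also turns typos or unsupported combinations into a clear ValueError.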
Returns:
Type | Description |
---|---|
TimeBasedCesnetDataset or SeriesBasedCesnetDataset | A TimeBasedCesnetDataset if is_series_based is False, otherwise a SeriesBasedCesnetDataset. |