Skip to content

CesnetDatabase

cesnet_tszoo.datasets.cesnet_database.CesnetDatabase

Bases: ABC

Base class for cesnet databases. This class should not be used directly. Use it as base for adding new databases.

Derived databases are used by calling class method get_dataset which will create a new dataset instance of SeriesBasedCesnetDataset or TimeBasedCesnetDataset. Check them for more info about how to use them.

Intended usage:

When using TimeBasedCesnetDataset (dataset_type = DatasetType.TIME_BASED):

  1. Create an instance of the dataset with the desired data root by calling get_dataset. This will download the dataset if it has not been previously downloaded and return instance of dataset.
  2. Create an instance of TimeBasedConfig and set it using set_dataset_config_and_initialize. This initializes the dataset, including data splitting (train/validation/test), fitting transformers (if needed), selecting features, and more. This is cached for later use.
  3. Use get_train_dataloader/get_train_df/get_train_numpy to get training data for chosen model.
  4. Validate the model and perform the hyperparameter optimalization on get_val_dataloader/get_val_df/get_val_numpy.
  5. Evaluate the model on get_test_dataloader/get_test_df/get_test_numpy.

When using SeriesBasedCesnetDataset (dataset_type = DatasetType.SERIES_BASED):

  1. Create an instance of the dataset with the desired data root by calling get_dataset. This will download the dataset if it has not been previously downloaded and return instance of dataset.
  2. Create an instance of SeriesBasedConfig and set it using set_dataset_config_and_initialize. This initializes the dataset, including data splitting (train/validation/test), fitting transformers (if needed), selecting features, and more. This is cached for later use.
  3. Use get_train_dataloader/get_train_df/get_train_numpy to get training data for chosen model.
  4. Validate the model and perform the hyperparameter optimalization on get_val_dataloader/get_val_df/get_val_numpy.
  5. Evaluate the model on get_test_dataloader/get_test_df/get_test_numpy.

When using DisjointTimeBasedCesnetDataset (dataset_type = DatasetType.DISJOINT_TIME_BASED):

  1. Create an instance of the dataset with the desired data root by calling get_dataset. This will download the dataset if it has not been previously downloaded and return instance of dataset.
  2. Create an instance of DisjointTimeBasedConfig and set it using set_dataset_config_and_initialize. This initializes the dataset, including data splitting (train/validation/test), fitting transformers (if needed), selecting features, and more. This is cached for later use.
  3. Use get_train_dataloader/get_train_df/get_train_numpy to get training data for chosen model.
  4. Validate the model and perform the hyperparameter optimalization on get_val_dataloader/get_val_df/get_val_numpy.
  5. Evaluate the model on get_test_dataloader/get_test_df/get_test_numpy.

Used class attributes:

Attributes:

Name Type Description
name str

Name of the database.

bucket_url str

URL of the bucket where the dataset is stored.

tszoo_root str

Path to folder where all databases are saved. Set after get_dataset was called at least once.

database_root str

Path to the folder where datasets belonging to the database are saved. Set after get_dataset was called at least once.

configs_root str

Path to the folder where configurations are saved. Set after get_dataset was called at least once.

benchmarks_root str

Path to the folder where benchmarks are saved. Set after get_dataset was called at least once.

annotations_root str

Path to the folder where annotations are saved. Set after get_dataset was called at least once.

id_names dict

Names for time series IDs for each source_type.

default_values dict

Default values for each available feature.

source_types list[SourceType]

Available source types for the database.

aggregations list[AgreggationType]

Available aggregations for the database.

additional_data dict[str, tuple]

Available small datasets for each dataset.

Source code in cesnet_tszoo\datasets\cesnet_database.py
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
class CesnetDatabase(ABC):
    """
    Base class for cesnet databases. This class should **not** be used directly. Use it as base for adding new databases.

    Derived databases are used by calling class method [`get_dataset`][cesnet_tszoo.datasets.cesnet_database.CesnetDatabase.get_dataset] which will create a new dataset instance of [`SeriesBasedCesnetDataset`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset] 
    or [`TimeBasedCesnetDataset`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset]. Check them for more info about how to use them.

    **Intended usage:**

    When using [`TimeBasedCesnetDataset`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset] (`dataset_type` = `DatasetType.TIME_BASED`):

    1. Create an instance of the dataset with the desired data root by calling [`get_dataset`][cesnet_tszoo.datasets.cesnet_database.CesnetDatabase.get_dataset]. This will download the dataset if it has not been previously downloaded and return instance of dataset.
    2. Create an instance of [`TimeBasedConfig`][cesnet_tszoo.configs.time_based_config.TimeBasedConfig] and set it using [`set_dataset_config_and_initialize`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.set_dataset_config_and_initialize]. 
       This initializes the dataset, including data splitting (train/validation/test), fitting transformers (if needed), selecting features, and more. This is cached for later use.
    3. Use [`get_train_dataloader`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.get_train_dataloader]/[`get_train_df`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.get_train_df]/[`get_train_numpy`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.get_train_numpy] to get training data for chosen model.
    4. Validate the model and perform the hyperparameter optimalization on [`get_val_dataloader`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.get_val_dataloader]/[`get_val_df`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.get_val_df]/[`get_val_numpy`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.get_val_numpy].
    5. Evaluate the model on [`get_test_dataloader`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.get_test_dataloader]/[`get_test_df`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.get_test_df]/[`get_test_numpy`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.get_test_numpy].     

    When using [`SeriesBasedCesnetDataset`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset] (`dataset_type` = `DatasetType.SERIES_BASED`):

    1. Create an instance of the dataset with the desired data root by calling [`get_dataset`][cesnet_tszoo.datasets.cesnet_database.CesnetDatabase.get_dataset]. This will download the dataset if it has not been previously downloaded and return instance of dataset.
    2. Create an instance of [`SeriesBasedConfig`][cesnet_tszoo.configs.series_based_config.SeriesBasedConfig] and set it using [`set_dataset_config_and_initialize`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset.set_dataset_config_and_initialize]. 
       This initializes the dataset, including data splitting (train/validation/test), fitting transformers (if needed), selecting features, and more. This is cached for later use.
    3. Use [`get_train_dataloader`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset.get_train_dataloader]/[`get_train_df`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset.get_train_df]/[`get_train_numpy`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset.get_train_numpy] to get training data for chosen model.
    4. Validate the model and perform the hyperparameter optimalization on [`get_val_dataloader`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset.get_val_dataloader]/[`get_val_df`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset.get_val_df]/[`get_val_numpy`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset.get_val_numpy].
    5. Evaluate the model on [`get_test_dataloader`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset.get_test_dataloader]/[`get_test_df`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset.get_test_df]/[`get_test_numpy`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset.get_test_numpy].   

    When using [`DisjointTimeBasedCesnetDataset`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset] (`dataset_type` = `DatasetType.DISJOINT_TIME_BASED`):

    1. Create an instance of the dataset with the desired data root by calling [`get_dataset`][cesnet_tszoo.datasets.cesnet_database.CesnetDatabase.get_dataset]. This will download the dataset if it has not been previously downloaded and return instance of dataset.
    2. Create an instance of [`DisjointTimeBasedConfig`][cesnet_tszoo.configs.disjoint_time_based_config.DisjointTimeBasedConfig] and set it using [`set_dataset_config_and_initialize`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset.set_dataset_config_and_initialize]. 
       This initializes the dataset, including data splitting (train/validation/test), fitting transformers (if needed), selecting features, and more. This is cached for later use.
    3. Use [`get_train_dataloader`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset.get_train_dataloader]/[`get_train_df`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset.get_train_df]/[`get_train_numpy`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset.get_train_numpy] to get training data for chosen model.
    4. Validate the model and perform the hyperparameter optimalization on [`get_val_dataloader`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset.get_val_dataloader]/[`get_val_df`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset.get_val_df]/[`get_val_numpy`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset.get_val_numpy].
    5. Evaluate the model on [`get_test_dataloader`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset.get_test_dataloader]/[`get_test_df`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset.get_test_df]/[`get_test_numpy`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset.get_test_numpy].   

    Used class attributes:

    Attributes:
        name: Name of the database.
        bucket_url: URL of the bucket where the dataset is stored.
        tszoo_root: Path to folder where all databases are saved. Set after [`get_dataset`][cesnet_tszoo.datasets.cesnet_database.CesnetDatabase.get_dataset] was called at least once.
        database_root: Path to the folder where datasets belonging to the database are saved. Set after [`get_dataset`][cesnet_tszoo.datasets.cesnet_database.CesnetDatabase.get_dataset] was called at least once.
        configs_root: Path to the folder where configurations are saved. Set after [`get_dataset`][cesnet_tszoo.datasets.cesnet_database.CesnetDatabase.get_dataset] was called at least once.
        benchmarks_root: Path to the folder where benchmarks are saved. Set after [`get_dataset`][cesnet_tszoo.datasets.cesnet_database.CesnetDatabase.get_dataset] was called at least once.
        annotations_root: Path to the folder where annotations are saved. Set after [`get_dataset`][cesnet_tszoo.datasets.cesnet_database.CesnetDatabase.get_dataset] was called at least once.
        id_names: Names for time series IDs for each `source_type`.
        default_values: Default values for each available feature.
        source_types: Available source types for the database.
        aggregations: Available aggregations for the database.   
        additional_data: Available small datasets for each dataset. 
    """

    name: str
    bucket_url: str

    tszoo_root: str
    database_root: str
    configs_root: str
    benchmarks_root: str
    annotations_root: str

    id_names: dict = None
    default_values: dict = None
    source_types: list[SourceType] = []
    aggregations: list[AgreggationType] = []
    additional_data: dict[str, tuple] = {}

    def __init__(self):
        raise ValueError("To create dataset instance use class method 'get_dataset' instead.")

    @classmethod
    def get_dataset(cls, data_root: str, source_type: SourceType | str, aggregation: AgreggationType | str, dataset_type: DatasetType | str, check_errors: bool = False, display_details: bool = False) -> TimeBasedCesnetDataset | SeriesBasedCesnetDataset:
        """
        Create new dataset instance.

        Parameters:
            data_root: Path to the folder where the dataset will be stored. Each database has its own subfolder `data_root/tszoo/databases/database_name/`.
            source_type: The source type of the desired dataset.
            aggregation: The aggregation type for the selected source type.
            dataset_type: Type of a dataset you want to create. Can be [`TimeBasedCesnetDataset`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset], [`SeriesBasedCesnetDataset`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset] or [`DisjointTimeBasedCesnetDataset`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset].
            check_errors: Whether to validate if the dataset is corrupted. `Default: False`
            display_details: Whether to display details about the available data in chosen dataset. `Default: False`

        Returns:
            [`TimeBasedCesnetDataset`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset], [`SeriesBasedCesnetDataset`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset] or [`DisjointTimeBasedCesnetDataset`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset].
        """

        logger = logging.getLogger("wrapper_dataset")

        source_type = SourceType(source_type)
        aggregation = AgreggationType(aggregation)
        dataset_type = DatasetType(dataset_type)

        if source_type not in cls.source_types:
            raise ValueError(f"Unsupported source type: {source_type}")

        if aggregation not in cls.aggregations:
            raise ValueError(f"Unsupported aggregation type: {aggregation}")

        # Dataset paths setup
        cls.tszoo_root = os.path.normpath(os.path.expanduser(os.path.join(data_root, "tszoo")))
        cls.database_root = os.path.join(cls.tszoo_root, "databases", cls.name)
        cls.configs_root = os.path.join(cls.tszoo_root, "configs")
        cls.benchmarks_root = os.path.join(cls.tszoo_root, "benchmarks")
        cls.annotations_root = os.path.join(cls.tszoo_root, "annotations")
        dataset_name = f"{cls.name}-{source_type.value}-{AgreggationType._to_str_without_number(aggregation)}"
        dataset_path = os.path.join(cls.database_root, f"{dataset_name}.h5")

        # Ensure necessary directories exist
        for directory in [cls.database_root, cls.configs_root, cls.annotations_root, cls.benchmarks_root]:
            if not os.path.exists(directory):
                logger.info("Creating directory: %s", directory)
                os.makedirs(directory)

        if not cls._is_downloaded(dataset_path):
            cls._download(dataset_name, dataset_path)

        if dataset_type == DatasetType.SERIES_BASED:
            dataset = SeriesBasedCesnetDataset(cls.name, dataset_path, cls.configs_root, cls.benchmarks_root, cls.annotations_root, source_type, aggregation, cls.id_names[source_type], cls.default_values, cls.additional_data)
        elif dataset_type == DatasetType.TIME_BASED:
            dataset = TimeBasedCesnetDataset(cls.name, dataset_path, cls.configs_root, cls.benchmarks_root, cls.annotations_root, source_type, aggregation, cls.id_names[source_type], cls.default_values, cls.additional_data)
        elif dataset_type == DatasetType.DISJOINT_TIME_BASED:
            dataset = DisjointTimeBasedCesnetDataset(cls.name, dataset_path, cls.configs_root, cls.benchmarks_root, cls.annotations_root, source_type, aggregation, cls.id_names[source_type], cls.default_values, cls.additional_data)
        else:
            raise NotImplementedError()

        if check_errors:
            dataset.check_errors()

        if display_details:
            dataset.display_dataset_details()

        if dataset_type == DatasetType.SERIES_BASED:
            logger.info("Dataset is series-based. Use cesnet_tszoo.configs.SeriesBasedConfig")
        elif dataset_type == DatasetType.TIME_BASED:
            logger.info("Dataset is time-based. Use cesnet_tszoo.configs.TimeBasedConfig")
        elif dataset_type == DatasetType.DISJOINT_TIME_BASED:
            logger.info("Dataset is disjoint_time_based. Use cesnet_tszoo.configs.DisjointTimeBasedConfig")
        else:
            raise NotImplementedError()

        return dataset

    @classmethod
    def get_expected_paths(cls, data_root: str, database_name: str) -> dict:
        """Returns expected path for the provided `data_root` and `database_name`

        Args:
            data_root: Path to the folder where the dataset will be stored. Each database has its own subfolder `data_root/tszoo/databases/database_name/`.
            database_name: Name of the expected database.

        Returns:
            str: Dictionary of paths.
        """

        paths = {}

        paths["tszoo_root"] = os.path.normpath(os.path.expanduser(os.path.join(data_root, "tszoo")))
        paths["database_root"] = os.path.join(paths["tszoo_root"], "databases", database_name)
        paths["configs_root"] = os.path.join(paths["tszoo_root"], "configs")
        paths["benchmarks_root"] = os.path.join(paths["tszoo_root"], "benchmarks")
        paths["annotations_root"] = os.path.join(paths["tszoo_root"], "annotations")

        return paths

    @classmethod
    def _is_downloaded(cls, dataset_path: str) -> bool:
        """Check whether the dataset at path has already been downloaded. """

        return os.path.exists(dataset_path)

    @classmethod
    def _download(cls, dataset_name: str, dataset_path: str) -> None:
        """Download the dataset file. """

        logger = logging.getLogger("wrapper_dataset")

        logger.info("Downloading %s dataset.", dataset_name)
        database_url = f"{cls.bucket_url}&file={dataset_name}.h5"
        resumable_download(url=database_url, file_path=dataset_path, silent=False)

get_dataset classmethod

get_dataset(data_root: str, source_type: SourceType | str, aggregation: AgreggationType | str, dataset_type: DatasetType | str, check_errors: bool = False, display_details: bool = False) -> TimeBasedCesnetDataset | SeriesBasedCesnetDataset

Create new dataset instance.

Parameters:

Name Type Description Default
data_root str

Path to the folder where the dataset will be stored. Each database has its own subfolder data_root/tszoo/databases/database_name/.

required
source_type SourceType | str

The source type of the desired dataset.

required
aggregation AgreggationType | str

The aggregation type for the selected source type.

required
dataset_type DatasetType | str

Type of a dataset you want to create. Can be TimeBasedCesnetDataset, SeriesBasedCesnetDataset or DisjointTimeBasedCesnetDataset.

required
check_errors bool

Whether to validate if the dataset is corrupted. Default: False

False
display_details bool

Whether to display details about the available data in chosen dataset. Default: False

False

Returns:

Type Description
TimeBasedCesnetDataset | SeriesBasedCesnetDataset
Source code in cesnet_tszoo\datasets\cesnet_database.py
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
@classmethod
def get_dataset(cls, data_root: str, source_type: SourceType | str, aggregation: AgreggationType | str, dataset_type: DatasetType | str, check_errors: bool = False, display_details: bool = False) -> TimeBasedCesnetDataset | SeriesBasedCesnetDataset:
    """
    Create new dataset instance.

    Parameters:
        data_root: Path to the folder where the dataset will be stored. Each database has its own subfolder `data_root/tszoo/databases/database_name/`.
        source_type: The source type of the desired dataset.
        aggregation: The aggregation type for the selected source type.
        dataset_type: Type of a dataset you want to create. Can be [`TimeBasedCesnetDataset`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset], [`SeriesBasedCesnetDataset`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset] or [`DisjointTimeBasedCesnetDataset`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset].
        check_errors: Whether to validate if the dataset is corrupted. `Default: False`
        display_details: Whether to display details about the available data in chosen dataset. `Default: False`

    Returns:
        [`TimeBasedCesnetDataset`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset], [`SeriesBasedCesnetDataset`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset] or [`DisjointTimeBasedCesnetDataset`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset].
    """

    logger = logging.getLogger("wrapper_dataset")

    source_type = SourceType(source_type)
    aggregation = AgreggationType(aggregation)
    dataset_type = DatasetType(dataset_type)

    if source_type not in cls.source_types:
        raise ValueError(f"Unsupported source type: {source_type}")

    if aggregation not in cls.aggregations:
        raise ValueError(f"Unsupported aggregation type: {aggregation}")

    # Dataset paths setup
    cls.tszoo_root = os.path.normpath(os.path.expanduser(os.path.join(data_root, "tszoo")))
    cls.database_root = os.path.join(cls.tszoo_root, "databases", cls.name)
    cls.configs_root = os.path.join(cls.tszoo_root, "configs")
    cls.benchmarks_root = os.path.join(cls.tszoo_root, "benchmarks")
    cls.annotations_root = os.path.join(cls.tszoo_root, "annotations")
    dataset_name = f"{cls.name}-{source_type.value}-{AgreggationType._to_str_without_number(aggregation)}"
    dataset_path = os.path.join(cls.database_root, f"{dataset_name}.h5")

    # Ensure necessary directories exist
    for directory in [cls.database_root, cls.configs_root, cls.annotations_root, cls.benchmarks_root]:
        if not os.path.exists(directory):
            logger.info("Creating directory: %s", directory)
            os.makedirs(directory)

    if not cls._is_downloaded(dataset_path):
        cls._download(dataset_name, dataset_path)

    if dataset_type == DatasetType.SERIES_BASED:
        dataset = SeriesBasedCesnetDataset(cls.name, dataset_path, cls.configs_root, cls.benchmarks_root, cls.annotations_root, source_type, aggregation, cls.id_names[source_type], cls.default_values, cls.additional_data)
    elif dataset_type == DatasetType.TIME_BASED:
        dataset = TimeBasedCesnetDataset(cls.name, dataset_path, cls.configs_root, cls.benchmarks_root, cls.annotations_root, source_type, aggregation, cls.id_names[source_type], cls.default_values, cls.additional_data)
    elif dataset_type == DatasetType.DISJOINT_TIME_BASED:
        dataset = DisjointTimeBasedCesnetDataset(cls.name, dataset_path, cls.configs_root, cls.benchmarks_root, cls.annotations_root, source_type, aggregation, cls.id_names[source_type], cls.default_values, cls.additional_data)
    else:
        raise NotImplementedError()

    if check_errors:
        dataset.check_errors()

    if display_details:
        dataset.display_dataset_details()

    if dataset_type == DatasetType.SERIES_BASED:
        logger.info("Dataset is series-based. Use cesnet_tszoo.configs.SeriesBasedConfig")
    elif dataset_type == DatasetType.TIME_BASED:
        logger.info("Dataset is time-based. Use cesnet_tszoo.configs.TimeBasedConfig")
    elif dataset_type == DatasetType.DISJOINT_TIME_BASED:
        logger.info("Dataset is disjoint_time_based. Use cesnet_tszoo.configs.DisjointTimeBasedConfig")
    else:
        raise NotImplementedError()

    return dataset

get_expected_paths classmethod

get_expected_paths(data_root: str, database_name: str) -> dict

Returns expected path for the provided data_root and database_name

Parameters:

Name Type Description Default
data_root str

Path to the folder where the dataset will be stored. Each database has its own subfolder data_root/tszoo/databases/database_name/.

required
database_name str

Name of the expected database.

required

Returns:

Name Type Description
str dict

Dictionary of paths.

Source code in cesnet_tszoo\datasets\cesnet_database.py
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
@classmethod
def get_expected_paths(cls, data_root: str, database_name: str) -> dict:
    """Returns expected path for the provided `data_root` and `database_name`

    Args:
        data_root: Path to the folder where the dataset will be stored. Each database has its own subfolder `data_root/tszoo/databases/database_name/`.
        database_name: Name of the expected database.

    Returns:
        str: Dictionary of paths.
    """

    paths = {}

    paths["tszoo_root"] = os.path.normpath(os.path.expanduser(os.path.join(data_root, "tszoo")))
    paths["database_root"] = os.path.join(paths["tszoo_root"], "databases", database_name)
    paths["configs_root"] = os.path.join(paths["tszoo_root"], "configs")
    paths["benchmarks_root"] = os.path.join(paths["tszoo_root"], "benchmarks")
    paths["annotations_root"] = os.path.join(paths["tszoo_root"], "annotations")

    return paths