
Using datasets

This tutorial covers what you need to start using a dataset.
If you try to use a dataset that you have not downloaded yet, it is downloaded automatically.
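The automatic download works like a download-on-first-use cache: the data is fetched once, and later uses find it already on disk. The standard-library sketch below illustrates only that pattern. `ensure_downloaded`, `fake_download`, and the file name `example_dataset.h5` are hypothetical and are not part of cesnet_tszoo, whose actual download and storage logic may differ.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_downloaded(data_root: str, name: str, download: Callable[[Path], None]) -> Path:
    """Return the local path for `name`, calling `download` only if it is missing.

    Hypothetical helper illustrating download-on-first-use; not cesnet_tszoo's code.
    """
    target = Path(data_root) / name
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        download(target)  # fetch the data into `target` (stubbed below)
    return target


calls = []  # records every time the (fake) download actually runs


def fake_download(path: Path) -> None:
    """Stand-in for a real download: just writes a placeholder file."""
    calls.append(path)
    path.write_text("placeholder")


with tempfile.TemporaryDirectory() as root:
    first = ensure_downloaded(root, "example_dataset.h5", fake_download)   # downloads
    second = ensure_downloaded(root, "example_dataset.h5", fake_download)  # reuses the file
    same_path = first == second
    existed = first.exists()

print("download ran", len(calls), "time(s)")
```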

There are currently seven supported datasets; see the Datasets section for the full list.

Using a dataset from a benchmark

You can refer to benchmarks for more detailed usage.

```python
from cesnet_tszoo.benchmarks import load_benchmark

# Load a built-in benchmark by its identifier
benchmark = load_benchmark(identifier="2e92831cb502", data_root="/some_directory/")
dataset = benchmark.get_initialized_dataset(display_config_details="text", check_errors=False, workers="config")

# Load a custom benchmark by its identifier
benchmark = load_benchmark(identifier="test2", data_root="/some_directory/")
dataset = benchmark.get_initialized_dataset(display_config_details="text", check_errors=False, workers="config")
```

Creating a dataset

Refer to choosing_data for a more detailed guide to selecting data via the config.

Example of using datasets from CESNET_TimeSeries24:

```python
from cesnet_tszoo.configs import TimeBasedConfig  # For time-based dataset
from cesnet_tszoo.configs import SeriesBasedConfig  # For series-based dataset
from cesnet_tszoo.configs import DisjointTimeBasedConfig  # For disjoint-time-based dataset

from cesnet_tszoo.utils.enums import SourceType, AgreggationType, DatasetType  # Used for specifying which dataset to use
from cesnet_tszoo.datasets import CESNET_TimeSeries24

# Time-based
time_based_dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.INSTITUTIONS, aggregation=AgreggationType.AGG_1_DAY, dataset_type=DatasetType.TIME_BASED)
config = TimeBasedConfig(ts_ids=50)
time_based_dataset.set_dataset_config_and_initialize(config)

# Disjoint-time-based
disjoint_dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.INSTITUTIONS, aggregation=AgreggationType.AGG_1_DAY, dataset_type=DatasetType.DISJOINT_TIME_BASED)
config = DisjointTimeBasedConfig(train_ts=50, val_ts=None, test_ts=None, train_time_period=range(0, 200))
disjoint_dataset.set_dataset_config_and_initialize(config)

# Series-based (uses SeriesBasedConfig, imported above)
series_based_dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.INSTITUTIONS, aggregation=AgreggationType.AGG_1_DAY, dataset_type=DatasetType.SERIES_BASED)
config = SeriesBasedConfig(time_period=range(0, 200))
series_based_dataset.set_dataset_config_and_initialize(config)
```

Other datasets from the Datasets section can be used in the same way.
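The three dataset types in the CESNET_TimeSeries24 example differ in how data is divided into train, validation, and test sets. As we understand the library's documentation, a time-based dataset uses the same time series in every split and separates the splits by time period; a series-based dataset separates the splits by time series over one shared time period; and a disjoint-time-based dataset gives each split its own series and its own time period. The plain-Python sketch below makes the difference concrete by modelling each split as a (set of series IDs, range of time indices) pair. All helper names are hypothetical, the pool of 300 series IDs is made up, and treating an integer such as `ts_ids=50` as a random sample of that many IDs is an assumption about the configs.

```python
import random

# Hypothetical helpers for illustration only; none of these exist in cesnet_tszoo.
# Each split is modelled as a (set of time series IDs, range of time indices) pair.


def sample_ids(all_ids, count, seed=0):
    """Randomly pick `count` distinct IDs (assumed meaning of an integer such as ts_ids=50)."""
    rng = random.Random(seed)
    return set(rng.sample(list(all_ids), count))


def time_based(ts_ids, train_period, val_period, test_period):
    """Same series in every split; the splits differ only in time."""
    return {
        "train": (ts_ids, train_period),
        "val": (ts_ids, val_period),
        "test": (ts_ids, test_period),
    }


def series_based(train_ts, val_ts, test_ts, time_period):
    """Different series in each split; all splits share one time period."""
    return {
        "train": (train_ts, time_period),
        "val": (val_ts, time_period),
        "test": (test_ts, time_period),
    }


def disjoint_time_based(train, val, test):
    """Each split has its own series and its own time period (None = unused split)."""
    return {"train": train, "val": val, "test": test}


all_ids = range(300)  # made-up pool of available time series IDs

# Time-based: 50 series, split along the time axis
ids = sample_ids(all_ids, 50, seed=1)
tb = time_based(ids, range(0, 200), range(200, 250), range(250, 300))

# Series-based: 150 series split 100/25/25, all over time indices 0..199
ordered = sorted(sample_ids(all_ids, 150, seed=2))
sb = series_based(set(ordered[:100]), set(ordered[100:125]), set(ordered[125:]), range(0, 200))

# Disjoint-time-based: only a training split, mirroring val_ts=None, test_ts=None in the example
dj = disjoint_time_based((sample_ids(all_ids, 50, seed=3), range(0, 200)), None, None)

for name, split in sb.items():
    print(name, len(split[0]), "series over", len(split[1]), "time steps")
```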