Benchmark

cesnet_tszoo.benchmarks

Benchmark

Wraps an imported dataset together with its config, annotations and related_results.

Intended usage:

For time-based:

  1. Call load_benchmark with the desired benchmark identifier. You can use your own saved benchmark or a built-in one. This downloads the dataset and annotations (if available) unless they have already been downloaded.
  2. Retrieve the initialized dataset using get_initialized_dataset. This will provide a dataset that is ready to use.
  3. Use get_train_dataloader or get_train_df to get training data for the chosen model.
  4. Validate the model and perform hyperparameter optimization on get_val_dataloader or get_val_df.
  5. Evaluate the model on get_test_dataloader or get_test_df.
  6. (Optional) Evaluate the model on get_test_other_dataloader or get_test_other_df.
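
Steps 2 to 6 can be sketched as a small helper. This is a minimal sketch, not part of the library: `collect_time_based_splits` is a hypothetical name, and it assumes that the `get_*_df` methods can be called without arguments and that the benchmark config defines all four sets.

```python
from typing import Any, Dict


def collect_time_based_splits(benchmark: Any) -> Dict[str, Any]:
    """Run steps 2-6 for a time-based benchmark and return each split."""
    # Step 2: apply the benchmark config and initialize the dataset.
    dataset = benchmark.get_initialized_dataset(display_config_details=False)

    # Steps 3-6: fetch every split.
    # Assumption: the getters work with their default arguments.
    return {
        "train": dataset.get_train_df(),
        "val": dataset.get_val_df(),
        "test": dataset.get_test_df(),
        "test_other": dataset.get_test_other_df(),  # optional, time-based only
    }
```

With the real library, `benchmark` would be the object returned by `load_benchmark(identifier, data_root)` from step 1.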

For series-based:

  1. Call load_benchmark with the desired benchmark identifier. You can use your own saved benchmark or a built-in one. This downloads the dataset and annotations (if available) unless they have already been downloaded.
  2. Retrieve the initialized dataset using get_initialized_dataset. This will provide a dataset that is ready to use.
  3. Use get_train_dataloader or get_train_df to get training data for the chosen model.
  4. Validate the model and perform hyperparameter optimization on get_val_dataloader or get_val_df.
  5. Evaluate the model on get_test_dataloader or get_test_df.
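
The dataloader variant of these steps might look like the sketch below. `run_series_based`, `fit_batch` and `score_batch` are hypothetical names standing in for your own model code, and the sketch assumes that each `get_*_dataloader` method can be called without arguments and returns an iterable of batches.

```python
from typing import Any, Callable, Dict, List


def run_series_based(
    benchmark: Any,
    fit_batch: Callable[[Any], None],
    score_batch: Callable[[Any], Any],
) -> Dict[str, List[Any]]:
    """Train on the train loader, then score the val and test loaders."""
    # Step 2: initialize the dataset from the benchmark config.
    dataset = benchmark.get_initialized_dataset(display_config_details=False)

    # Step 3: feed every training batch to the model.
    for batch in dataset.get_train_dataloader():
        fit_batch(batch)

    # Steps 4 and 5: score validation batches first, test batches last.
    val_scores = [score_batch(batch) for batch in dataset.get_val_dataloader()]
    test_scores = [score_batch(batch) for batch in dataset.get_test_dataloader()]
    return {"val": val_scores, "test": test_scores}
```

Keeping the test loader untouched until hyperparameters are fixed on the validation scores is what separates step 5 from step 4.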

You can create custom benchmarks with the save_benchmark method of a time-based or series-based dataset. They are saved to the "data_root"/tszoo/benchmarks/ directory, where data_root is the path set when the dataset instance was created.
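
To see which custom benchmarks exist under a given data_root, the directory can be inspected directly. The helpers below are a sketch with hypothetical names; they assume each saved benchmark is a single file named `<identifier>.yaml`, which this page implies (the identifier is described as the name of the benchmark YAML file) but does not state outright.

```python
import os
from typing import List


def custom_benchmark_dir(data_root: str) -> str:
    """Build the "data_root"/tszoo/benchmarks/ path."""
    # Expand "~" and normalize separators before joining.
    root = os.path.normpath(os.path.expanduser(data_root))
    return os.path.join(root, "tszoo", "benchmarks")


def list_custom_benchmarks(data_root: str) -> List[str]:
    """Return the identifiers of saved benchmarks, sorted by name."""
    directory = custom_benchmark_dir(data_root)
    if not os.path.isdir(directory):
        return []
    # Assumption: one "<identifier>.yaml" file per saved benchmark.
    return sorted(
        os.path.splitext(name)[0]
        for name in os.listdir(directory)
        if name.endswith(".yaml")
    )
```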

Source code in cesnet_tszoo\benchmarks.py
class Benchmark:
    """
    Wraps an imported `dataset` together with its `config`, `annotations` and `related_results`.

    **Intended usage:**

    For time-based:

    1. Call [`load_benchmark`][cesnet_tszoo.benchmarks.load_benchmark] with the desired benchmark identifier. You can use your own saved benchmark or a built-in one. This downloads the dataset and annotations (if available) unless they have already been downloaded.
    2. Retrieve the initialized dataset using [`get_initialized_dataset`][cesnet_tszoo.benchmarks.Benchmark.get_initialized_dataset]. This will provide a dataset that is ready to use.
    3. Use [`get_train_dataloader`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.get_train_dataloader] or [`get_train_df`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.get_train_df] to get training data for the chosen model.
    4. Validate the model and perform hyperparameter optimization on [`get_val_dataloader`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.get_val_dataloader] or [`get_val_df`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.get_val_df].
    5. Evaluate the model on [`get_test_dataloader`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.get_test_dataloader] or [`get_test_df`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.get_test_df]. 
    6. (Optional) Evaluate the model on [`get_test_other_dataloader`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.get_test_other_dataloader] or [`get_test_other_df`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.get_test_other_df]. 

    For series-based: 

    1. Call [`load_benchmark`][cesnet_tszoo.benchmarks.load_benchmark] with the desired benchmark identifier. You can use your own saved benchmark or a built-in one. This downloads the dataset and annotations (if available) unless they have already been downloaded.
    2. Retrieve the initialized dataset using [`get_initialized_dataset`][cesnet_tszoo.benchmarks.Benchmark.get_initialized_dataset]. This will provide a dataset that is ready to use.
    3. Use [`get_train_dataloader`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset.get_train_dataloader] or [`get_train_df`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset.get_train_df] to get training data for the chosen model.
    4. Validate the model and perform hyperparameter optimization on [`get_val_dataloader`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset.get_val_dataloader] or [`get_val_df`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset.get_val_df].
    5. Evaluate the model on [`get_test_dataloader`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset.get_test_dataloader] or [`get_test_df`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset.get_test_df].     

    You can create custom time-based benchmarks with [`save_benchmark`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.save_benchmark] or series-based benchmarks with [`save_benchmark`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset.save_benchmark].
    They are saved to the `"data_root"/tszoo/benchmarks/` directory, where `data_root` is the path set when the dataset instance was created.
    """

    def __init__(self, config: DatasetConfig, dataset: CesnetDataset, description: str | None = None):
        self.config = config
        self.dataset = dataset
        self.description = description
        self.related_results = None
        self.logger = logging.getLogger("benchmark")

    def get_config(self) -> SeriesBasedConfig | TimeBasedConfig:
        """Return config made for this benchmark. """

        return self.config

    def get_initialized_dataset(self, display_config_details: bool = True, check_errors: bool = False, workers: Literal["config"] | int = "config") -> TimeBasedCesnetDataset | SeriesBasedCesnetDataset:
        """
        Return the dataset with initialized sets, scalers, fillers, etc.

        This method uses the following config attributes:

        | Dataset config                    | Description                                                                                    |
        | --------------------------------- | ---------------------------------------------------------------------------------------------- |
        | `init_workers`                    | Specifies the number of workers to use for initialization. Applied when `workers` = "config". |
        | `partial_fit_initialized_scalers` | Determines whether initialized scalers should be partially fitted on the training data.        |
        | `nan_threshold`                   | Filters out time series with missing values exceeding the specified threshold.                 |

        Parameters:
            display_config_details: Whether to display the configuration values after initialization. `Default: True`
            check_errors: Whether to check the dataset for corruption. `Default: False`
            workers: The number of workers to use during initialization. `Default: "config"`

        Returns:
            The initialized dataset.
        """

        if check_errors:
            self.dataset.check_errors()

        self.dataset.set_dataset_config_and_initialize(self.config, display_config_details, workers)

        return self.dataset

    def get_dataset(self, check_errors: bool = False) -> TimeBasedCesnetDataset | SeriesBasedCesnetDataset:
        """Return dataset without initializing it.

        Parameters:
            check_errors: Whether to check the dataset for corruption. `Default: False`

        Returns:
            The dataset used by this benchmark.
        """

        if check_errors:
            self.dataset.check_errors()

        return self.dataset

    def get_annotations(self, on: AnnotationType | Literal["id_time", "ts_id", "both"]) -> pd.DataFrame:
        """ 
        Return the annotations as a Pandas [`DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).

        Parameters:
            on: Specifies which annotations to return. If set to `"both"`, annotations will be applied as if `id_time` and `ts_id` were both set.         

        Returns:
            A Pandas DataFrame containing the selected annotations.      
        """

        return self.dataset.get_annotations(on)

    def get_related_results(self) -> pd.DataFrame | None:
        """
        Return the related results as a Pandas [`DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), if they exist. 

        Returns:
            A Pandas DataFrame containing related results, or `None` if no related results exist.
        """

        return self.related_results

get_annotations

get_annotations(on: AnnotationType | Literal['id_time', 'ts_id', 'both']) -> pd.DataFrame

Return the annotations as a Pandas DataFrame.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| on | `AnnotationType \| Literal['id_time', 'ts_id', 'both']` | Specifies which annotations to return. If set to "both", annotations will be applied as if id_time and ts_id were both set. | required |

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | A Pandas DataFrame containing the selected annotations. |
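
The three string values accepted by `on` can be requested in one pass. A minimal sketch, where `collect_annotations` is a hypothetical helper and `benchmark` is any object exposing `get_annotations` as documented here:

```python
from typing import Any, Dict

# The string literals accepted by the `on` parameter.
ANNOTATION_KINDS = ("id_time", "ts_id", "both")


def collect_annotations(benchmark: Any) -> Dict[str, Any]:
    """Return one annotations table per accepted value of `on`."""
    return {kind: benchmark.get_annotations(on=kind) for kind in ANNOTATION_KINDS}
```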

Source code in cesnet_tszoo\benchmarks.py
def get_annotations(self, on: AnnotationType | Literal["id_time", "ts_id", "both"]) -> pd.DataFrame:
    """ 
    Return the annotations as a Pandas [`DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).

    Parameters:
        on: Specifies which annotations to return. If set to `"both"`, annotations will be applied as if `id_time` and `ts_id` were both set.         

    Returns:
        A Pandas DataFrame containing the selected annotations.      
    """

    return self.dataset.get_annotations(on)

get_config

get_config() -> SeriesBasedConfig | TimeBasedConfig

Return config made for this benchmark.

Source code in cesnet_tszoo\benchmarks.py
def get_config(self) -> SeriesBasedConfig | TimeBasedConfig:
    """Return config made for this benchmark. """

    return self.config

get_dataset

get_dataset(check_errors: bool = False) -> TimeBasedCesnetDataset | SeriesBasedCesnetDataset

Return dataset without initializing it.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| check_errors | `bool` | Whether to check the dataset for corruption. | `False` |

Returns:

| Type | Description |
| --- | --- |
| `TimeBasedCesnetDataset \| SeriesBasedCesnetDataset` | The dataset used by this benchmark. |
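
Both get_dataset and get_initialized_dataset return the same underlying object, `self.dataset`, so initialization happens in place. The sketch below reproduces that behaviour with stand-ins: `FakeDataset` and `MiniBenchmark` are hypothetical classes that copy only the getter logic shown in the Benchmark source, not the real dataset.

```python
class FakeDataset:
    """Hypothetical stand-in for a dataset; it only tracks initialization."""

    def __init__(self):
        self.initialized = False
        self.error_checks = 0

    def check_errors(self):
        self.error_checks += 1

    def set_dataset_config_and_initialize(self, config, display_config_details, workers):
        self.initialized = True


class MiniBenchmark:
    """Copies the two getters of Benchmark, nothing else."""

    def __init__(self, config, dataset):
        self.config = config
        self.dataset = dataset

    def get_dataset(self, check_errors=False):
        if check_errors:
            self.dataset.check_errors()
        return self.dataset

    def get_initialized_dataset(self, display_config_details=True, check_errors=False, workers="config"):
        if check_errors:
            self.dataset.check_errors()
        self.dataset.set_dataset_config_and_initialize(self.config, display_config_details, workers)
        return self.dataset


benchmark = MiniBenchmark(config={}, dataset=FakeDataset())
raw = benchmark.get_dataset()
was_initialized_before = raw.initialized      # False: nothing has run yet
ready = benchmark.get_initialized_dataset(display_config_details=False)
same_object = ready is raw                    # True: no copy is made
```

A reference obtained earlier through get_dataset therefore also sees the initialized state afterwards.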

Source code in cesnet_tszoo\benchmarks.py
def get_dataset(self, check_errors: bool = False) -> TimeBasedCesnetDataset | SeriesBasedCesnetDataset:
    """Return dataset without initializing it.

    Parameters:
        check_errors: Whether to check the dataset for corruption. `Default: False`

    Returns:
        The dataset used by this benchmark.
    """

    if check_errors:
        self.dataset.check_errors()

    return self.dataset

get_initialized_dataset

get_initialized_dataset(display_config_details: bool = True, check_errors: bool = False, workers: Literal['config'] | int = 'config') -> TimeBasedCesnetDataset | SeriesBasedCesnetDataset

Return the dataset with initialized sets, scalers, fillers, etc.

This method uses the following config attributes:

| Dataset config | Description |
| --- | --- |
| init_workers | Specifies the number of workers to use for initialization. Applied when workers = "config". |
| partial_fit_initialized_scalers | Determines whether initialized scalers should be partially fitted on the training data. |
| nan_threshold | Filters out time series with missing values exceeding the specified threshold. |

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| display_config_details | `bool` | Whether to display the configuration values after initialization. | `True` |
| check_errors | `bool` | Whether to check the dataset for corruption. | `False` |
| workers | `Literal['config'] \| int` | The number of workers to use during initialization. | `'config'` |

Returns:

| Type | Description |
| --- | --- |
| `TimeBasedCesnetDataset \| SeriesBasedCesnetDataset` | The initialized dataset. |
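
The rule for `workers` is that an explicit integer is used as given, while "config" defers to the config's `init_workers`. How the library applies this internally is not shown on this page; the hypothetical helper below only illustrates the documented rule, and treating 0 as a valid count is an assumption.

```python
from typing import Literal, Union


def resolve_init_workers(workers: Union[int, Literal["config"]], config_init_workers: int) -> int:
    """Pick the worker count for initialization."""
    if workers == "config":
        # Defer to the value stored in the dataset config.
        return config_init_workers
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(workers, int) and not isinstance(workers, bool) and workers >= 0:
        return workers
    raise ValueError(f"workers must be 'config' or a non-negative int, got {workers!r}")
```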

Source code in cesnet_tszoo\benchmarks.py
def get_initialized_dataset(self, display_config_details: bool = True, check_errors: bool = False, workers: Literal["config"] | int = "config") -> TimeBasedCesnetDataset | SeriesBasedCesnetDataset:
    """
    Return the dataset with initialized sets, scalers, fillers, etc.

    This method uses the following config attributes:

    | Dataset config                    | Description                                                                                    |
    | --------------------------------- | ---------------------------------------------------------------------------------------------- |
    | `init_workers`                    | Specifies the number of workers to use for initialization. Applied when `workers` = "config". |
    | `partial_fit_initialized_scalers` | Determines whether initialized scalers should be partially fitted on the training data.        |
    | `nan_threshold`                   | Filters out time series with missing values exceeding the specified threshold.                 |

    Parameters:
        display_config_details: Whether to display the configuration values after initialization. `Default: True`
        check_errors: Whether to check the dataset for corruption. `Default: False`
        workers: The number of workers to use during initialization. `Default: "config"`

    Returns:
        The initialized dataset.
    """

    if check_errors:
        self.dataset.check_errors()

    self.dataset.set_dataset_config_and_initialize(self.config, display_config_details, workers)

    return self.dataset

get_related_results

get_related_results() -> pd.DataFrame | None

Return the related results as a Pandas DataFrame, if they exist.

Returns:

| Type | Description |
| --- | --- |
| `DataFrame \| None` | A Pandas DataFrame containing related results, or None if no related results exist. |
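
Because the return value may be None, callers should test for it explicitly. A truthiness check such as `if results:` is not suitable here, since pandas raises a ValueError when a DataFrame is evaluated as a boolean. A minimal sketch with a hypothetical helper name:

```python
from typing import Any


def summarize_related_results(benchmark: Any) -> str:
    """Describe the related results without assuming they exist."""
    results = benchmark.get_related_results()
    # Compare against None; do not rely on truthiness of a DataFrame.
    if results is None:
        return "no related results"
    # len() of a DataFrame is its number of rows.
    return f"{len(results)} related result row(s)"
```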

Source code in cesnet_tszoo\benchmarks.py
def get_related_results(self) -> pd.DataFrame | None:
    """
    Return the related results as a Pandas [`DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), if they exist. 

    Returns:
        A Pandas DataFrame containing related results, or `None` if no related results exist.
    """

    return self.related_results

load_benchmark

load_benchmark(identifier: str, data_root: str) -> Benchmark

Load a benchmark using the identifier.

It first attempts to load a built-in benchmark. If no built-in benchmark with that identifier exists, it attempts to load a custom benchmark from the "data_root"/tszoo/benchmarks/ directory.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| identifier | `str` | The name of the benchmark YAML file. | required |
| data_root | `str` | Path to the folder where the dataset will be stored. Each database has its own subfolder "data_root"/tszoo/databases/database_name/. | required |

Returns:

| Type | Description |
| --- | --- |
| `Benchmark` | Return benchmark with config, annotations, dataset and related_results. |
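
The lookup order means a built-in benchmark takes precedence over a custom one with the same identifier. The sketch below illustrates only that precedence: `choose_benchmark_source` is a hypothetical function, the two name collections stand in for whatever the library actually consults, and raising LookupError for an unknown identifier is an assumption, since this page does not show what the library does in that case.

```python
from typing import Collection


def choose_benchmark_source(identifier: object, built_in: Collection[str], custom: Collection[str]) -> str:
    """Return "built-in" or "custom" following the documented lookup order."""
    # Non-string identifiers are rejected, as in load_benchmark.
    if not isinstance(identifier, str):
        raise ValueError("Invalid identifier.")
    # Built-in benchmarks are checked first, so they win on a name clash.
    if identifier in built_in:
        return "built-in"
    if identifier in custom:
        return "custom"
    # Assumption: the real behaviour for unknown identifiers is not shown here.
    raise LookupError(f"No benchmark named {identifier!r}")
```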

Source code in cesnet_tszoo\benchmarks.py
def load_benchmark(identifier: str, data_root: str) -> Benchmark:
    """
    Load a benchmark using the identifier.

    It first attempts to load a built-in benchmark. If no built-in benchmark with that identifier exists, it attempts to load a custom benchmark from the `"data_root"/tszoo/benchmarks/` directory.

    Parameters:
        identifier: The name of the benchmark YAML file.
        data_root: Path to the folder where the dataset will be stored. Each database has its own subfolder `"data_root"/tszoo/databases/database_name/`.

    Returns:
        Return benchmark with `config`, `annotations`, `dataset` and `related_results`.
    """

    logger = logging.getLogger("benchmark")

    data_root = os.path.normpath(os.path.expanduser(data_root))

    # Only string identifiers are supported; anything else is rejected below.
    if isinstance(identifier, str):
        _, is_built_in = get_benchmark_path_and_whether_it_is_built_in(identifier, data_root, logger)

        if is_built_in:
            logger.info("Built-in benchmark found: %s. Loading it.", identifier)
            return _get_built_in_benchmark(identifier, data_root)
        else:
            logger.info("Custom benchmark found: %s. Loading it.", identifier)
            return _get_custom_benchmark(identifier, data_root)

    else:
        logger.error("Invalid identifier.")
        raise ValueError("Invalid identifier.")