Benchmark

cesnet_tszoo.benchmarks

Benchmark

Used as a wrapper for the imported dataset, config, annotations and related_results.

Intended usage:

  1. Call load_benchmark with the desired benchmark. You can use a built-in benchmark or one you saved yourself. The dataset and annotations (if available) are downloaded unless they were downloaded previously.
  2. Retrieve the initialized dataset using get_initialized_dataset. The returned dataset is ready to use. Check beforehand which type of dataset is returned.
  3. Use get_train_dataloader/get_train_df/get_train_numpy to get training data for the chosen model.
  4. [Optional] Modify the preprocessing steps with update_dataset_config_and_initialize.
  5. Validate the model and perform hyperparameter optimization on get_val_dataloader/get_val_df/get_val_numpy.
  6. Evaluate the model on get_test_dataloader/get_test_df/get_test_numpy.

You can create custom benchmarks with save_benchmark. They are saved to the "data_root"/tszoo/benchmarks/ directory, where data_root is the value passed as a parameter to load_benchmark.

The steps above are practically the same for all dataset types, but there can be small differences in method parameters. Check each method for details.
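
The intended usage can be sketched as one helper that takes an already loaded benchmark. This is a minimal sketch rather than library code: `run_workflow`, `fit` and `score` are hypothetical names, and it assumes the `get_train_df`/`get_val_df`/`get_test_df` methods can be called without arguments.

```python
from typing import Any, Callable


def run_workflow(benchmark: Any,
                 fit: Callable[[Any], Any],
                 score: Callable[[Any, Any], float]) -> dict[str, float]:
    """Walk through steps 2, 3, 5 and 6 for a benchmark returned by load_benchmark.

    `fit` and `score` are hypothetical user-supplied callables:
    `fit(train_data)` returns a model, `score(model, data)` returns a number.
    """
    # Step 2: initialize the dataset. None is one of the values the
    # signature accepts for display_config_details.
    dataset = benchmark.get_initialized_dataset(display_config_details=None)

    # Step 3: fetch training data (assumption: no arguments are required).
    model = fit(dataset.get_train_df())

    # Steps 5 and 6: score the model on the validation and test sets.
    return {
        "val": score(model, dataset.get_val_df()),
        "test": score(model, dataset.get_test_df()),
    }
```

With the real library, `benchmark` would be the object returned by `load_benchmark(identifier, data_root)`. Step 4 is left out because it is optional and its parameters depend on the dataset type.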

Source code in cesnet_tszoo\benchmarks.py
class Benchmark:
    """
    Used as wrapper for imported `dataset`, `config`, `annotations` and `related_results`.

    **Intended usage:**

    1. Call [`load_benchmark`][cesnet_tszoo.benchmarks.load_benchmark] with the desired benchmark. You can use a built-in benchmark or one you saved yourself. The dataset and annotations (if available) are downloaded unless they were downloaded previously.
    2. Retrieve the initialized dataset using [`get_initialized_dataset`](reference_benchmarks.md#cesnet_tszoo.benchmarks.Benchmark.get_initialized_dataset). The returned dataset is ready to use. Check beforehand which type of dataset is returned.
    3. Use [`get_train_dataloader`](reference_cesnet_dataset.md#cesnet_tszoo.datasets.cesnet_dataset.CesnetDataset.get_train_dataloader)/[`get_train_df`](reference_cesnet_dataset.md#cesnet_tszoo.datasets.cesnet_dataset.CesnetDataset.get_train_df)/[`get_train_numpy`](reference_cesnet_dataset.md#cesnet_tszoo.datasets.cesnet_dataset.CesnetDataset.get_train_numpy) to get training data for the chosen model.
    4. [Optional] Modify the preprocessing steps with [`update_dataset_config_and_initialize`](reference_cesnet_dataset.md#cesnet_tszoo.datasets.cesnet_dataset.CesnetDataset.update_dataset_config_and_initialize).
    5. Validate the model and perform hyperparameter optimization on [`get_val_dataloader`](reference_cesnet_dataset.md#cesnet_tszoo.datasets.cesnet_dataset.CesnetDataset.get_val_dataloader)/[`get_val_df`](reference_cesnet_dataset.md#cesnet_tszoo.datasets.cesnet_dataset.CesnetDataset.get_val_df)/[`get_val_numpy`](reference_cesnet_dataset.md#cesnet_tszoo.datasets.cesnet_dataset.CesnetDataset.get_val_numpy).
    6. Evaluate the model on [`get_test_dataloader`](reference_cesnet_dataset.md#cesnet_tszoo.datasets.cesnet_dataset.CesnetDataset.get_test_dataloader)/[`get_test_df`](reference_cesnet_dataset.md#cesnet_tszoo.datasets.cesnet_dataset.CesnetDataset.get_test_df)/[`get_test_numpy`](reference_cesnet_dataset.md#cesnet_tszoo.datasets.cesnet_dataset.CesnetDataset.get_test_numpy).

    You can create custom benchmarks with [`save_benchmark`](reference_cesnet_dataset.md#cesnet_tszoo.datasets.cesnet_dataset.CesnetDataset.save_benchmark).
    They are saved to the `"data_root"/tszoo/benchmarks/` directory, where `data_root` is the value passed as a parameter to [`load_benchmark`][cesnet_tszoo.benchmarks.load_benchmark].

    The steps above are practically the same for all dataset types, but there can be small differences in method parameters. Check each method for details.
    """

    def __init__(self, config: DatasetConfig, dataset: CesnetDataset, description: str = None):
        self.config = config
        self.dataset = dataset
        self.description = description
        self.related_results = None
        self.logger = logging.getLogger("benchmark")

    def get_config(self) -> SeriesBasedConfig | TimeBasedConfig | DisjointTimeBasedConfig:
        """Returns config made for this benchmark. """

        return self.config

    def get_initialized_dataset(self, display_config_details: Optional[Literal["text", "diagram"]] = "text", check_errors: bool = False, workers: Literal["config"] | int = "config") -> TimeBasedCesnetDataset | SeriesBasedCesnetDataset | DisjointTimeBasedCesnetDataset:
        """
        Returns dataset with initialized sets, transformers, fillers etc.

        This method uses following config attributes:

        Dataset config | Description
        -------------- | -----------
        `init_workers` | Specifies the number of workers to use for initialization. Applied when `workers` = "config".
        `partial_fit_initialized_transformers` | Determines whether initialized transformers should be partially fitted on the training data.
        `nan_threshold` | Filters out time series with missing values exceeding the specified threshold.

        Parameters:
            display_config_details: How to display the configuration values after initialization. Accepts `"text"`, `"diagram"` or `None`. `Default: "text"`
            check_errors: Whether to validate if dataset is not corrupted. `Default: False`
            workers: The number of workers to use during initialization. `Default: "config"`        

        Returns:
            Returns initialized dataset.
        """

        if display_config_details is not None:
            display_config_details = DisplayType(display_config_details)

        if check_errors:
            self.dataset.check_errors()

        self.dataset.set_dataset_config_and_initialize(self.config, display_config_details, workers)

        return self.dataset

    def get_dataset(self, check_errors: bool = False) -> TimeBasedCesnetDataset | SeriesBasedCesnetDataset | DisjointTimeBasedCesnetDataset:
        """Returns dataset without initializing it.

        Parameters:
            check_errors: Whether to validate if dataset is not corrupted. `Default: False`

        Returns:
            Returns dataset used for this benchmark.
        """

        if check_errors:
            self.dataset.check_errors()

        return self.dataset

    def get_annotations(self, on: AnnotationType | Literal["id_time", "ts_id", "both"]) -> pd.DataFrame:
        """ 
        Returns the annotations as a Pandas [`DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).

        Parameters:
            on: Specifies which annotations to return. If set to `"both"`, annotations will be applied as if `id_time` and `ts_id` were both set.         

        Returns:
            A Pandas DataFrame containing the selected annotations.      
        """

        return self.dataset.get_annotations(on)

    def get_related_results(self) -> pd.DataFrame | None:
        """
        Returns the related results as a Pandas [`DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), if they exist. 

        Returns:
            A Pandas DataFrame containing related results or None if no related results exist.
        """

        return self.related_results

get_annotations

get_annotations(on: AnnotationType | Literal['id_time', 'ts_id', 'both']) -> pd.DataFrame

Returns the annotations as a Pandas DataFrame.

Parameters:

Name | Type | Description | Default
---- | ---- | ----------- | -------
on | AnnotationType \| Literal['id_time', 'ts_id', 'both'] | Specifies which annotations to return. If set to "both", annotations will be applied as if id_time and ts_id were both set. | required

Returns:

Type | Description
---- | -----------
DataFrame | A Pandas DataFrame containing the selected annotations.
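
To compare what each `on` value returns, the three string literals from the signature can be looped over. This is a small sketch; `annotations_by_kind` and `ANNOTATION_KINDS` are hypothetical names, not part of the library.

```python
from typing import Any

# The three string values accepted by the `on` parameter, per the signature.
ANNOTATION_KINDS = ("id_time", "ts_id", "both")


def annotations_by_kind(benchmark: Any) -> dict[str, Any]:
    """Fetch one annotation table per accepted `on` value."""
    return {kind: benchmark.get_annotations(on=kind) for kind in ANNOTATION_KINDS}
```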

Source code in cesnet_tszoo\benchmarks.py
def get_annotations(self, on: AnnotationType | Literal["id_time", "ts_id", "both"]) -> pd.DataFrame:
    """ 
    Returns the annotations as a Pandas [`DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).

    Parameters:
        on: Specifies which annotations to return. If set to `"both"`, annotations will be applied as if `id_time` and `ts_id` were both set.         

    Returns:
        A Pandas DataFrame containing the selected annotations.      
    """

    return self.dataset.get_annotations(on)

get_config

get_config() -> SeriesBasedConfig | TimeBasedConfig | DisjointTimeBasedConfig

Returns config made for this benchmark.

Source code in cesnet_tszoo\benchmarks.py
def get_config(self) -> SeriesBasedConfig | TimeBasedConfig | DisjointTimeBasedConfig:
    """Returns config made for this benchmark. """

    return self.config

get_dataset

get_dataset(check_errors: bool = False) -> TimeBasedCesnetDataset | SeriesBasedCesnetDataset | DisjointTimeBasedCesnetDataset

Returns dataset without initializing it.

Parameters:

Name | Type | Description | Default
---- | ---- | ----------- | -------
check_errors | bool | Whether to validate that the dataset is not corrupted. | False

Returns:

Type | Description
---- | -----------
TimeBasedCesnetDataset \| SeriesBasedCesnetDataset \| DisjointTimeBasedCesnetDataset | Returns dataset used for this benchmark.

Source code in cesnet_tszoo\benchmarks.py
def get_dataset(self, check_errors: bool = False) -> TimeBasedCesnetDataset | SeriesBasedCesnetDataset | DisjointTimeBasedCesnetDataset:
    """Returns dataset without initializing it.

    Parameters:
        check_errors: Whether to validate if dataset is not corrupted. `Default: False`

    Returns:
        Returns dataset used for this benchmark.
    """

    if check_errors:
        self.dataset.check_errors()

    return self.dataset

get_initialized_dataset

get_initialized_dataset(display_config_details: Optional[Literal['text', 'diagram']] = 'text', check_errors: bool = False, workers: Literal['config'] | int = 'config') -> TimeBasedCesnetDataset | SeriesBasedCesnetDataset | DisjointTimeBasedCesnetDataset

Returns dataset with initialized sets, transformers, fillers etc.

This method uses following config attributes:

Dataset config | Description
-------------- | -----------
init_workers | Specifies the number of workers to use for initialization. Applied when workers = "config".
partial_fit_initialized_transformers | Determines whether initialized transformers should be partially fitted on the training data.
nan_threshold | Filters out time series with missing values exceeding the specified threshold.

Parameters:

Name | Type | Description | Default
---- | ---- | ----------- | -------
display_config_details | Optional[Literal['text', 'diagram']] | How to display the configuration values after initialization. Accepts 'text', 'diagram' or None. | 'text'
check_errors | bool | Whether to validate that the dataset is not corrupted. | False
workers | Literal['config'] \| int | The number of workers to use during initialization. | 'config'

Returns:

Type | Description
---- | -----------
TimeBasedCesnetDataset \| SeriesBasedCesnetDataset \| DisjointTimeBasedCesnetDataset | Returns initialized dataset.
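
The three parameters can be combined in a thin wrapper, shown here as a sketch. `initialize_checked` is a hypothetical helper name, and treating `None` as "display nothing" is an assumption based only on the `Optional` type in the signature.

```python
from typing import Any, Union


def initialize_checked(benchmark: Any, workers: Union[int, str] = "config") -> Any:
    """Initialize the benchmark dataset after running the corruption check."""
    return benchmark.get_initialized_dataset(
        display_config_details=None,  # accepted by the signature; assumed to skip the printout
        check_errors=True,            # calls dataset.check_errors() before initializing
        workers=workers,              # "config" defers to init_workers from the config
    )
```

Passing an integer such as `workers=4` overrides `init_workers` for this call only, according to the table above.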

Source code in cesnet_tszoo\benchmarks.py
def get_initialized_dataset(self, display_config_details: Optional[Literal["text", "diagram"]] = "text", check_errors: bool = False, workers: Literal["config"] | int = "config") -> TimeBasedCesnetDataset | SeriesBasedCesnetDataset | DisjointTimeBasedCesnetDataset:
    """
    Returns dataset with initialized sets, transformers, fillers etc.

    This method uses following config attributes:

    Dataset config | Description
    -------------- | -----------
    `init_workers` | Specifies the number of workers to use for initialization. Applied when `workers` = "config".
    `partial_fit_initialized_transformers` | Determines whether initialized transformers should be partially fitted on the training data.
    `nan_threshold` | Filters out time series with missing values exceeding the specified threshold.

    Parameters:
        display_config_details: How to display the configuration values after initialization. Accepts `"text"`, `"diagram"` or `None`. `Default: "text"`
        check_errors: Whether to validate if dataset is not corrupted. `Default: False`
        workers: The number of workers to use during initialization. `Default: "config"`        

    Returns:
        Returns initialized dataset.
    """

    if display_config_details is not None:
        display_config_details = DisplayType(display_config_details)

    if check_errors:
        self.dataset.check_errors()

    self.dataset.set_dataset_config_and_initialize(self.config, display_config_details, workers)

    return self.dataset

get_related_results

get_related_results() -> pd.DataFrame | None

Returns the related results as a Pandas DataFrame, if they exist.

Returns:

Type | Description
---- | -----------
DataFrame \| None | A Pandas DataFrame containing related results or None if no related results exist.
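
Because the return value may be None, callers need to check it before use. A minimal sketch, with `count_related_results` as a hypothetical helper name:

```python
from typing import Any


def count_related_results(benchmark: Any) -> int:
    """Return the number of related-result rows, or 0 when none exist."""
    results = benchmark.get_related_results()
    if results is None:  # no related results are attached to this benchmark
        return 0
    return len(results)  # len() of a DataFrame is its row count
```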

Source code in cesnet_tszoo\benchmarks.py
def get_related_results(self) -> pd.DataFrame | None:
    """
    Returns the related results as a Pandas [`DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), if they exist. 

    Returns:
        A Pandas DataFrame containing related results or None if no related results exist.
    """

    return self.related_results

load_benchmark

load_benchmark(identifier: str, data_root: str) -> Benchmark

Load a benchmark using the identifier.

First, it attempts to load a built-in benchmark. If no built-in benchmark with the given identifier exists, it attempts to load a custom benchmark from the "data_root"/tszoo/benchmarks/ directory.

Parameters:

Name | Type | Description | Default
---- | ---- | ----------- | -------
identifier | str | The name of the benchmark YAML file. | required
data_root | str | Path to the folder where the dataset will be stored. Each database has its own subfolder "data_root"/tszoo/databases/database_name/. | required

Returns:

Type | Description
---- | -----------
Benchmark | Returns benchmark with config, annotations, dataset and related_results.
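
The `data_root` handling can be reproduced with the standard library to see where custom benchmarks are looked up. This sketch mirrors the `os.path.normpath(os.path.expanduser(...))` call in the source; `custom_benchmark_dir` is a hypothetical helper and not part of the library.

```python
import os


def custom_benchmark_dir(data_root: str) -> str:
    """Return the directory in which custom benchmarks are looked up."""
    # Same normalization as load_benchmark: expand "~" and collapse
    # redundant separators and ".." segments.
    root = os.path.normpath(os.path.expanduser(data_root))
    return os.path.join(root, "tszoo", "benchmarks")
```

A trailing slash or a leading `~` in `data_root` therefore resolves to the same directory as the clean absolute path.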

Source code in cesnet_tszoo\benchmarks.py
def load_benchmark(identifier: str, data_root: str) -> Benchmark:
    """
    Load a benchmark using the identifier.

    First, it attempts to load a built-in benchmark. If no built-in benchmark with the given identifier exists, it attempts to load a custom benchmark from the `"data_root"/tszoo/benchmarks/` directory.

    Parameters:
        identifier: The name of the benchmark YAML file.
        data_root: Path to the folder where the dataset will be stored. Each database has its own subfolder `"data_root"/tszoo/databases/database_name/`.

    Returns:
        Returns benchmark with `config`, `annotations`, `dataset` and `related_results`.
    """

    logger = logging.getLogger("benchmark")

    data_root = os.path.normpath(os.path.expanduser(data_root))

    # Only string identifiers are supported; anything else raises ValueError.
    if isinstance(identifier, str):
        _, is_built_in = get_benchmark_path_and_whether_it_is_built_in(identifier, data_root, logger)

        if is_built_in:
            logger.info("Built-in benchmark found: %s. Loading it.", identifier)
            return _get_built_in_benchmark(identifier, data_root)
        else:
            logger.info("Custom benchmark found: %s. Loading it.", identifier)
            return _get_custom_benchmark(identifier, data_root)

    else:
        logger.error("Invalid identifier.")
        raise ValueError("Invalid identifier.")