Used as a wrapper for the imported `dataset`, `config`, `annotations` and `related_results`.
Intended usage:
For time-based:
When using `TimeBasedCesnetDataset` (`dataset_type` = `DatasetType.TIME_BASED`):

1. Create an instance of the dataset with the desired data root by calling `get_dataset`. This downloads the dataset if it has not been downloaded before and returns an instance of the dataset.
2. Create an instance of `TimeBasedConfig` and set it using `set_dataset_config_and_initialize`. This initializes the dataset, including data splitting (train/validation/test), fitting transformers (if needed), selecting features, and more. This is cached for later use.
3. Use `get_train_dataloader`/`get_train_df`/`get_train_numpy` to get training data for the chosen model.
4. Validate the model and perform hyperparameter optimization on `get_val_dataloader`/`get_val_df`/`get_val_numpy`.
5. Evaluate the model on `get_test_dataloader`/`get_test_df`/`get_test_numpy`.
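As a rough illustration of the time-based splitting in step 2, the plain-Python sketch below cuts every series at the same time indices, so train, validation, and test cover consecutive, non-overlapping time ranges. It does not use cesnet_tszoo: `split_time_based`, `apply_split`, and the fraction arguments are hypothetical stand-ins for what `TimeBasedConfig` controls, and the library's actual splitting rules may differ.

```python
from typing import Dict, List, Sequence, Tuple


def split_time_based(
    n_times: int, train: float, val: float, test: float
) -> Tuple[range, range, range]:
    """Hypothetical helper: split time indices 0..n_times-1 into three
    consecutive, non-overlapping ranges given as fractions of n_times."""
    if train + val + test > 1.0 + 1e-9:  # small tolerance for float rounding
        raise ValueError("fractions must sum to at most 1.0")
    train_end = int(n_times * train)
    val_end = train_end + int(n_times * val)
    test_end = val_end + int(n_times * test)
    return range(0, train_end), range(train_end, val_end), range(val_end, test_end)


def apply_split(series: Dict[int, Sequence[float]], idx: range) -> Dict[int, List[float]]:
    # Every series is cut at the same time indices.
    return {ts_id: [values[i] for i in idx] for ts_id, values in series.items()}


# Two toy series with 10 time steps each.
series = {1: list(range(10)), 2: list(range(100, 110))}
train_idx, val_idx, test_idx = split_time_based(10, 0.6, 0.2, 0.2)
train = apply_split(series, train_idx)
test = apply_split(series, test_idx)
print(train[1])  # [0, 1, 2, 3, 4, 5]
print(test[2])   # [108, 109]
```

Every series appears in every set; only the time window changes.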
When using `SeriesBasedCesnetDataset` (`dataset_type` = `DatasetType.SERIES_BASED`):

1. Create an instance of the dataset with the desired data root by calling `get_dataset`. This downloads the dataset if it has not been downloaded before and returns an instance of the dataset.
2. Create an instance of `SeriesBasedConfig` and set it using `set_dataset_config_and_initialize`. This initializes the dataset, including data splitting (train/validation/test), fitting transformers (if needed), selecting features, and more. This is cached for later use.
3. Use `get_train_dataloader`/`get_train_df`/`get_train_numpy` to get training data for the chosen model.
4. Validate the model and perform hyperparameter optimization on `get_val_dataloader`/`get_val_df`/`get_val_numpy`.
5. Evaluate the model on `get_test_dataloader`/`get_test_df`/`get_test_numpy`.
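For contrast, a series-based split assigns each whole series to exactly one set. The sketch below is hypothetical (`split_series_based` is not part of cesnet_tszoo), and the seeded random shuffle is an assumption made only to keep the illustration reproducible.

```python
import random
from typing import List, Sequence, Tuple


def split_series_based(
    ts_ids: Sequence[int], train: float, val: float, test: float, seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical helper: put each whole time series (by ts_id) into exactly
    one of train/val/test; each series keeps its full time range."""
    ids = list(ts_ids)
    random.Random(seed).shuffle(ids)  # deterministic shuffle (assumption)
    n = len(ids)
    n_train, n_val, n_test = int(n * train), int(n * val), int(n * test)
    train_ids = ids[:n_train]
    val_ids = ids[n_train:n_train + n_val]
    test_ids = ids[n_train + n_val:n_train + n_val + n_test]
    return train_ids, val_ids, test_ids


train_ids, val_ids, test_ids = split_series_based(range(10), 0.6, 0.2, 0.2)
print(len(train_ids), len(val_ids), len(test_ids))  # 6 2 2
```

Unlike the time-based case, no series is shared between the sets.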
When using `DisjointTimeBasedCesnetDataset` (`dataset_type` = `DatasetType.DISJOINT_TIME_BASED`):

1. Create an instance of the dataset with the desired data root by calling `get_dataset`. This downloads the dataset if it has not been downloaded before and returns an instance of the dataset.
2. Create an instance of `DisjointTimeBasedConfig` and set it using `set_dataset_config_and_initialize`. This initializes the dataset, including data splitting (train/validation/test), fitting transformers (if needed), selecting features, and more. This is cached for later use.
3. Use `get_train_dataloader`/`get_train_df`/`get_train_numpy` to get training data for the chosen model.
4. Validate the model and perform hyperparameter optimization on `get_val_dataloader`/`get_val_df`/`get_val_numpy`.
5. Evaluate the model on `get_test_dataloader`/`get_test_df`/`get_test_numpy`.
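Assuming, as the name suggests, that a disjoint-time-based setup gives each set its own series and its own time period, the sketch below shows that combination on toy data. The `take` helper and the chosen IDs and ranges are hypothetical; `DisjointTimeBasedConfig` defines how the library actually specifies them.

```python
from typing import Dict, List, Sequence


def take(series: Dict[int, Sequence[int]], ts_ids: Sequence[int], idx: range) -> Dict[int, List[int]]:
    # Select only the given series, each cut to the given time range.
    return {ts_id: [series[ts_id][i] for i in idx] for ts_id in ts_ids}


# Six toy series; value = ts_id * 100 + time index.
series = {ts_id: [ts_id * 100 + t for t in range(10)] for ts_id in range(6)}

# Hypothetical assignment: each set has its own series AND its own time range.
train = take(series, [0, 1, 2], range(0, 6))
val = take(series, [3, 4], range(6, 8))
test = take(series, [5], range(8, 10))
print(test)  # {5: [508, 509]}
```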
You can create custom benchmarks with `save_benchmark`: use `TimeBasedCesnetDataset.save_benchmark` for time-based, `SeriesBasedCesnetDataset.save_benchmark` for series-based, and `DisjointTimeBasedCesnetDataset.save_benchmark` for disjoint-time-based benchmarks.
They are saved to the `"data_root"/tszoo/benchmarks/` directory, where `data_root` is the data root you set when creating the dataset instance.
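The storage location can be expressed with `pathlib`. `benchmarks_dir` and `list_saved_benchmarks` are hypothetical helpers that only mirror the `"data_root"/tszoo/benchmarks/` layout described above; they make no assumption about the file format `save_benchmark` writes.

```python
from pathlib import Path
from typing import List


def benchmarks_dir(data_root: str) -> Path:
    # Benchmarks are stored under <data_root>/tszoo/benchmarks/
    return Path(data_root) / "tszoo" / "benchmarks"


def list_saved_benchmarks(data_root: str) -> List[str]:
    # Return the names of entries in the benchmarks directory, or an empty
    # list if nothing has been saved yet under this data root.
    directory = benchmarks_dir(data_root)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir())


print(benchmarks_dir("/data/cesnet").as_posix())  # /data/cesnet/tszoo/benchmarks
```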
Source code in cesnet_tszoo\benchmarks.py
```python
class Benchmark:
    """
    Used as a wrapper for the imported `dataset`, `config`, `annotations` and `related_results`.

    **Intended usage:**

    For time-based:

    When using [`TimeBasedCesnetDataset`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset] (`dataset_type` = `DatasetType.TIME_BASED`):

    1. Create an instance of the dataset with the desired data root by calling [`get_dataset`][cesnet_tszoo.datasets.cesnet_database.CesnetDatabase.get_dataset]. This will download the dataset if it has not been previously downloaded and return an instance of the dataset.
    2. Create an instance of [`TimeBasedConfig`][cesnet_tszoo.configs.time_based_config.TimeBasedConfig] and set it using [`set_dataset_config_and_initialize`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.set_dataset_config_and_initialize].
       This initializes the dataset, including data splitting (train/validation/test), fitting transformers (if needed), selecting features, and more. This is cached for later use.
    3. Use [`get_train_dataloader`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.get_train_dataloader]/[`get_train_df`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.get_train_df]/[`get_train_numpy`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.get_train_numpy] to get training data for the chosen model.
    4. Validate the model and perform hyperparameter optimization on [`get_val_dataloader`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.get_val_dataloader]/[`get_val_df`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.get_val_df]/[`get_val_numpy`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.get_val_numpy].
    5. Evaluate the model on [`get_test_dataloader`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.get_test_dataloader]/[`get_test_df`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.get_test_df]/[`get_test_numpy`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.get_test_numpy].

    When using [`SeriesBasedCesnetDataset`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset] (`dataset_type` = `DatasetType.SERIES_BASED`):

    1. Create an instance of the dataset with the desired data root by calling [`get_dataset`][cesnet_tszoo.datasets.cesnet_database.CesnetDatabase.get_dataset]. This will download the dataset if it has not been previously downloaded and return an instance of the dataset.
    2. Create an instance of [`SeriesBasedConfig`][cesnet_tszoo.configs.series_based_config.SeriesBasedConfig] and set it using [`set_dataset_config_and_initialize`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset.set_dataset_config_and_initialize].
       This initializes the dataset, including data splitting (train/validation/test), fitting transformers (if needed), selecting features, and more. This is cached for later use.
    3. Use [`get_train_dataloader`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset.get_train_dataloader]/[`get_train_df`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset.get_train_df]/[`get_train_numpy`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset.get_train_numpy] to get training data for the chosen model.
    4. Validate the model and perform hyperparameter optimization on [`get_val_dataloader`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset.get_val_dataloader]/[`get_val_df`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset.get_val_df]/[`get_val_numpy`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset.get_val_numpy].
    5. Evaluate the model on [`get_test_dataloader`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset.get_test_dataloader]/[`get_test_df`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset.get_test_df]/[`get_test_numpy`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset.get_test_numpy].

    When using [`DisjointTimeBasedCesnetDataset`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset] (`dataset_type` = `DatasetType.DISJOINT_TIME_BASED`):

    1. Create an instance of the dataset with the desired data root by calling [`get_dataset`][cesnet_tszoo.datasets.cesnet_database.CesnetDatabase.get_dataset]. This will download the dataset if it has not been previously downloaded and return an instance of the dataset.
    2. Create an instance of [`DisjointTimeBasedConfig`][cesnet_tszoo.configs.disjoint_time_based_config.DisjointTimeBasedConfig] and set it using [`set_dataset_config_and_initialize`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset.set_dataset_config_and_initialize].
       This initializes the dataset, including data splitting (train/validation/test), fitting transformers (if needed), selecting features, and more. This is cached for later use.
    3. Use [`get_train_dataloader`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset.get_train_dataloader]/[`get_train_df`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset.get_train_df]/[`get_train_numpy`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset.get_train_numpy] to get training data for the chosen model.
    4. Validate the model and perform hyperparameter optimization on [`get_val_dataloader`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset.get_val_dataloader]/[`get_val_df`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset.get_val_df]/[`get_val_numpy`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset.get_val_numpy].
    5. Evaluate the model on [`get_test_dataloader`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset.get_test_dataloader]/[`get_test_df`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset.get_test_df]/[`get_test_numpy`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset.get_test_numpy].

    You can create custom time-based benchmarks with [`save_benchmark`][cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset.save_benchmark], series-based benchmarks with [`save_benchmark`][cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset.save_benchmark] or disjoint-time-based benchmarks with [`save_benchmark`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset.save_benchmark].
    They will be saved to the `"data_root"/tszoo/benchmarks/` directory, where `data_root` was set when you created the instance of the dataset.
    """

    def __init__(self, config: DatasetConfig, dataset: CesnetDataset, description: str = None):
        self.config = config
        self.dataset = dataset
        self.description = description
        self.related_results = None
        self.logger = logging.getLogger("benchmark")

    def get_config(self) -> SeriesBasedConfig | TimeBasedConfig | DisjointTimeBasedConfig:
        """Returns the config made for this benchmark."""
        return self.config

    def get_initialized_dataset(self, display_config_details: bool = True, check_errors: bool = False, workers: Literal["config"] | int = "config") -> TimeBasedCesnetDataset | SeriesBasedCesnetDataset | DisjointTimeBasedCesnetDataset:
        """
        Returns the dataset with initialized sets, transformers, fillers, etc.

        This method uses the following config attributes:

        | Dataset config                         | Description                                                                                    |
        | -------------------------------------- | ---------------------------------------------------------------------------------------------- |
        | `init_workers`                         | Specifies the number of workers to use for initialization. Applied when `workers` = "config". |
        | `partial_fit_initialized_transformers` | Determines whether initialized transformers should be partially fitted on the training data.  |
        | `nan_threshold`                        | Filters out time series with missing values exceeding the specified threshold.                |

        Parameters:
            display_config_details: Flag indicating whether to display the configuration values after initialization. `Default: True`
            check_errors: Whether to validate if dataset is not corrupted. `Default: False`
            workers: The number of workers to use during initialization. `Default: "config"`

        Returns:
            Returns initialized dataset.
        """
        if check_errors:
            self.dataset.check_errors()

        self.dataset.set_dataset_config_and_initialize(self.config, display_config_details, workers)
        return self.dataset

    def get_dataset(self, check_errors: bool = False) -> TimeBasedCesnetDataset | SeriesBasedCesnetDataset | DisjointTimeBasedCesnetDataset:
        """Returns dataset without initializing it.

        Parameters:
            check_errors: Whether to validate if dataset is not corrupted. `Default: False`

        Returns:
            Returns dataset used for this benchmark.
        """
        if check_errors:
            self.dataset.check_errors()

        return self.dataset

    def get_annotations(self, on: AnnotationType | Literal["id_time", "ts_id", "both"]) -> pd.DataFrame:
        """
        Returns the annotations as a Pandas [`DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).

        Parameters:
            on: Specifies which annotations to return. If set to `"both"`, annotations will be applied as if `id_time` and `ts_id` were both set.

        Returns:
            A Pandas DataFrame containing the selected annotations.
        """
        return self.dataset.get_annotations(on)

    def get_related_results(self) -> pd.DataFrame | None:
        """
        Returns the related results as a Pandas [`DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), if they exist.

        Returns:
            A Pandas DataFrame containing related results or None if no related results exist.
        """
        return self.related_results
```
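The delegation in the source above can be exercised without downloading anything. In the sketch below, `RecordingDataset` is a hypothetical stub that only records calls, and `MiniBenchmark` is a simplified copy of the two dataset-returning methods; neither is part of cesnet_tszoo.

```python
from typing import List, Optional


class RecordingDataset:
    """Hypothetical stand-in for a CesnetDataset: it records which methods
    were called and does no real work."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def check_errors(self) -> None:
        self.calls.append("check_errors")

    def set_dataset_config_and_initialize(self, config, display_config_details, workers) -> None:
        self.calls.append(f"initialize(workers={workers})")


class MiniBenchmark:
    """Simplified mirror of Benchmark's delegation logic (illustration only)."""

    def __init__(self, config, dataset, description: Optional[str] = None) -> None:
        self.config = config
        self.dataset = dataset
        self.description = description
        self.related_results = None

    def get_dataset(self, check_errors: bool = False):
        if check_errors:
            self.dataset.check_errors()
        return self.dataset

    def get_initialized_dataset(self, display_config_details: bool = True,
                                check_errors: bool = False, workers="config"):
        if check_errors:
            self.dataset.check_errors()  # validation runs before initialization
        self.dataset.set_dataset_config_and_initialize(self.config, display_config_details, workers)
        return self.dataset


ds = RecordingDataset()
bench = MiniBenchmark(config={"name": "demo"}, dataset=ds)
bench.get_dataset()  # records nothing: no validation, no initialization
bench.get_initialized_dataset(check_errors=True, workers=2)
print(ds.calls)  # ['check_errors', 'initialize(workers=2)']
```

Because both methods return the same `self.dataset` object, calling `get_dataset()` after `get_initialized_dataset()` hands back the already-initialized dataset: "without initializing it" means the method performs no initialization, not that the result is guaranteed to be uninitialized.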
get_annotations
get_annotations(on: AnnotationType | Literal['id_time', 'ts_id', 'both']) -> pd.DataFrame
Returns the annotations as a Pandas DataFrame.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `on` | `AnnotationType \| Literal['id_time', 'ts_id', 'both']` | Specifies which annotations to return. If set to `"both"`, annotations will be applied as if `id_time` and `ts_id` were both set. | required |

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | A Pandas DataFrame containing the selected annotations. |
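The three values of `on` can be read as three annotation scopes. The sketch below assumes that `"id_time"` annotations describe one time point across all series, `"ts_id"` annotations describe a whole series, and `"both"` annotations describe one time point of one series. The `Annotation` record and `annotations_for` helper are hypothetical and do not reflect the real DataFrame columns.

```python
from typing import List, NamedTuple, Optional


class Annotation(NamedTuple):
    # Hypothetical record; the real annotation columns may differ.
    ts_id: Optional[int]
    id_time: Optional[int]
    text: str


def annotations_for(records: List[Annotation], ts_id: int, id_time: int) -> List[str]:
    """Collect every annotation that applies to one (ts_id, id_time) point."""
    out = []
    for a in records:
        if a.ts_id is None and a.id_time == id_time:      # "id_time": a time point, all series
            out.append(a.text)
        elif a.id_time is None and a.ts_id == ts_id:      # "ts_id": a whole series
            out.append(a.text)
        elif a.ts_id == ts_id and a.id_time == id_time:   # "both": one point of one series
            out.append(a.text)
    return out


records = [
    Annotation(None, 5, "holiday"),
    Annotation(3, None, "gateway"),
    Annotation(3, 5, "outage"),
    Annotation(4, 5, "spike"),
]
print(annotations_for(records, ts_id=3, id_time=5))  # ['holiday', 'gateway', 'outage']
```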
get_config
get_config() -> SeriesBasedConfig | TimeBasedConfig | DisjointTimeBasedConfig
Returns the config made for this benchmark.
get_dataset
get_dataset(check_errors: bool = False) -> TimeBasedCesnetDataset | SeriesBasedCesnetDataset | DisjointTimeBasedCesnetDataset
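Returns the dataset without initializing it.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `check_errors` | `bool` | Whether to check the dataset for corruption before returning it. | `False` |

Returns:

| Type | Description |
| --- | --- |
| `TimeBasedCesnetDataset \| SeriesBasedCesnetDataset \| DisjointTimeBasedCesnetDataset` | The dataset used for this benchmark. |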
get_initialized_dataset
get_initialized_dataset(display_config_details: bool = True, check_errors: bool = False, workers: Literal['config'] | int = 'config') -> TimeBasedCesnetDataset | SeriesBasedCesnetDataset | DisjointTimeBasedCesnetDataset
Returns the dataset with initialized sets, transformers, fillers, etc.

This method uses the following config attributes:

| Dataset config | Description |
| --- | --- |
| `init_workers` | Specifies the number of workers to use for initialization. Applied when `workers` = `"config"`. |
| `partial_fit_initialized_transformers` | Determines whether initialized transformers should be partially fitted on the training data. |
| `nan_threshold` | Filters out time series with missing values exceeding the specified threshold. |
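A minimal sketch of the `nan_threshold` filter, assuming the threshold is a fraction of missing values in [0, 1] and that "exceeding" means strictly greater. `filter_by_nan_threshold` is a hypothetical helper, not the library's implementation; treating an empty series as fully missing is also an assumption.

```python
import math
from typing import Dict, List, Sequence


def filter_by_nan_threshold(series: Dict[int, Sequence[float]], nan_threshold: float) -> List[int]:
    """Hypothetical helper: keep ts_ids whose fraction of missing (NaN) values
    does not exceed nan_threshold (assumed to be a ratio in [0, 1])."""
    kept = []
    for ts_id, values in series.items():
        missing = sum(1 for v in values if math.isnan(v))
        ratio = missing / len(values) if values else 1.0  # empty = fully missing (assumption)
        if ratio <= nan_threshold:
            kept.append(ts_id)
    return kept


nan = float("nan")
series = {
    1: [1.0, 2.0, 3.0, 4.0],  # 0% missing
    2: [1.0, nan, 3.0, 4.0],  # 25% missing
    3: [nan, nan, nan, 4.0],  # 75% missing
}
print(filter_by_nan_threshold(series, 0.5))  # [1, 2]
```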
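Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `display_config_details` | `bool` | Whether to display the configuration values after initialization. | `True` |
| `check_errors` | `bool` | Whether to check the dataset for corruption before initializing it. | `False` |
| `workers` | `Literal['config'] \| int` | The number of workers to use during initialization. If `"config"`, the config's `init_workers` value is used. | `'config'` |

Returns:

| Type | Description |
| --- | --- |
| `TimeBasedCesnetDataset \| SeriesBasedCesnetDataset \| DisjointTimeBasedCesnetDataset` | The initialized dataset. |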
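The `workers` parameter and the `init_workers` config attribute interact as the config table describes. The sketch below spells out that rule; `ConfigStub` and `resolve_workers` are hypothetical, and rejecting non-integer values with `TypeError` is an assumption about sensible behavior rather than documented library behavior.

```python
from dataclasses import dataclass
from typing import Literal, Union


@dataclass
class ConfigStub:
    # Hypothetical stand-in holding only the attribute named in the table.
    init_workers: int = 4


def resolve_workers(workers: Union[Literal["config"], int], config: ConfigStub) -> int:
    """Use the config's init_workers only when workers == "config";
    otherwise the explicit integer wins."""
    if workers == "config":
        return config.init_workers
    if not isinstance(workers, int):
        raise TypeError(f"workers must be 'config' or an int, got {workers!r}")
    return workers


print(resolve_workers("config", ConfigStub(init_workers=4)))  # 4
print(resolve_workers(2, ConfigStub(init_workers=4)))         # 2
```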
get_related_results
get_related_results() -> pd.DataFrame | None
Returns the related results as a Pandas DataFrame, if they exist.

Returns:

| Type | Description |
| --- | --- |
| `DataFrame \| None` | A Pandas DataFrame containing related results, or `None` if no related results exist. |
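Because the result may be `None`, callers should test for it explicitly. The helper below is hypothetical; it compares with `is None` because evaluating a pandas DataFrame in a boolean context (`if results:`) raises `ValueError`.

```python
from typing import Any, Optional


def describe_related_results(results: Optional[Any]) -> str:
    # Compare against None explicitly: `if results:` would raise ValueError
    # for a pandas DataFrame, whose truth value is ambiguous.
    if results is None:
        return "no related results available"
    # len() of a DataFrame is its number of rows.
    return f"related results available ({len(results)} rows)"


print(describe_related_results(None))  # no related results available
```

With a real benchmark object, the call would be `describe_related_results(benchmark.get_related_results())`.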