Benchmark
cesnet_tszoo.benchmarks
Benchmark
Wraps an imported dataset together with its config, annotations, and related_results.
Intended usage:
- Call `load_benchmark` with the desired benchmark. You can use your own saved benchmark or a built-in one. This downloads the dataset and annotations (if available) unless they were downloaded previously.
- Retrieve the initialized dataset using `get_initialized_dataset`. This provides a dataset that is ready to use. Check beforehand what type of dataset is returned.
- Use `get_train_dataloader`/`get_train_df`/`get_train_numpy` to get training data for the chosen model.
- [Optional] Modify the preprocessing steps with `update_dataset_config_and_initialize`.
- Validate the model and perform hyperparameter optimization on `get_val_dataloader`/`get_val_df`/`get_val_numpy`.
- Evaluate the model on `get_test_dataloader`/`get_test_df`/`get_test_numpy`.
You can create custom benchmarks with save_benchmark.
They will be saved to "data_root"/tszoo/benchmarks/ directory, where data_root was passed as parameter to load_benchmark.
The steps above are practically the same for all dataset types, but there can be small differences in method parameters. Check each method for details.
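The usage steps can be sketched as a small helper. This is a minimal illustration, not the library's own code: `run_benchmark`, `fit`, and `score` are hypothetical names, and the sketch assumes the `get_*_df` methods can be called without arguments, which may not hold for every dataset type.

```python
from typing import Any, Callable, Dict


def run_benchmark(benchmark: Any,
                  fit: Callable[[Any], Any],
                  score: Callable[[Any, Any], float]) -> Dict[str, float]:
    """Train on the train split, then score on the validation and test splits.

    `fit` and `score` are hypothetical callables supplied by the caller.
    """
    # Returns a dataset that is ready to use; its type depends on the benchmark.
    dataset = benchmark.get_initialized_dataset()

    # Assumption: the *_df getters work with their default arguments.
    model = fit(dataset.get_train_df())
    val_score = score(model, dataset.get_val_df())    # for hyperparameter tuning
    test_score = score(model, dataset.get_test_df())  # final evaluation

    return {"val": val_score, "test": test_score}


# Possible usage (placeholders are not real identifiers or paths):
# from cesnet_tszoo.benchmarks import load_benchmark
# benchmark = load_benchmark("<identifier>", "<data_root>")
# scores = run_benchmark(benchmark, fit=my_fit, score=my_score)
```

Passing the model logic in as callables keeps the sketch independent of any particular model library.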
Source code in cesnet_tszoo\benchmarks.py
get_annotations
get_annotations(on: AnnotationType | Literal['id_time', 'ts_id', 'both']) -> pd.DataFrame
Returns the annotations as a Pandas DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `on` | `AnnotationType \| Literal['id_time', 'ts_id', 'both']` | Specifies which annotations to return. If set to | required |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | A Pandas DataFrame containing the selected annotations. |
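As an illustration of the `on` parameter, the following sketch requests each of the three literal values from the signature and collects the results. The helper name `collect_annotations` is hypothetical, and the sketch assumes all three kinds are available for the benchmark at hand.

```python
from typing import Any, Dict

# The three literal values accepted by `on`, taken from the signature above.
ANNOTATION_KINDS = ("id_time", "ts_id", "both")


def collect_annotations(benchmark: Any) -> Dict[str, Any]:
    """Return one annotations DataFrame per accepted `on` value."""
    return {kind: benchmark.get_annotations(on=kind) for kind in ANNOTATION_KINDS}
```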
get_config
get_config() -> SeriesBasedConfig | TimeBasedConfig | DisjointTimeBasedConfig
Returns config made for this benchmark.
get_dataset
get_dataset(check_errors: bool = False) -> TimeBasedCesnetDataset | SeriesBasedCesnetDataset | DisjointTimeBasedCesnetDataset
Returns dataset without initializing it.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `check_errors` | `bool` | Whether to check that the dataset is not corrupted. | `False` |
Returns:
| Type | Description |
|---|---|
| `TimeBasedCesnetDataset \| SeriesBasedCesnetDataset \| DisjointTimeBasedCesnetDataset` | Returns dataset used for this benchmark. |
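Because the return type is a union, it can help to check which kind of dataset a benchmark wraps before calling type-specific methods. The sketch below matches on class names from the signature so that it runs without importing the library. The `dataset_kind` helper and its short labels are hypothetical; with the real classes imported, an `isinstance` check would be the more usual choice.

```python
from typing import Any

# Class names come from the return annotation; the short labels are made up here.
_KINDS = {
    "TimeBasedCesnetDataset": "time_based",
    "SeriesBasedCesnetDataset": "series_based",
    "DisjointTimeBasedCesnetDataset": "disjoint_time_based",
}


def dataset_kind(benchmark: Any) -> str:
    """Return a short label for the dataset type wrapped by the benchmark."""
    dataset = benchmark.get_dataset()  # not initialized, so this stays cheap
    # Walk the class hierarchy so subclasses of the three types also match.
    for cls in type(dataset).__mro__:
        if cls.__name__ in _KINDS:
            return _KINDS[cls.__name__]
    raise TypeError(f"Unexpected dataset type: {type(dataset).__name__}")
```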
get_initialized_dataset
get_initialized_dataset(display_config_details: Optional[Literal['text', 'diagram']] = 'text', check_errors: bool = False, workers: Literal['config'] | int = 'config') -> TimeBasedCesnetDataset | SeriesBasedCesnetDataset | DisjointTimeBasedCesnetDataset
Returns the dataset with initialized sets, transformers, fillers, etc.
This method uses following config attributes:
| Dataset config | Description |
|---|---|
| `init_workers` | Specifies the number of workers to use for initialization. Applied when `workers = "config"`. |
| `partial_fit_initialized_transformers` | Determines whether initialized transformers should be partially fitted on the training data. |
| `nan_threshold` | Filters out time series with missing values exceeding the specified threshold. |
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `display_config_details` | `Optional[Literal['text', 'diagram']]` | Flag indicating whether to display the configuration values after initialization. | `'text'` |
| `check_errors` | `bool` | Whether to check that the dataset is not corrupted. | `False` |
| `workers` | `Literal['config'] \| int` | The number of workers to use during initialization. | `'config'` |
Returns:
| Type | Description |
|---|---|
| `TimeBasedCesnetDataset \| SeriesBasedCesnetDataset \| DisjointTimeBasedCesnetDataset` | Returns initialized dataset. |
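A minimal sketch of overriding the defaults, for example in a non-interactive job. The helper name `initialize_for_batch_job` is hypothetical. Passing `None` for `display_config_details` is allowed by the `Optional[...]` annotation; that it suppresses the configuration printout is an assumption here, not something the page states.

```python
from typing import Any, Union


def initialize_for_batch_job(benchmark: Any, workers: Union[str, int] = "config") -> Any:
    """Initialize the dataset with an integrity check and an explicit worker setting."""
    # Mirror the documented type: either the literal "config" or an integer.
    if workers != "config" and not isinstance(workers, int):
        raise ValueError("workers must be 'config' or an int")

    return benchmark.get_initialized_dataset(
        display_config_details=None,  # assumption: None skips the config display
        check_errors=True,            # check that the dataset is not corrupted
        workers=workers,              # "config" falls back to init_workers
    )
```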
get_related_results
get_related_results() -> pd.DataFrame | None
Returns the related results as a Pandas DataFrame, if they exist.
Returns:
| Type | Description |
|---|---|
| `DataFrame \| None` | A Pandas DataFrame containing related results, or `None` if no related results exist. |
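Since the method may return `None`, callers need to guard before using the result. A minimal sketch, with the hypothetical helper `count_related_results`:

```python
from typing import Any


def count_related_results(benchmark: Any) -> int:
    """Return the number of related-result rows, or 0 when there are none."""
    results = benchmark.get_related_results()
    if results is None:
        return 0
    # For a Pandas DataFrame, len() gives the number of rows.
    return len(results)
```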
load_benchmark
load_benchmark(identifier: str, data_root: str) -> Benchmark
Load a benchmark using the identifier.
First, it attempts to load a built-in benchmark. If no built-in benchmark with that identifier exists, it attempts to load a custom benchmark from the "data_root"/tszoo/benchmarks/ directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `identifier` | `str` | The name of the benchmark YAML file. | required |
| `data_root` | `str` | Path to the folder where the dataset will be stored. Each database has its own subfolder | required |
Returns:
| Type | Description |
|---|---|
| `Benchmark` | Returns benchmark with |
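The lookup order described above (built-in first, then custom) can be illustrated with a standalone sketch. This is not the library's implementation: `resolve_benchmark_source` and the `builtin_identifiers` argument are hypothetical, and the file name `<identifier>.yaml` is an assumption based only on the identifier being described as the name of a YAML file.

```python
from pathlib import Path
from typing import Iterable


def resolve_benchmark_source(identifier: str,
                             builtin_identifiers: Iterable[str],
                             data_root: str) -> str:
    """Report whether an identifier would resolve to a built-in or custom benchmark."""
    # 1. Built-in benchmarks take precedence.
    if identifier in set(builtin_identifiers):
        return "built-in"

    # 2. Otherwise look in "data_root"/tszoo/benchmarks/.
    custom_dir = Path(data_root) / "tszoo" / "benchmarks"
    # Assumption: custom benchmarks are stored as "<identifier>.yaml".
    if (custom_dir / f"{identifier}.yaml").is_file():
        return "custom"

    raise FileNotFoundError(f"No benchmark named {identifier!r} found")
```

One consequence of this order is that a custom benchmark sharing an identifier with a built-in one would never be reached, so distinct names are the safer choice.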