Utilities
This tutorial looks at various utilities.
Only the time-based dataset is used, because all methods work almost the same way for the series-based one.
Note
For every option and more detailed examples, refer to the Jupyter notebook utilities.
Setting logger
CESNET TS-Zoo uses a logger, but it won't log anything unless logging is configured, for example as shown below.
import logging
logging.basicConfig(
level=logging.INFO,
format="[%(asctime)s][%(name)s][%(levelname)s] - %(message)s")
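If INFO output from every installed library is too noisy, the level can be raised globally and lowered for a single logger namespace. The sketch below uses only the standard logging module. The logger name "cesnet_tszoo" is an assumption based on the package name, not something this tutorial confirms, and the two log records are stand-ins emitted only to show the filtering.

```python
import io
import logging

# Send log records to an in-memory stream so the result is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("[%(name)s][%(levelname)s] - %(message)s"))

root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.WARNING)  # keep other libraries quiet by default

# ASSUMPTION: the package logs under a logger named "cesnet_tszoo".
logging.getLogger("cesnet_tszoo").setLevel(logging.INFO)

# Stand-in records, emitted here only to demonstrate the filtering.
logging.getLogger("cesnet_tszoo.datasets").info("dataset initialized")
logging.getLogger("some_other_library").info("hidden at WARNING level")

output = stream.getvalue()
print(output)
```

Child loggers inherit the effective level of their closest configured ancestor, so one setLevel call on the package namespace is enough to cover its submodules.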
Checking errors
- Goes through all data in the dataset to check that everything is in a correct state.
- Can be run when creating the dataset, or afterwards by calling the check_errors method on an already created dataset.
- Recommended to call at least once after download.
from cesnet_tszoo.utils.enums import AgreggationType, SourceType
from cesnet_tszoo.datasets import CESNET_TimeSeries24
# Can be called at dataset creation
time_based_dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.IP_ADDRESSES_SAMPLE, aggregation=AgreggationType.AGG_1_DAY, is_series_based=False, check_errors=True)
# Or after it
time_based_dataset.check_errors()
Dataset details
Displaying all data about selected dataset
Displays the available times, time series, features with their default values, and any additional data provided by the dataset.
dataset.display_dataset_details()
Get list of available features
dataset.get_feature_names()
Get numpy array of available dataset time series indices
dataset.get_available_ts_indices()
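Once the available indices are known, a reproducible subset can be drawn from them. The following is a minimal sketch using only the standard library. The helper pick_ts_ids and the example ids are hypothetical, and a plain list stands in for the returned NumPy array.

```python
import random


def pick_ts_ids(available, count, seed=0):
    """Return a sorted, reproducible sample of `count` ids from `available`."""
    ids = list(available)  # also accepts a NumPy array
    if count > len(ids):
        raise ValueError(f"requested {count} ids, only {len(ids)} available")
    # A dedicated Random instance keeps the global random state untouched.
    return sorted(random.Random(seed).sample(ids, count))


# Hypothetical ids standing in for the output of get_available_ts_indices().
available = [2, 5, 8, 13, 21, 34]
chosen = pick_ts_ids(available, 3, seed=42)
print(chosen)
```

Whether a config accepts such an explicit list of ids in place of a count is an assumption to verify against the config documentation.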
Get dictionary of related set data
Returns a dictionary with all data related to the selected set.
from cesnet_tszoo.configs import TimeBasedConfig
from cesnet_tszoo.utils.enums import SplitType
config = TimeBasedConfig(20, train_time_period=0.5)
time_based_dataset.set_dataset_config_and_initialize(config, workers=0, display_config_details=False)
# Returns the data of the train set as a dictionary
time_based_dataset.get_data_about_set(about=SplitType.TRAIN)
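Because the result is a plain dictionary, it can be explored with ordinary Python before relying on any particular key. The sketch below shows one way to do that. The helper summarize is hypothetical, and the keys in the stand-in dictionary are invented for illustration; the real keys may differ.

```python
def summarize(info: dict) -> list[str]:
    """Return one 'key: type (len=N)' line per entry of a dictionary."""
    lines = []
    for key, value in info.items():
        # Only sized values (lists, arrays, strings, ...) report a length.
        size = f" (len={len(value)})" if hasattr(value, "__len__") else ""
        lines.append(f"{key}: {type(value).__name__}{size}")
    return lines


# Hypothetical stand-in for the returned dictionary; the real keys may differ.
example = {
    "ts_ids": [3, 7, 11],
    "features": ["n_flows", "n_packets"],
    "batch_size": 32,
}

for line in summarize(example):
    print(line)
```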
Displaying config details
Config details can be displayed while calling set_dataset_config_and_initialize,
or afterwards with display_config.
from cesnet_tszoo.configs import TimeBasedConfig
config = TimeBasedConfig(20)
# Can be called during initialization
time_based_dataset.set_dataset_config_and_initialize(config, workers=0, display_config_details=True)
# Or after it
time_based_dataset.display_config()
Plotting
- Uses the Plotly library.
- You can plot a specific time series with the plot method.
- You can set ts_id to any time series id used in the config.
- The plot always contains the time period of the all set.
- The config must be set before plotting.
# Features will be taken from config
dataset.plot(ts_id=10, plot_type="line", features="config", feature_per_plot=True, time_format="datetime", use_scalers=True)
# Specifies features as list... features must be set in used config
dataset.plot(ts_id=10, plot_type="line", features=["n_flows", "n_packets"], feature_per_plot=True, time_format="datetime", use_scalers=True)
# Can specify single feature... still must be set in used config
dataset.plot(ts_id=10, plot_type="line", features="n_flows", feature_per_plot=True, time_format="datetime", use_scalers=True)
Get additional data
- You can check whether the dataset has additional data with the display_dataset_details method.
from cesnet_tszoo.utils.enums import AgreggationType, SourceType
from cesnet_tszoo.datasets import CESNET_TimeSeries24
time_based_dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.IP_ADDRESSES_SAMPLE, aggregation=AgreggationType.AGG_1_DAY, is_series_based=False, display_details=True)
# Available additional data in CESNET_TimeSeries24 database
time_based_dataset.get_additional_data('ids_relationship')
time_based_dataset.get_additional_data('weekends_and_holidays')
Get fitted scalers
Returns the fitted scaler or scalers used to transform the data.
dataset.get_scalers()
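To make "fitted" concrete, here is a toy min-max scaler in plain Python. It is not the library's implementation, and the class name MinMaxSketch is hypothetical. It assumes, as is common practice, that the scaler learns its parameters from training data and then applies them unchanged to other data.

```python
class MinMaxSketch:
    """Toy min-max scaler: fit stores the range, transform maps it to [0, 1]."""

    def fit(self, values):
        # The "fitted" state is just the minimum and maximum seen during fit.
        self.low = min(values)
        self.high = max(values)
        return self

    def transform(self, values):
        span = self.high - self.low
        if span == 0:
            return [0.0 for _ in values]  # constant feature, avoid dividing by zero
        return [(v - self.low) / span for v in values]


train = [10.0, 20.0, 30.0]
test = [15.0, 40.0]

scaler = MinMaxSketch().fit(train)
scaled_train = scaler.transform(train)
scaled_test = scaler.transform(test)
print(scaled_train, scaled_test)
```

Values outside the fitted range, such as 40.0 here, map outside [0, 1], which is one reason to inspect the fitted parameters when scaled data looks unexpected.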