Utilities
This tutorial covers the utility methods available on datasets.
Only the time-based dataset type is used, because all methods work almost the same way for the other dataset types.
Note
For every option and more detailed examples, refer to the Jupyter notebook utilities.
Setting logger
CESNET TS-Zoo uses a logger, but without the configuration below, it won't log anything.
import logging
logging.basicConfig(
    level=logging.INFO,
    format="[%(asctime)s][%(name)s][%(levelname)s] - %(message)s",
)
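To see what this format string produces and how the level filters messages, the following minimal sketch uses only the Python standard library. The logger name `demo_logger` and the helper `build_logger` are hypothetical and are not part of CESNET TS-Zoo; the sketch writes to an in-memory buffer so the result can be inspected.

```python
import io
import logging

# Same format string as in the configuration above.
LOG_FORMAT = "[%(asctime)s][%(name)s][%(levelname)s] - %(message)s"


def build_logger(name, stream, level=logging.INFO):
    """Attach a stream handler using LOG_FORMAT to a named logger (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.addHandler(handler)
    logger.propagate = False  # keep the demo output out of the root logger
    return logger


buffer = io.StringIO()
demo_logger = build_logger("demo_logger", buffer)  # hypothetical logger name

demo_logger.debug("This message is below INFO and is dropped.")
demo_logger.info("Dataset loaded.")

captured = buffer.getvalue()
print(captured, end="")
```

With the level set to `logging.INFO`, only the second message reaches the buffer. It is prefixed with the timestamp, the logger name, and the level, in that order.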
Checking errors
- Goes through all data in the dataset to check whether everything is in a correct state.
- Can be called when creating the dataset, or with the method check_errors on an already created dataset.
- Recommended to call at least once after download.
from cesnet_tszoo.utils.enums import AgreggationType, SourceType, DatasetType
from cesnet_tszoo.datasets import CESNET_TimeSeries24
# Can be called at dataset creation
time_based_dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.IP_ADDRESSES_SAMPLE, aggregation=AgreggationType.AGG_1_DAY, dataset_type=DatasetType.TIME_BASED, check_errors=True)
# Or after it
time_based_dataset.check_errors()
Dataset details
Displaying all data about the selected dataset
Displays available times, time series, and features with their default values, as well as additional data provided by the dataset.
dataset.display_dataset_details()
Get list of available features
dataset.get_feature_names()
Get numpy array of available dataset time series indices
dataset.get_available_ts_indices()
Get dictionary of related set data
Returns a dictionary with all data related to the given set.
from cesnet_tszoo.configs import TimeBasedConfig
from cesnet_tszoo.utils.enums import SplitType
config = TimeBasedConfig(20, train_time_period=0.5)
time_based_dataset.set_dataset_config_and_initialize(config, workers=0, display_config_details=False)
time_based_dataset.get_data_about_set(about=SplitType.TRAIN)
Displaying config details
Config details can be displayed when calling set_dataset_config_and_initialize (with display_config_details=True), or afterwards with display_config.
from cesnet_tszoo.configs import TimeBasedConfig
config = TimeBasedConfig(20)
# Can be called during initialization
time_based_dataset.set_dataset_config_and_initialize(config, workers=0, display_config_details=True)
# Or after it
time_based_dataset.display_config()
Plotting
- Uses the Plotly library.
- You can plot a specific time series with the method plot.
- You can set ts_id to any time series id used in the config.
- The config must be set before plotting.
# Features will be taken from config
dataset.plot(ts_id=10, plot_type="line", features="config", feature_per_plot=True, time_format="datetime")
# Features can be given as a list; they must be set in the used config
dataset.plot(ts_id=10, plot_type="line", features=["n_flows", "n_packets"], feature_per_plot=True, time_format="datetime")
# A single feature can be given as a string; it must still be set in the used config
dataset.plot(ts_id=10, plot_type="line", features="n_flows", feature_per_plot=True, time_format="datetime")
Get additional data
- You can check whether the dataset has additional data with the method display_dataset_details.
from cesnet_tszoo.utils.enums import AgreggationType, SourceType, DatasetType
from cesnet_tszoo.datasets import CESNET_TimeSeries24
time_based_dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.IP_ADDRESSES_SAMPLE, aggregation=AgreggationType.AGG_1_DAY, dataset_type=DatasetType.TIME_BASED, display_details=True)
# Available additional data in CESNET_TimeSeries24 database
time_based_dataset.get_additional_data('ids_relationship')
time_based_dataset.get_additional_data('weekends_and_holidays')
Get fitted transformers
Returns the fitted transformer(s) used for transforming the data.
dataset.get_transformers()