Skip to content

Utilities

This tutorial will look at various utilities.

Only time-based will be used, because all methods work almost the same way for series-based.

Note

For every option and more detailed examples refer to Jupyter notebook utilities

Setting logger

CESNET TS-Zoo uses logger, but without setting config below, it wont log anything.

import logging                                                                       

logging.basicConfig(
    level=logging.INFO,
    format="[%(asctime)s][%(name)s][%(levelname)s] - %(message)s")

Checking errors

  • Goes through all data in dataset to check whether everything is in correct state,
  • Can be called when creating dataset or with method check_errors on already create dataset.
  • Recommended to call at least once after download
from cesnet_tszoo.utils.enums import AgreggationType, SourceType
from cesnet_tszoo.datasets import CESNET_TimeSeries24                                                                

# Can be called at dataset creation
time_based_dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.IP_ADDRESSES_SAMPLE, aggregation=AgreggationType.AGG_1_DAY, is_series_based=False, check_errors=True)

# Or after it
time_based_dataset.check_errors()

Dataset details

Displaying all data about selected dataset

Displays available times, time series, features with their default values, additional data provided by dataset.

dataset.display_dataset_details()

Get list of available features

dataset.get_feature_names()

Get numpy array of available dataset time series indices

dataset.get_available_ts_indices()

Returns all data in dictionary related to set.

from cesnet_tszoo.configs import TimeBasedConfig   

config = TimeBasedConfig(20, train_time_period=0.5)
time_based_dataset.set_dataset_config_and_initialize(config, workers=0, display_config_details=False)

time_based_dataset.get_data_about_set(about=SplitType.TRAIN)

Displaying config details

Can be called when calling set_dataset_config_and_initialize or after it with display_config

from cesnet_tszoo.configs import TimeBasedConfig   

config = TimeBasedConfig(20)

# Can be called during initialization
time_based_dataset.set_dataset_config_and_initialize(config, workers=0, display_config_details=True)

# Or after it
time_based_dataset.display_config()

Plotting

  • Uses Plotly library.
  • You can plot specific time series with method plot
  • You can set ts_id to any time series id used in config
  • Plot will always contains time period of all set
  • Config must be set before using
# Features will be taken from config
dataset.plot(ts_id=10, plot_type="line", features="config", feature_per_plot=True, time_format="datetime", use_scalers=True)

# Specifies features as list... features must be set in used config
dataset.plot(ts_id=10, plot_type="line", features=["n_flows", "n_packets"], feature_per_plot=True, time_format="datetime", use_scalers=True)

# Can specify single feature... still must be set in used config
dataset.plot(ts_id=10, plot_type="line", features="n_flows", feature_per_plot=True, time_format="datetime", use_scalers=True)

Get additional data

  • You can check whether dataset has additional data, with method display_dataset_details.
from cesnet_tszoo.utils.enums import AgreggationType, SourceType
from cesnet_tszoo.datasets import CESNET_TimeSeries24                                                                

time_based_dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.IP_ADDRESSES_SAMPLE, aggregation=AgreggationType.AGG_1_DAY, is_series_based=False, display_details=True)

# Available additional data in CESNET_TimeSeries24 database
time_based_dataset.get_additional_data('ids_relationship')
time_based_dataset.get_additional_data('weekends_and_holidays')

Get fitted scalers

Returns used scaler/s that are used for transforming data.

dataset.get_scalers()