
Using anomaly handlers

This tutorial will look at some configuration options for using anomaly handlers.

Only time-based datasets are used here, because all methods work almost the same way for the other dataset types.

Note

For all configuration options and more detailed examples, refer to the Jupyter notebook using_anomaly_handlers.

Relevant configuration values:

  • handle_anomalies_with - Defines the anomaly handler used to transform anomalies in the train set.

Anomaly handlers

  • Anomaly handlers are implemented as classes.
    • You can create your own or use a built-in one.
  • Every time series in the train set has its own anomaly handler instance.
  • An anomaly handler must implement fit and transform_anomalies.
  • To use an anomaly handler, a train set must be defined.
  • An anomaly handler is applied only to the train set.
  • You can change the anomaly handler later with update_dataset_config_and_initialize or apply_anomaly_handler.
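The rules above can be sketched in plain NumPy to make the handler lifecycle concrete. This is a minimal illustration, not cesnet_tszoo's internal code: PercentileCapHandler and apply_handler_to_train are hypothetical names, and the assumption that the dataset calls fit and then transform_anomalies on each series' train values is inferred from the list above.

```python
import numpy as np


class PercentileCapHandler:
    """Hypothetical handler: caps values above the 90th percentile of the fitted series."""

    def fit(self, data: np.ndarray) -> None:
        self.cap = np.nanpercentile(data, 90)

    def transform_anomalies(self, data: np.ndarray) -> None:
        # Modifies the array in place, as the custom handler in this tutorial does.
        data[data > self.cap] = self.cap


def apply_handler_to_train(train_series: dict, handler_class: type) -> dict:
    """Hypothetical helper: one handler instance per time series, train data only."""
    handlers = {}
    for ts_id, values in train_series.items():
        handler = handler_class()            # separate instance for each series
        handler.fit(values)                  # statistics come from this series alone
        handler.transform_anomalies(values)  # applied to train values only
        handlers[ts_id] = handler
    return handlers


# Two toy train series; val/test data is never passed to the handlers.
train = {
    1: np.array([1.0, 2.0, 3.0, 100.0]),
    2: np.array([10.0, 20.0, 30.0, 40.0]),
}
handlers = apply_handler_to_train(train, PercentileCapHandler)
# Series 1 is capped at about 70.9 and series 2 at about 37.0,
# because each handler was fitted on its own series.
```

Passing the class rather than an instance lets the helper create a fresh handler per series, so statistics learned on one series never leak into another.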

Built-in

For a list of all built-in anomaly handlers, refer to Anomaly handlers.

from cesnet_tszoo.utils.enums import AnomalyHandlerType
from cesnet_tszoo.configs import TimeBasedConfig

config = TimeBasedConfig(ts_ids=500, train_time_period=0.5, val_time_period=0.2, test_time_period=0.1,
                         features_to_take=['n_flows', 'n_packets'], handle_anomalies_with=AnomalyHandlerType.Z_SCORE,
                         nan_threshold=0.5, random_state=1500)

# Call on time-based dataset to use created config
time_based_dataset.set_dataset_config_and_initialize(config)

Or later with:

time_based_dataset.update_dataset_config_and_initialize(handle_anomalies_with=AnomalyHandlerType.Z_SCORE, workers=0)
# Or
time_based_dataset.apply_anomaly_handler(handle_anomalies_with=AnomalyHandlerType.Z_SCORE, workers=0)

Custom

You can create your own custom anomaly handler. Deriving from the AnomalyHandler base class is recommended.

For details of the AnomalyHandler base class, refer to AnomalyHandler.

import warnings

import numpy as np

from cesnet_tszoo.utils.anomaly_handler import AnomalyHandler
from cesnet_tszoo.configs import TimeBasedConfig


class CustomAnomalyHandler(AnomalyHandler):
    """Clips each feature to the IQR fences [q25 - 1.5 * IQR, q75 + 1.5 * IQR] learned in fit."""

    def __init__(self):
        self.lower_bound = {}
        self.upper_bound = {}

    def fit(self, data: np.ndarray) -> None:
        # Silence warnings (e.g. from all-NaN features) only inside this block,
        # instead of overwriting the global warning filters.
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")

            for name in data.dtype.names:
                current_data = data[name]

                q25, q75 = np.nanpercentile(current_data, [25, 75], axis=0)
                iqr = q75 - q25

                self.lower_bound[name] = q25 - 1.5 * iqr
                self.upper_bound[name] = q75 + 1.5 * iqr

    def transform_anomalies(self, data: np.ndarray):
        # Modifies data in place: values outside the fitted bounds are clipped to them.
        for name in data.dtype.names:
            lower_bound = self.lower_bound[name]
            upper_bound = self.upper_bound[name]
            current_data = data[name]  # view into the structured array

            lb_broadcast = np.broadcast_to(lower_bound, current_data.shape)
            ub_broadcast = np.broadcast_to(upper_bound, current_data.shape)

            mask_lower = current_data < lb_broadcast
            mask_upper = current_data > ub_broadcast

            current_data[mask_lower] = lb_broadcast[mask_lower]
            current_data[mask_upper] = ub_broadcast[mask_upper]

config = TimeBasedConfig(ts_ids=500, train_time_period=0.5, val_time_period=0.2, test_time_period=0.1,
                         features_to_take=['n_flows', 'n_packets'], handle_anomalies_with=CustomAnomalyHandler,
                         nan_threshold=0.5, random_state=1500)

time_based_dataset.set_dataset_config_and_initialize(config)

Or later with:

time_based_dataset.update_dataset_config_and_initialize(handle_anomalies_with=CustomAnomalyHandler, workers=0)
# Or
time_based_dataset.apply_anomaly_handler(handle_anomalies_with=CustomAnomalyHandler, workers=0)
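To see what CustomAnomalyHandler does to a single time series, the same IQR arithmetic can be run on a toy structured array without cesnet_tszoo. The field names reuse features_to_take from the config above, the values are made up, and np.clip stands in for the two masked assignments in transform_anomalies (both give the same result, including leaving NaN values untouched).

```python
import numpy as np

# Toy train data for one time series, shaped like the structured arrays
# the custom handler iterates over (values are made up for illustration).
data = np.zeros(10, dtype=[("n_flows", "f8"), ("n_packets", "f8")])
data["n_flows"] = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
data["n_packets"] = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11]

bounds = {}
for name in data.dtype.names:
    # Same fences as CustomAnomalyHandler.fit.
    q25, q75 = np.nanpercentile(data[name], [25, 75])
    iqr = q75 - q25
    lower, upper = q25 - 1.5 * iqr, q75 + 1.5 * iqr
    bounds[name] = (lower, upper)

    # Same effect as the two masked assignments in transform_anomalies.
    data[name] = np.clip(data[name], lower, upper)
```

For n_flows the fences are -3.5 and 14.5, so the spike of 100 is clipped to 14.5; n_packets stays within its fences (9.5 and 13.5) and is unchanged.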

Changing when the anomaly handler is applied

  • You can change when the anomaly handler is applied with the preprocess_order parameter.

from cesnet_tszoo.utils.enums import AnomalyHandlerType
from cesnet_tszoo.configs import TimeBasedConfig

config = TimeBasedConfig(ts_ids=500, train_time_period=0.5, val_time_period=0.2, test_time_period=0.1,
                         features_to_take=['n_flows', 'n_packets'], handle_anomalies_with=AnomalyHandlerType.Z_SCORE,
                         nan_threshold=0.5, random_state=1500,
                         preprocess_order=["handling_anomalies", "filling_gaps", "transforming"])

# Call on the time-based dataset to use the created config
time_based_dataset.set_dataset_config_and_initialize(config)

Or later with:

time_based_dataset.update_dataset_config_and_initialize(preprocess_order=["filling_gaps", "handling_anomalies", "transforming"], workers=0)
# Or
time_based_dataset.set_preprocess_order(preprocess_order=["filling_gaps", "handling_anomalies", "transforming"], workers=0)
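Why the order matters can be shown with a toy pipeline. Both steps here are hypothetical stand-ins, not cesnet_tszoo's gap fillers or anomaly handlers: gaps are filled with the mean of the known values, anomalies are capped at a fixed value of 10, and "transforming" is left as an identity placeholder. Only the step names are taken from preprocess_order.

```python
import numpy as np


def fill_gaps(values: np.ndarray) -> np.ndarray:
    """Hypothetical filler: replace NaNs with the mean of the known values."""
    filled = values.copy()
    filled[np.isnan(filled)] = np.nanmean(filled)
    return filled


def handle_anomalies(values: np.ndarray, cap: float = 10.0) -> np.ndarray:
    """Hypothetical handler: cap values above a fixed threshold (NaNs are left alone)."""
    handled = values.copy()
    handled[handled > cap] = cap
    return handled


def transform(values: np.ndarray) -> np.ndarray:
    """Placeholder for the "transforming" step; identity in this sketch."""
    return values


STEPS = {
    "filling_gaps": fill_gaps,
    "handling_anomalies": handle_anomalies,
    "transforming": transform,
}


def preprocess(values: np.ndarray, preprocess_order: list) -> np.ndarray:
    """Run the steps in the given order; each step returns a new array."""
    for step in preprocess_order:
        values = STEPS[step](values)
    return values


train = np.array([1.0, np.nan, 3.0, 100.0])
gaps_first = preprocess(train, ["filling_gaps", "handling_anomalies", "transforming"])
anomalies_first = preprocess(train, ["handling_anomalies", "filling_gaps", "transforming"])
```

When gaps are filled first, the spike of 100 inflates the fill value (about 34.7), which the cap then reduces to 10. Handling anomalies first keeps the spike out of the mean, so the gap is filled with about 4.67.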