Using anomaly handlers
This tutorial will look at some configuration options for using anomaly handlers.
Only time-based will be used, because all methods work almost the same way for other dataset types.
Note
For every configuration and more detailed examples refer to Jupyter notebook using_anomaly_handlers
Relevant configuration values:
handle_anomalies_with
- Defines the anomaly handler used to transform anomalies in the train set.
Anomaly handlers
- Anomaly handlers are implemented as class.
- You can create your own or use built-in one.
- Anomaly handler is applied before
default_values
and fillers took care of missing values. - Every time series in train set has its own anomaly handler instance.
- Anomaly handler must implement
fit
andtransform_anomalies
. - To use anomaly handler, train set must be implemented.
- Anomaly handler will only be used on train set.
- You can change used anomaly handler later with
update_dataset_config_and_initialize
orapply_anomaly_handler
.
Built-in
To see all built-in anomaly handlers refer to Anomaly handlers
.
from cesnet_tszoo.utils.enums import AnomalyHandlerType
from cesnet_tszoo.configs import TimeBasedConfig
config = TimeBasedConfig(ts_ids=500, train_time_period=0.5, val_time_period=0.2, test_time_period=0.1, features_to_take=['n_flows', 'n_packets'],
handle_anomalies_with=AnomalyHandlerType.Z_SCORE, nan_threshold=0.5, random_state=1500)
# Call on time-based dataset to use created config
time_based_dataset.set_dataset_config_and_initialize(config)
Or later with:
time_based_dataset.update_dataset_config_and_initialize(handle_anomalies_with=AnomalyHandlerType.Z_SCORE, workers=0)
# Or
time_based_dataset.apply_anomaly_handler(handle_anomalies_with=AnomalyHandlerType.Z_SCORE, workers=0)
Custom
You can create your own custom anomaly handler. It is recommended to derive from 'AnomalyHandler' base class.
To check AnomalyHandler base class refer to AnomalyHandler
import numpy as np
from cesnet_tszoo.utils.anomaly_handler import AnomalyHandler
from cesnet_tszoo.configs import TimeBasedConfig
class CustomAnomalyHandler(AnomalyHandler):
def __init__(self):
self.lower_bound = None
self.upper_bound = None
self.iqr = None
def fit(self, data: np.ndarray) -> None:
q25, q75 = np.percentile(data, [25, 75], axis=0)
self.iqr = q75 - q25
self.lower_bound = q25 - 1.5 * self.iqr
self.upper_bound = q75 + 1.5 * self.iqr
def transform_anomalies(self, data: np.ndarray) -> np.ndarray:
mask_lower_outliers = data < self.lower_bound
mask_upper_outliers = data > self.upper_bound
data[mask_lower_outliers] = np.take(self.lower_bound, np.where(mask_lower_outliers)[1])
data[mask_upper_outliers] = np.take(self.upper_bound, np.where(mask_upper_outliers)[1])
config = TimeBasedConfig(ts_ids=500, train_time_period=0.5, val_time_period=0.2, test_time_period=0.1, features_to_take=['n_flows', 'n_packets'],
handle_anomalies_with=CustomAnomalyHandler, nan_threshold=0.5, random_state=1500)
time_based_dataset.set_dataset_config_and_initialize(config)
Or later with:
time_based_dataset.update_dataset_config_and_initialize(handle_anomalies_with=CustomAnomalyHandler, workers=0)
# Or
time_based_dataset.apply_anomaly_handler(handle_anomalies_with=CustomAnomalyHandler, workers=0)