Using anomaly handlers
This tutorial will look at some configuration options for using anomaly handlers.
Only the time-based dataset is used, because all methods work almost the same way for the other dataset types.
Note
For every configuration option and more detailed examples, refer to the Jupyter notebook using_anomaly_handlers.
Relevant configuration values:
`handle_anomalies_with` - Defines the anomaly handler used to transform anomalies in the train set.
Anomaly handlers
- Anomaly handlers are implemented as classes.
- You can create your own or use a built-in one.
- Every time series in the train set has its own anomaly handler instance.
- An anomaly handler must implement `fit` and `transform_anomalies`.
- To use an anomaly handler, the config must include a train set.
- The anomaly handler is applied only to the train set.
- You can change the used anomaly handler later with `update_dataset_config_and_initialize` or `apply_anomaly_handler`.
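As a sketch of this contract: a handler only needs `fit` and `transform_anomalies` operating on NumPy structured arrays. The following is a hypothetical duck-typed example (it skips the `AnomalyHandler` base class for brevity; the class name `MedianReplaceHandler` and the `max_dev` threshold are illustrative, not part of the library):

```python
import numpy as np

class MedianReplaceHandler:
    """Hypothetical minimal handler: values further than `max_dev` from
    the per-feature median are treated as anomalies and replaced by it."""

    def __init__(self, max_dev: float = 100.0):
        self.max_dev = max_dev
        self.medians = {}

    def fit(self, data: np.ndarray) -> None:
        # One statistic per named feature of the structured array
        for name in data.dtype.names:
            self.medians[name] = float(np.nanmedian(data[name]))

    def transform_anomalies(self, data: np.ndarray) -> None:
        # Replace outliers in place, one feature column at a time
        for name in data.dtype.names:
            col = data[name]
            med = self.medians[name]
            col[np.abs(col - med) > self.max_dev] = med

# Tiny demonstration on synthetic train data
train = np.array([(10.0,), (12.0,), (500.0,)], dtype=[("n_flows", "f8")])
handler = MedianReplaceHandler()
handler.fit(train)
handler.transform_anomalies(train)  # 500.0 is replaced by the median 12.0
```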
Built-in
To see all built-in anomaly handlers refer to Anomaly handlers.
```python
from cesnet_tszoo.utils.enums import AnomalyHandlerType
from cesnet_tszoo.configs import TimeBasedConfig

config = TimeBasedConfig(ts_ids=500, train_time_period=0.5, val_time_period=0.2, test_time_period=0.1, features_to_take=['n_flows', 'n_packets'],
                         handle_anomalies_with=AnomalyHandlerType.Z_SCORE, nan_threshold=0.5, random_state=1500)

# Call on the time-based dataset to use the created config
time_based_dataset.set_dataset_config_and_initialize(config)
```
Or later with:
```python
time_based_dataset.update_dataset_config_and_initialize(handle_anomalies_with=AnomalyHandlerType.Z_SCORE, workers=0)
# Or
time_based_dataset.apply_anomaly_handler(handle_anomalies_with=AnomalyHandlerType.Z_SCORE, workers=0)
```
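For intuition only: a z-score based handler conceptually flags points that lie far from the train mean, measured in standard deviations. The following self-contained sketch illustrates that idea; it is not the library's actual `Z_SCORE` implementation, and the threshold and clipping behaviour are assumptions:

```python
import numpy as np

def zscore_clip(values: np.ndarray, threshold: float = 3.0) -> np.ndarray:
    """Clip points whose |z-score| exceeds `threshold` to the boundary."""
    mean = np.nanmean(values)
    std = np.nanstd(values)
    if std == 0:
        return values.copy()  # a constant series has no outliers
    return np.clip(values, mean - threshold * std, mean + threshold * std)

# A flat series with one large spike
data = np.array([10.0] * 29 + [500.0])
clipped = zscore_clip(data)  # the spike is pulled down to mean + 3 * std
```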
Custom
You can create your own custom anomaly handler. It is recommended to derive from the `AnomalyHandler` base class.
For details on the base class, refer to AnomalyHandler.
```python
import warnings

import numpy as np

from cesnet_tszoo.utils.anomaly_handler import AnomalyHandler
from cesnet_tszoo.configs import TimeBasedConfig


class CustomAnomalyHandler(AnomalyHandler):
    def __init__(self):
        self.lower_bound = {}
        self.upper_bound = {}

    def fit(self, data: np.ndarray) -> None:
        # Suppress warnings (e.g. all-NaN slices) locally instead of
        # toggling the global warning filters
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            for name in data.dtype.names:
                current_data = data[name]
                q25, q75 = np.nanpercentile(current_data, [25, 75], axis=0)
                iqr = q75 - q25
                self.lower_bound[name] = q25 - 1.5 * iqr
                self.upper_bound[name] = q75 + 1.5 * iqr

    def transform_anomalies(self, data: np.ndarray) -> None:
        for name in data.dtype.names:
            lower_bound = self.lower_bound[name]
            upper_bound = self.upper_bound[name]
            current_data = data[name]

            lb_broadcast = np.broadcast_to(lower_bound, current_data.shape)
            ub_broadcast = np.broadcast_to(upper_bound, current_data.shape)

            mask_lower = current_data < lb_broadcast
            mask_upper = current_data > ub_broadcast

            # Clip values outside the IQR bounds in place
            current_data[mask_lower] = lb_broadcast[mask_lower]
            current_data[mask_upper] = ub_broadcast[mask_upper]
```
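The IQR rule above can be sanity-checked standalone. This compact sketch reproduces the same bounds computation and clipping on a small synthetic structured array (no `cesnet_tszoo` dependency; the spike values are made up):

```python
import numpy as np

# Two features, each with one obvious spike
data = np.array(
    [(10.0, 1.0), (12.0, 2.0), (11.0, 1.5), (300.0, 2.5), (13.0, 50.0)],
    dtype=[("n_flows", "f8"), ("n_packets", "f8")],
)

for name in data.dtype.names:
    col = data[name]
    q25, q75 = np.nanpercentile(col, [25, 75], axis=0)
    iqr = q75 - q25
    lower, upper = q25 - 1.5 * iqr, q75 + 1.5 * iqr
    # Same in-place clipping as transform_anomalies above
    np.clip(col, lower, upper, out=col)

# n_flows: bounds are [8.0, 16.0], so 300.0 becomes 16.0
# n_packets: bounds are [0.0, 4.0], so 50.0 becomes 4.0
```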
```python
config = TimeBasedConfig(ts_ids=500, train_time_period=0.5, val_time_period=0.2, test_time_period=0.1, features_to_take=['n_flows', 'n_packets'],
                         handle_anomalies_with=CustomAnomalyHandler, nan_threshold=0.5, random_state=1500)

time_based_dataset.set_dataset_config_and_initialize(config)
```
Or later with:
```python
time_based_dataset.update_dataset_config_and_initialize(handle_anomalies_with=CustomAnomalyHandler, workers=0)
# Or
time_based_dataset.apply_anomaly_handler(handle_anomalies_with=CustomAnomalyHandler, workers=0)
```
Changing when the anomaly handler is applied
- You can change when the anomaly handler is applied with the `preprocess_order` parameter.
```python
from cesnet_tszoo.utils.enums import AnomalyHandlerType
from cesnet_tszoo.configs import TimeBasedConfig

config = TimeBasedConfig(ts_ids=500, train_time_period=0.5, val_time_period=0.2, test_time_period=0.1, features_to_take=['n_flows', 'n_packets'],
                         handle_anomalies_with=AnomalyHandlerType.Z_SCORE, nan_threshold=0.5, random_state=1500, preprocess_order=["handling_anomalies", "filling_gaps", "transforming"])

# Call on the dataset to use the created config
time_based_dataset.set_dataset_config_and_initialize(config)
```
Or later with:
```python
time_based_dataset.update_dataset_config_and_initialize(preprocess_order=["filling_gaps", "handling_anomalies", "transforming"], workers=0)
# Or
time_based_dataset.set_preprocess_order(preprocess_order=["filling_gaps", "handling_anomalies", "transforming"], workers=0)
```