Using transformers
This tutorial will look at some configuration options for using transformers.
Each dataset type has its own section, because the available configuration values differ between dataset types.
TimeBasedCesnetDataset dataset
Note
For all configuration options and more detailed examples, refer to the Jupyter notebook time_based_using_transformers.
Relevant configuration values:
- transform_with: Defines the transformer used to transform the dataset.
- create_transformer_per_time_series: If True, a separate transformer is created for each time series.
- partial_fit_initialized_transformers: If True, partial fitting on the train set is performed when using initialized (already fitted) transformers.
Transformers
- Transformers are implemented as classes.
- You can create your own or use a built-in one.
- A transformer must implement transform.
- A transformer can implement inverse_transform.
- Transformers are applied after default_values and fillers have taken care of missing values.
- To use transformers, a train set must be defined (unless the transformers are already fitted and partial_fit_initialized_transformers is False).
- The fit method on a transformer:
  - must be implemented when create_transformer_per_time_series is True and the transformers are not already fitted.
- The partial_fit method on a transformer:
  - must be implemented when create_transformer_per_time_series is False, or when using already fitted transformers with partial_fit_initialized_transformers set to True.
- You can change the transformer later with update_dataset_config_and_initialize or apply_transformer.
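The fit/partial_fit rules above can be pictured with the sketch below. It is a hypothetical illustration, not cesnet_tszoo's internal code: MinMax and fit_transformers are made-up names, and the train data is assumed to be a list of per-series arrays of shape (times, features). With create_transformer_per_time_series=True each series gets its own transformer fitted with fit; with False a single transformer sees every series through repeated partial_fit calls.

```python
import numpy as np


class MinMax:
    """Hypothetical min-max transformer used only for this illustration."""

    def __init__(self):
        self.min = None
        self.max = None

    def fit(self, data):
        # Compute statistics from scratch on one array of shape (times, features)
        self.min = data.min(axis=0)
        self.max = data.max(axis=0)

    def partial_fit(self, data):
        # Merge another chunk of shape (times, features) into the running statistics
        if self.min is None:
            self.fit(data)
        else:
            self.min = np.minimum(self.min, data.min(axis=0))
            self.max = np.maximum(self.max, data.max(axis=0))

    def transform(self, data):
        return (data - self.min) / (self.max - self.min)


def fit_transformers(train_per_series, per_series):
    """Hypothetical helper mirroring the rules above; not library code."""
    if per_series:
        # create_transformer_per_time_series=True: one transformer and one fit() per series
        transformers = []
        for series in train_per_series:
            transformer = MinMax()
            transformer.fit(series)
            transformers.append(transformer)
        return transformers
    # create_transformer_per_time_series=False: one shared transformer fed series by series
    shared = MinMax()
    for series in train_per_series:
        shared.partial_fit(series)
    return shared


# Two made-up series, each with two features
train = [np.array([[0.0, 10.0], [2.0, 20.0]]), np.array([[5.0, 1.0], [9.0, 3.0]])]
per_series = fit_transformers(train, per_series=True)
shared = fit_transformers(train, per_series=False)
```

Each per-series transformer only knows the range of its own series, while the shared one ends up with the range over both series.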
Built-in
To see all built-in transformers, refer to Transformers.
from cesnet_tszoo.utils.enums import TransformerType
from cesnet_tszoo.configs import TimeBasedConfig
config = TimeBasedConfig(ts_ids=[1367, 1368], train_time_period=0.5, val_time_period=0.2, test_time_period=0.1, features_to_take=['n_flows', 'n_packets'],
transform_with=TransformerType.MIN_MAX_SCALER, create_transformer_per_time_series=True)
# Call on time-based dataset to use created config
time_based_dataset.set_dataset_config_and_initialize(config)
Or later with:
time_based_dataset.update_dataset_config_and_initialize(transform_with=TransformerType.MIN_MAX_SCALER, create_transformer_per_time_series=True, partial_fit_initialized_transformers="config", workers=0)
# Or
time_based_dataset.apply_transformer(transform_with=TransformerType.MIN_MAX_SCALER, create_transformer_per_time_series=True, partial_fit_initialized_transformers="config", workers=0)
Custom
You can create your own custom transformer. It is recommended to derive it from the Transformer base class.
For details on the base class, refer to Transformer.
import numpy as np
from cesnet_tszoo.utils.transformer import Transformer
from cesnet_tszoo.configs import TimeBasedConfig
class CustomTransformer(Transformer):
    def __init__(self):
        super().__init__()
        self.max = None
        self.min = None
    def transform(self, data):
        # Scale each feature to [0, 1] using the fitted min/max
        return (data - self.min) / (self.max - self.min)
    def fit(self, data):
        # Start from scratch so repeated fit calls do not accumulate old statistics
        self.max = None
        self.min = None
        self.partial_fit(data)
    def partial_fit(self, data):
        # First chunk: initialize per-feature min/max
        if self.max is None and self.min is None:
            self.max = np.max(data, axis=0)
            self.min = np.min(data, axis=0)
            return
        # Later chunks: widen the stored range where needed
        temp_max = np.max(data, axis=0)
        temp = np.vstack((self.max, temp_max))
        self.max = np.max(temp, axis=0)
        temp_min = np.min(data, axis=0)
        temp = np.vstack((self.min, temp_min))
        self.min = np.min(temp, axis=0)
    def inverse_transform(self, transformed_data):
        # Map values from [0, 1] back to the original scale
        return transformed_data * (self.max - self.min) + self.min
config = TimeBasedConfig(ts_ids=[1367, 1368], train_time_period=0.5, val_time_period=0.2, test_time_period=0.1, features_to_take=['n_flows', 'n_packets'],
transform_with=CustomTransformer, create_transformer_per_time_series=True)
time_based_dataset.set_dataset_config_and_initialize(config)
Or later with:
time_based_dataset.update_dataset_config_and_initialize(transform_with=CustomTransformer, create_transformer_per_time_series=True, partial_fit_initialized_transformers="config", workers=0)
# Or
time_based_dataset.apply_transformer(transform_with=CustomTransformer, create_transformer_per_time_series=True, partial_fit_initialized_transformers="config", workers=0)
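The CustomTransformer above only needs min and max, which combine trivially across partial_fit calls. Statistics such as mean and standard deviation need a merge formula instead. The sketch below is a hypothetical z-score transformer (not a built-in) that merges per-chunk statistics with the parallel-variance update of Chan et al.; it leaves out the Transformer base class only so it runs without cesnet_tszoo installed, and it assumes no feature is constant (a zero standard deviation would divide by zero).

```python
import numpy as np


class StandardScalerSketch:
    """Hypothetical z-score transformer with a mergeable partial_fit.

    In real code you would derive from cesnet_tszoo's Transformer base class.
    """

    def __init__(self):
        self.count = 0
        self.mean = None
        self.m2 = None  # per-feature sum of squared deviations from the mean

    def fit(self, data):
        # Reset, then treat the whole array as the first chunk
        self.count, self.mean, self.m2 = 0, None, None
        self.partial_fit(data)

    def partial_fit(self, data):
        n_b = data.shape[0]
        mean_b = data.mean(axis=0)
        m2_b = ((data - mean_b) ** 2).sum(axis=0)
        if self.count == 0:
            self.count, self.mean, self.m2 = n_b, mean_b, m2_b
            return
        # Merge stored statistics with the new chunk (Chan et al. update)
        n = self.count + n_b
        delta = mean_b - self.mean
        self.mean = self.mean + delta * n_b / n
        self.m2 = self.m2 + m2_b + delta ** 2 * self.count * n_b / n
        self.count = n

    @property
    def std(self):
        # Population standard deviation (ddof=0)
        return np.sqrt(self.m2 / self.count)

    def transform(self, data):
        return (data - self.mean) / self.std

    def inverse_transform(self, transformed_data):
        return transformed_data * self.std + self.mean


# Two made-up chunks, as if coming from two partial_fit calls
rng = np.random.default_rng(0)
chunk_a = rng.normal(size=(50, 2))
chunk_b = rng.normal(loc=3.0, scale=2.0, size=(30, 2))
scaler = StandardScalerSketch()
scaler.partial_fit(chunk_a)
scaler.partial_fit(chunk_b)
```

After both calls, the stored mean and standard deviation match what a single pass over the concatenated data would give.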
Using already fitted transformers
from cesnet_tszoo.configs import TimeBasedConfig
# Option 1: one already fitted transformer per time series
config = TimeBasedConfig(ts_ids=[103, 118], train_time_period=0.5, val_time_period=0.2, test_time_period=0.1, features_to_take=['n_flows', 'n_packets'],
transform_with=list_of_fitted_transformers, create_transformer_per_time_series=True)
# Length of list_of_fitted_transformers must be equal to the number of time series in ts_ids
# All transformers in list_of_fitted_transformers must be of the same type
# Option 2: one already fitted transformer shared by all time series
config = TimeBasedConfig(ts_ids=[103, 118], train_time_period=0.5, val_time_period=0.2, test_time_period=0.1, features_to_take=['n_flows', 'n_packets'],
transform_with=one_prefitted_transformer, create_transformer_per_time_series=False)
# one_prefitted_transformer must be a single transformer (not a list)
time_based_dataset.set_dataset_config_and_initialize(config)
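The two constraints on list_of_fitted_transformers can be checked before building the config. check_prefitted_list is a hypothetical helper written for this tutorial; it only restates the rules in the comments above and does not call cesnet_tszoo.

```python
def check_prefitted_list(transformers, ts_ids):
    """Hypothetical pre-flight check mirroring the rules stated above."""
    # Rule 1: exactly one fitted transformer per selected time series
    if len(transformers) != len(ts_ids):
        raise ValueError(
            f"expected {len(ts_ids)} fitted transformers (one per time series), "
            f"got {len(transformers)}"
        )
    # Rule 2: every transformer in the list has the same type
    types = {type(t) for t in transformers}
    if len(types) > 1:
        names = sorted(t.__name__ for t in types)
        raise TypeError(f"all transformers must be of the same type, got {names}")
```

Calling it with the list and ts_ids right before creating TimeBasedConfig surfaces a mismatch with a readable message.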
Getting pre-transform value
- You can call inverse_transform on the transformers returned by get_transformers() to get pre-transform values. inverse_transform expects a numpy array of shape (times, features) as input, where features do not contain ids.
from cesnet_tszoo.utils.enums import TransformerType
from cesnet_tszoo.configs import TimeBasedConfig
config = TimeBasedConfig(ts_ids=[1367, 1368], train_time_period=0.5, val_time_period=0.2, test_time_period=0.1, features_to_take=['n_flows', 'n_packets'],
transform_with=TransformerType.MIN_MAX_SCALER, create_transformer_per_time_series=False)
time_based_dataset.set_dataset_config_and_initialize(config)
transformer = time_based_dataset.get_transformers()
data = None
for batch in time_based_dataset.get_train_dataloader():
data = batch[0, :, 2:]
break
transformer.inverse_transform(data)[:10]
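To make the shape requirement concrete, the sketch below builds a synthetic batch and undoes a min-max scaling by hand. Everything in it is hypothetical: the batch layout (n_series, times, 2 + features) with two identifier columns first is inferred from the 2: slice above, and feat_min/feat_max are made-up values standing in for a fitted transformer's state.

```python
import numpy as np

# Synthetic stand-in for one dataloader batch (made-up values):
# shape (n_series, times, 2 + n_features); columns 0-1 assumed to be identifiers
n_series, times = 2, 4
ids = np.repeat(np.array([1367, 1368])[:, None], times, axis=1)
time_idx = np.tile(np.arange(times), (n_series, 1))
features = np.array([
    [[0.0, 0.5], [0.25, 1.0], [1.0, 0.0], [0.5, 0.5]],
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]],
])
batch = np.concatenate([ids[..., None], time_idx[..., None], features], axis=2)

# Hypothetical fitted min-max state for n_flows and n_packets
feat_min = np.array([100.0, 1000.0])
feat_max = np.array([300.0, 5000.0])


def inverse_min_max(transformed):
    # transformed must be 2D: (times, features), without identifier columns
    return transformed * (feat_max - feat_min) + feat_min


data = batch[0, :, 2:]  # first series, all times, feature columns only
original = inverse_min_max(data)
```

Passing batch[0] without the 2: slice would feed identifier columns into the inverse and misalign them with the per-feature statistics.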
DisjointTimeBasedCesnetDataset dataset
Note
For all configuration options and more detailed examples, refer to the Jupyter notebook disjoint_time_based_using_transformers.
Relevant configuration values:
- transform_with: Defines the transformer used to transform the dataset.
- partial_fit_initialized_transformers: If True, partial fitting on the train set is performed when using initialized (already fitted) transformers.
Transformers
- Transformers are implemented as classes.
- You can create your own or use a built-in one.
- The transformer is applied after default_values and fillers have taken care of missing values.
- One transformer is used for all time series.
- The transformer must implement transform.
- The transformer can implement inverse_transform.
- The transformer must implement partial_fit (unless it is already fitted and partial_fit_initialized_transformers is False).
- To use a transformer, a train set must be defined (unless the transformer is already fitted and partial_fit_initialized_transformers is False).
- You can change the transformer later with update_dataset_config_and_initialize or apply_transformer.
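Because one transformer is shared by all time series, its statistics are pooled over every training series. The sketch below (made-up data, min-max scaling computed by hand rather than through cesnet_tszoo) shows one consequence: a low-volume series ends up squeezed into a tiny slice of the [0, 1] range.

```python
import numpy as np

# Two hypothetical training series (times, features=1) with very different volume
quiet = np.array([[1.0], [2.0], [3.0]])
busy = np.array([[100.0], [500.0], [1000.0]])

# One shared transformer: statistics pooled over every training series,
# combined here the same way successive partial_fit calls would combine them
pooled_min = np.minimum(quiet.min(axis=0), busy.min(axis=0))
pooled_max = np.maximum(quiet.max(axis=0), busy.max(axis=0))


def shared_min_max(data):
    # The same pooled statistics are applied to every series
    return (data - pooled_min) / (pooled_max - pooled_min)


scaled_quiet = shared_min_max(quiet)
scaled_busy = shared_min_max(busy)
```

Here every value of the quiet series lands below 0.01, while the busy series spans almost the whole range.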
Built-in
To see all built-in transformers, refer to Transformers.
from cesnet_tszoo.utils.enums import TransformerType
from cesnet_tszoo.configs import DisjointTimeBasedConfig
config = DisjointTimeBasedConfig(train_ts=500, val_ts=None, test_ts=None, train_time_period=0.5, features_to_take=["n_flows", "n_packets"],
transform_with=TransformerType.MIN_MAX_SCALER, nan_threshold=0.5, random_state=1500)
# Call on disjoint-time-based dataset to use created config
disjoint_dataset.set_dataset_config_and_initialize(config)
Or later with:
disjoint_dataset.update_dataset_config_and_initialize(transform_with=TransformerType.MIN_MAX_SCALER, partial_fit_initialized_transformers="config", workers=0)
# Or
disjoint_dataset.apply_transformer(transform_with=TransformerType.MIN_MAX_SCALER, partial_fit_initialized_transformers="config", workers=0)
Custom
You can create your own custom transformer. It is recommended to derive it from the Transformer base class.
For details on the base class, refer to Transformer.
import numpy as np
from cesnet_tszoo.utils.transformer import Transformer
from cesnet_tszoo.configs import DisjointTimeBasedConfig
class CustomTransformer(Transformer):
    def __init__(self):
        super().__init__()
        self.max = None
        self.min = None
    def transform(self, data):
        # Scale each feature to [0, 1] using the fitted min/max
        return (data - self.min) / (self.max - self.min)
    def fit(self, data):
        # Start from scratch so repeated fit calls do not accumulate old statistics
        self.max = None
        self.min = None
        self.partial_fit(data)
    def partial_fit(self, data):
        # First chunk: initialize per-feature min/max
        if self.max is None and self.min is None:
            self.max = np.max(data, axis=0)
            self.min = np.min(data, axis=0)
            return
        # Later chunks: widen the stored range where needed
        temp_max = np.max(data, axis=0)
        temp = np.vstack((self.max, temp_max))
        self.max = np.max(temp, axis=0)
        temp_min = np.min(data, axis=0)
        temp = np.vstack((self.min, temp_min))
        self.min = np.min(temp, axis=0)
    def inverse_transform(self, transformed_data):
        # Map values from [0, 1] back to the original scale
        return transformed_data * (self.max - self.min) + self.min
config = DisjointTimeBasedConfig(train_ts=500, val_ts=None, test_ts=None, train_time_period=0.5, features_to_take=["n_flows", "n_packets"],
transform_with=CustomTransformer, nan_threshold=0.5, random_state=1500)
disjoint_dataset.set_dataset_config_and_initialize(config)
Or later with:
disjoint_dataset.update_dataset_config_and_initialize(transform_with=CustomTransformer, partial_fit_initialized_transformers="config", workers=0)
# Or
disjoint_dataset.apply_transformer(transform_with=CustomTransformer, partial_fit_initialized_transformers="config", workers=0)
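For this dataset type a transformer must provide partial_fit, but that method does not have to learn anything. The sketch below is a hypothetical stateless transformer applying log(1 + x), which suits non-negative counts such as n_flows and n_packets; it omits the Transformer base class only so it runs standalone.

```python
import numpy as np


class Log1pTransformer:
    """Hypothetical stateless transformer: log(1 + x) on non-negative counts.

    In real code you would derive from cesnet_tszoo's Transformer base class.
    """

    def partial_fit(self, data):
        # Nothing to learn; the method exists only to satisfy the interface
        return None

    def transform(self, data):
        return np.log1p(data)

    def inverse_transform(self, transformed_data):
        # expm1 is the exact inverse of log1p
        return np.expm1(transformed_data)


transformer = Log1pTransformer()
counts = np.array([[0.0, 10.0], [99.0, 1000.0]])  # made-up (times, features) data
transformer.partial_fit(counts)
log_counts = transformer.transform(counts)
restored_counts = transformer.inverse_transform(log_counts)
```

Because partial_fit is a no-op, the result does not depend on which series end up in the train set.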
Using already fitted transformers
from cesnet_tszoo.configs import DisjointTimeBasedConfig
config = DisjointTimeBasedConfig(train_ts=500, val_ts=500, test_ts=None, train_time_period=0.5, val_time_period=0.5, features_to_take=["n_flows", "n_packets"],
transform_with=one_prefitted_transformer, nan_threshold=0.5, random_state=999)
# one_prefitted_transformer must be just one transformer (not a list)
disjoint_dataset.set_dataset_config_and_initialize(config)
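The effect of partial_fit_initialized_transformers on an already fitted transformer can be sketched as follows. RunningMinMax and prepare are hypothetical names, and prepare only mirrors the rule from the Transformers list (refine with partial_fit only when the flag is True); it is not how cesnet_tszoo is implemented internally.

```python
import numpy as np


class RunningMinMax:
    """Hypothetical min-max transformer with a mergeable partial_fit."""

    def __init__(self):
        self.min = None
        self.max = None

    def partial_fit(self, data):
        lo, hi = data.min(axis=0), data.max(axis=0)
        if self.min is None:
            self.min, self.max = lo, hi
        else:
            self.min, self.max = np.minimum(self.min, lo), np.maximum(self.max, hi)

    def transform(self, data):
        return (data - self.min) / (self.max - self.min)


def prepare(prefitted, train_data, partial_fit_initialized):
    """Hypothetical mirror of the rule: refine a prefitted transformer only if asked."""
    if partial_fit_initialized:
        prefitted.partial_fit(train_data)
    return prefitted


old_data = np.array([[0.0], [10.0]])   # data the transformer was originally fitted on
new_train = np.array([[-5.0], [20.0]])  # train set of the new dataset (made up)

frozen = RunningMinMax()
frozen.partial_fit(old_data)
prepare(frozen, new_train, partial_fit_initialized=False)

refined = RunningMinMax()
refined.partial_fit(old_data)
prepare(refined, new_train, partial_fit_initialized=True)
```

With the flag off, the transformer keeps its original range even if the new train set exceeds it; with the flag on, the range grows to cover the new data.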
Getting pre-transform value
- You can call inverse_transform on the transformer returned by get_transformers() to get pre-transform values. inverse_transform expects a numpy array of shape (times, features) as input, where features do not contain ids.
from cesnet_tszoo.utils.enums import TransformerType
from cesnet_tszoo.configs import DisjointTimeBasedConfig
config = DisjointTimeBasedConfig(train_ts=500, val_ts=None, test_ts=None, train_time_period=0.5, features_to_take=["n_flows", "n_packets"],
transform_with=TransformerType.MIN_MAX_SCALER, nan_threshold=0.5, random_state=1500)
# Call on disjoint-time-based dataset to use created config
disjoint_dataset.set_dataset_config_and_initialize(config)
transformer = disjoint_dataset.get_transformers()
data = None
for batch in disjoint_dataset.get_train_dataloader():
data = batch[0, :, 2:]
break
transformer.inverse_transform(data)[:10]
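The example above inverts only the first series of the batch (batch[0]). Since inverse_transform expects a 2D (times, features) array, one way to handle a whole batch is to loop over its first axis. inverse_batch and the Affine stand-in are hypothetical, and the batch layout with two leading identifier columns is assumed from the 2: slice above.

```python
import numpy as np


def inverse_batch(batch, transformer, n_id_cols=2):
    """Hypothetical helper: undo a shared transformer for every series in a batch.

    Assumes batch has shape (n_series, times, n_id_cols + n_features) and that
    transformer.inverse_transform accepts a (times, n_features) array.
    """
    restored = [
        transformer.inverse_transform(batch[i, :, n_id_cols:])  # drop identifier columns
        for i in range(batch.shape[0])
    ]
    return np.stack(restored)  # shape (n_series, times, n_features)


class Affine:
    """Stand-in for a fitted transformer: inverse of x -> (x - 1) / 10."""

    def inverse_transform(self, transformed_data):
        return transformed_data * 10.0 + 1.0


batch = np.zeros((3, 5, 4))
batch[..., 2:] = 1.0  # pretend every transformed feature value is 1.0
values = inverse_batch(batch, Affine())
```

For transformers that treat every row independently, such as per-feature min-max scaling, reshaping batch[:, :, 2:] to (-1, n_features) and back gives the same result without the loop; the loop avoids relying on that property.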
SeriesBasedCesnetDataset dataset
Note
For all configuration options and more detailed examples, refer to the Jupyter notebook series_based_using_transformers.
Relevant configuration values:
- transform_with: Defines the transformer used to transform the dataset.
- partial_fit_initialized_transformers: If True, partial fitting on the train set is performed when using an initialized (already fitted) transformer.
Transformers
- Transformers are implemented as classes.
- You can create your own or use a built-in one.
- The transformer is applied after default_values and fillers have taken care of missing values.
- One transformer is used for all time series.
- The transformer must implement transform.
- The transformer can implement inverse_transform.
- The transformer must implement partial_fit (unless it is already fitted and partial_fit_initialized_transformers is False).
- To use a transformer, a train set must be defined (unless the transformer is already fitted and partial_fit_initialized_transformers is False).
- You can change the transformer later with update_dataset_config_and_initialize or apply_transformer.
Built-in
To see all built-in transformers, refer to Transformers.
from cesnet_tszoo.utils.enums import TransformerType
from cesnet_tszoo.configs import SeriesBasedConfig
config = SeriesBasedConfig(time_period=0.5, train_ts=500, features_to_take=["n_flows", "n_packets"],
transform_with=TransformerType.MIN_MAX_SCALER, nan_threshold=0.5, random_state=1500)
# Call on series-based dataset to use created config
series_based_dataset.set_dataset_config_and_initialize(config)
Or later with:
series_based_dataset.update_dataset_config_and_initialize(transform_with=TransformerType.MIN_MAX_SCALER, partial_fit_initialized_transformers="config", workers=0)
# Or
series_based_dataset.apply_transformer(transform_with=TransformerType.MIN_MAX_SCALER, partial_fit_initialized_transformers="config", workers=0)
Custom
You can create your own custom transformer. It is recommended to derive it from the Transformer base class.
For details on the base class, refer to Transformer.
import numpy as np
from cesnet_tszoo.utils.transformer import Transformer
from cesnet_tszoo.configs import SeriesBasedConfig
class CustomTransformer(Transformer):
    def __init__(self):
        super().__init__()
        self.max = None
        self.min = None
    def transform(self, data):
        # Scale each feature to [0, 1] using the fitted min/max
        return (data - self.min) / (self.max - self.min)
    def fit(self, data):
        # Start from scratch so repeated fit calls do not accumulate old statistics
        self.max = None
        self.min = None
        self.partial_fit(data)
    def partial_fit(self, data):
        # First chunk: initialize per-feature min/max
        if self.max is None and self.min is None:
            self.max = np.max(data, axis=0)
            self.min = np.min(data, axis=0)
            return
        # Later chunks: widen the stored range where needed
        temp_max = np.max(data, axis=0)
        temp = np.vstack((self.max, temp_max))
        self.max = np.max(temp, axis=0)
        temp_min = np.min(data, axis=0)
        temp = np.vstack((self.min, temp_min))
        self.min = np.min(temp, axis=0)
    def inverse_transform(self, transformed_data):
        # Map values from [0, 1] back to the original scale
        return transformed_data * (self.max - self.min) + self.min
config = SeriesBasedConfig(time_period=0.5, train_ts=500, features_to_take=["n_flows", "n_packets"],
transform_with=CustomTransformer, nan_threshold=0.5, random_state=1500)
series_based_dataset.set_dataset_config_and_initialize(config)
Or later with:
series_based_dataset.update_dataset_config_and_initialize(transform_with=CustomTransformer, partial_fit_initialized_transformers="config", workers=0)
# Or
series_based_dataset.apply_transformer(transform_with=CustomTransformer, partial_fit_initialized_transformers="config", workers=0)
Using already fitted transformers
from cesnet_tszoo.configs import SeriesBasedConfig
config = SeriesBasedConfig(time_period=0.5, val_ts=500, features_to_take=["n_flows", "n_packets"],
transform_with=fitted_transformer, nan_threshold=0.5, random_state=999)
# fitted_transformer must be just one transformer (not a list)
series_based_dataset.set_dataset_config_and_initialize(config)
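One way to obtain fitted_transformer is to fit it yourself beforehand and store its learned state. In this sketch FittedMinMax is a hypothetical stand-in for any transformer, the values are made up, and the state is saved with np.savez to an in-memory buffer (a real workflow would use a file path). Storing plain arrays instead of pickling the object keeps the saved file readable with numpy alone.

```python
import io

import numpy as np


class FittedMinMax:
    """Hypothetical transformer standing in for any fitted transformer object."""

    def __init__(self):
        self.min = None
        self.max = None

    def fit(self, data):
        self.min = data.min(axis=0)
        self.max = data.max(axis=0)

    def transform(self, data):
        return (data - self.min) / (self.max - self.min)


# Fit once, e.g. on data from an earlier experiment (made-up values)
fitted_transformer = FittedMinMax()
fitted_transformer.fit(np.array([[0.0, 100.0], [50.0, 300.0]]))

# Save only the learned arrays; the buffer stands in for a file on disk
buffer = io.BytesIO()
np.savez(buffer, min=fitted_transformer.min, max=fitted_transformer.max)
buffer.seek(0)

# Later: rebuild an equivalent fitted transformer from the saved state
restored = FittedMinMax()
with np.load(buffer) as state:
    restored.min = state["min"]
    restored.max = state["max"]
```

The restored object can then be passed as transform_with, just like fitted_transformer in the config above.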
Getting pre-transform value
- You can call inverse_transform on the transformer returned by get_transformers() to get pre-transform values. inverse_transform expects a numpy array of shape (times, features) as input, where features do not contain ids.
from cesnet_tszoo.utils.enums import TransformerType
from cesnet_tszoo.configs import SeriesBasedConfig
config = SeriesBasedConfig(time_period=0.5, train_ts=500, features_to_take=["n_flows", "n_packets"],
transform_with=TransformerType.MIN_MAX_SCALER, nan_threshold=0.5, random_state=1500)
# Call on series-based dataset to use created config
series_based_dataset.set_dataset_config_and_initialize(config)
transformer = series_based_dataset.get_transformers()
data = None
for batch in series_based_dataset.get_train_dataloader():
data = batch[0, :, 2:]
break
transformer.inverse_transform(data)[:10]