Available models

All models share the following behavior: when the weights parameter is specified, pre-trained weights are downloaded and cached in the model_dir folder, and the returned model is in evaluation mode.

30pktTCNET_256

An example of how to feed data into this model is provided in the Jupyter notebook on multi-dataset evaluation, cross_dataset_embedding_function.ipynb.

models.model_30pktTCNET_256

model_30pktTCNET_256(weights=None, model_dir=None)

A single-modal neural network processing sequences of 30 packets and outputting 256-dimensional flow embeddings. For fine-tuning, consider using just the backbone_model attribute (an instance of Multimodal_CESNET_Enhanced) of the returned model.

Parameters:

Name	Type	Description	Default
`weights`	`Optional[Model_30pktTCNET_256_Weights]`	If provided, the model will be initialized with these weights.	`None`
`model_dir`	`Optional[str]`	If weights are provided, this folder will be used to store the weights.	`None`

Source code in cesnet_models\models.py

def model_30pktTCNET_256(weights: Optional[Model_30pktTCNET_256_Weights] = None,
                         model_dir: Optional[str] = None) -> EmbeddingModel:
    """
    A single-modal neural network processing sequences of 30 packets and outputting 256-dimensional flow embeddings.
    For fine-tuning, consider using just the `backbone_model` attribute (an instance of Multimodal_CESNET_Enhanced) of the returned model.

    Parameters:
        weights: If provided, the model will be initialized with these weights.
        model_dir: If weights are provided, this folder will be used to store the weights.
    """
    architecture_params = {
        "use_mlp_flowstats": False,
        "init_weights": True,
        "cnn_ppi_stem_type": StemType.EMBED,
        "pe_size_embedding": 20,
        "pe_size_include_dir": False,
        "pe_size_init": PacketSizeInitEnum.PLE,
        "pe_size_ple_bin_size": 100,
        "pe_ipt_processing": ProcessIPT.EMBED,
        "pe_ipt_embedding": 10,
        "pe_onehot_dirs": True,
        "conv_normalization": NormalizationEnum.BATCH_NORM,
        "linear_normalization": NormalizationEnum.BATCH_NORM,
        "cnn_ppi_channels": [192, 256, 384, 448],
        "cnn_ppi_strides": [1, 1, 1, 1],
        "cnn_ppi_kernel_sizes": [7, 7, 5, 3],
        "cnn_ppi_use_stdconv": False,
        "cnn_ppi_downsample_avg": True,
        "cnn_ppi_blocks_dropout": 0.3,
        "cnn_ppi_first_bottle_ratio": 0.25,
        "cnn_ppi_global_pool": GlobalPoolEnum.GEM_3_LEARNABLE,
        "cnn_ppi_global_pool_act": False,
        "cnn_ppi_global_pool_dropout": 0.0,
        "use_mlp_shared": True,
        "mlp_shared_size": 448,
        "mlp_shared_dropout": 0.0
    }
    embedding_size = 256

    backbone_model = Multimodal_CESNET_Enhanced(**architecture_params, save_psizes_hist=True)
    model = EmbeddingModel(backbone_model, embedding_size=embedding_size)
    if weights is not None:
        state_dict = weights.get_state_dict(model_dir=model_dir)
        state_dict.pop("arcface_module.W", None)
        model.load_state_dict(state_dict)
        model.eval()
    return model
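
The loading step in the source above removes the `arcface_module.W` entry from the state dict before calling `load_state_dict`. The sketch below reproduces only that pattern on a plain dictionary. The other key name and all values are made-up placeholders, and the source does not state why the key is dropped, so no reason is assumed here.

```python
# Placeholder state dict; only "arcface_module.W" is a key named in the source.
state_dict = {
    "backbone_model.example.weight": [0.1, 0.2],  # hypothetical key and values
    "arcface_module.W": [0.3, 0.4],
}

# pop() with a default removes the key if present and returns its value.
removed = state_dict.pop("arcface_module.W", None)

# A second pop() returns the default (None) instead of raising KeyError,
# so the same code also works for state dicts that never had the key.
removed_again = state_dict.pop("arcface_module.W", None)

print(sorted(state_dict))
```

Because the default is supplied, the call is safe for checkpoints saved both with and without that entry.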

For the multi-modal models (mm_cesnet_v1 and mm_cesnet_v2), when the weights parameter is not specified, the model is initialized with random weights and the following arguments become required:

num_classes - the number of classes, which defines the output size of the last linear layer.
flowstats_input_size - the number of flow statistics features and, therefore, the input size of the first linear layer processing them.
ppi_input_channels - the number of channels in PPI sequences. The standard value is three: packet sizes, directions, and inter-arrival times.
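
The rule above can be sketched as a small argument check. `check_random_init_args` is a hypothetical helper written for illustration, not a function of the library, and the way the library itself reports missing arguments may differ. The numeric values are arbitrary.

```python
from typing import Optional


def check_random_init_args(weights: Optional[object],
                           num_classes: Optional[int],
                           flowstats_input_size: Optional[int],
                           ppi_input_channels: Optional[int]) -> None:
    """Hypothetical helper: raise if random initialization lacks a required size."""
    if weights is not None:
        # The rule only applies when no pre-trained weights are given.
        return
    required = {
        "num_classes": num_classes,
        "flowstats_input_size": flowstats_input_size,
        "ppi_input_channels": ppi_input_channels,
    }
    missing = [name for name, value in required.items() if value is None]
    if missing:
        raise ValueError("weights is None, so these arguments are required: "
                         + ", ".join(missing))


# All three sizes are given (arbitrary values), so the check passes silently.
check_random_init_args(None, num_classes=10, flowstats_input_size=43, ppi_input_channels=3)
```

Running such a check before constructing a model gives one error message that names every missing argument at once.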

Input

Multi-modal models expect input as a tuple (batch_ppi, batch_flowstats). The shapes are:

batch_ppi torch.Tensor (B, ppi_input_channels, 30) - B is the batch size; the length of PPI sequences must be 30.
batch_flowstats torch.Tensor (B, flowstats_input_size)

Jupyter notebooks listed on the getting started page show how to feed data into multi-modal models.
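
The shape requirements above can be checked before a forward pass. The following is a minimal sketch: `check_multimodal_input` is a hypothetical helper, not part of the library, and it relies only on a `.shape` attribute, so the stand-in objects below take the place of real torch tensors. The sizes 3 and 43 are arbitrary illustrative values.

```python
from types import SimpleNamespace

PPI_LENGTH = 30  # required PPI sequence length, per the shapes listed above


def check_multimodal_input(batch, ppi_input_channels: int, flowstats_input_size: int) -> int:
    """Hypothetical helper: validate (batch_ppi, batch_flowstats), return batch size B."""
    batch_ppi, batch_flowstats = batch
    ppi_shape = tuple(batch_ppi.shape)
    flowstats_shape = tuple(batch_flowstats.shape)
    if len(ppi_shape) != 3 or ppi_shape[1:] != (ppi_input_channels, PPI_LENGTH):
        raise ValueError(
            f"batch_ppi must be (B, {ppi_input_channels}, {PPI_LENGTH}), got {ppi_shape}")
    if len(flowstats_shape) != 2 or flowstats_shape[1] != flowstats_input_size:
        raise ValueError(
            f"batch_flowstats must be (B, {flowstats_input_size}), got {flowstats_shape}")
    if ppi_shape[0] != flowstats_shape[0]:
        raise ValueError("batch_ppi and batch_flowstats have different batch sizes")
    return ppi_shape[0]


# Stand-ins exposing only .shape; real inputs would be torch tensors.
batch_ppi = SimpleNamespace(shape=(8, 3, 30))
batch_flowstats = SimpleNamespace(shape=(8, 43))
batch_size = check_multimodal_input((batch_ppi, batch_flowstats),
                                    ppi_input_channels=3, flowstats_input_size=43)
print(batch_size)
```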

models.mm_cesnet_v2

mm_cesnet_v2(
    weights=None,
    model_dir=None,
    num_classes=None,
    flowstats_input_size=None,
    ppi_input_channels=None,
)

This is the second version of the multimodal CESNET architecture. It was used in the "Encrypted traffic classification: the QUIC case" paper.

Changes from the first version

Global pooling was added to the CNN part processing PPI sequences, instead of a simple flattening.
One more Conv1D layer was added to the CNN part and the number of channels was increased.
The size of the MLP processing flow statistics was increased.
The size of the MLP processing shared representations was decreased.
Some dropout rates were decreased.

Parameters:

Name	Type	Description	Default
`weights`	`Optional[MM_CESNET_V2_Weights]`	If provided, the model will be initialized with these weights.	`None`
`model_dir`	`Optional[str]`	If weights are provided, this folder will be used to store the weights.	`None`
`num_classes`	`Optional[int]`	Number of classes.	`None`
`flowstats_input_size`	`Optional[int]`	Size of the flow statistics input.	`None`
`ppi_input_channels`	`Optional[int]`	Number of channels in the PPI input.	`None`

Source code in cesnet_models\models.py

def mm_cesnet_v2(weights: Optional[MM_CESNET_V2_Weights] = None,
                 model_dir: Optional[str] = None,
                 num_classes: Optional[int] = None,
                 flowstats_input_size: Optional[int] = None,
                 ppi_input_channels: Optional[int] = None,
                 ) -> Multimodal_CESNET:
    """
    This is a second version of the multimodal CESNET architecture. It was used in
    the *"Encrypted traffic classification: the QUIC case"* paper.

    Changes from the first version:
        - Global pooling was added to the CNN part processing PPI sequences, instead of a simple flattening.
        - One more Conv1D layer was added to the CNN part and the number of channels was increased.
        - The size of the MLP processing flow statistics was increased.
        - The size of the MLP processing shared representations was decreased.
        - Some dropout rates were decreased.

    Parameters:
        weights: If provided, the model will be initialized with these weights.
        model_dir: If weights are provided, this folder will be used to store the weights.
        num_classes: Number of classes.
        flowstats_input_size: Size of the flow statistics input.
        ppi_input_channels: Number of channels in the PPI input.
    """
    v2_model_configuration = {
        "conv_normalization": NormalizationEnum.BATCH_NORM,
        "linear_normalization": NormalizationEnum.BATCH_NORM,
        "cnn_ppi_num_blocks": 3,
        "cnn_ppi_channels1": 200,
        "cnn_ppi_channels2": 300,
        "cnn_ppi_channels3": 300,
        "cnn_ppi_use_pooling": True,
        "cnn_ppi_dropout_rate": 0.1,
        "mlp_flowstats_num_hidden": 2,
        "mlp_flowstats_size1": 225,
        "mlp_flowstats_size2": 225,
        "mlp_flowstats_dropout_rate": 0.1,
        "mlp_shared_num_hidden":  0,
        "mlp_shared_size": 600,
        "mlp_shared_dropout_rate": 0.2,
    }
    return _multimodal_cesnet(model_configuration=v2_model_configuration,
                              weights=weights,
                              model_dir=model_dir,
                              num_classes=num_classes,
                              flowstats_input_size=flowstats_input_size,
                              ppi_input_channels=ppi_input_channels)

models.mm_cesnet_v1

mm_cesnet_v1(
    weights=None,
    model_dir=None,
    num_classes=None,
    flowstats_input_size=None,
    ppi_input_channels=None,
)

This model was used in the "Fine-grained TLS services classification with reject option" paper.

Parameters:

Name	Type	Description	Default
`weights`	`Optional[MM_CESNET_V1_Weights]`	If provided, the model will be initialized with these weights.	`None`
`model_dir`	`Optional[str]`	If weights are provided, this folder will be used to store the weights.	`None`
`num_classes`	`Optional[int]`	Number of classes.	`None`
`flowstats_input_size`	`Optional[int]`	Size of the flow statistics input.	`None`
`ppi_input_channels`	`Optional[int]`	Number of channels in the PPI input.	`None`

Source code in cesnet_models\models.py

def mm_cesnet_v1(weights: Optional[MM_CESNET_V1_Weights] = None,
                 model_dir: Optional[str] = None,
                 num_classes: Optional[int] = None,
                 flowstats_input_size: Optional[int] = None,
                 ppi_input_channels: Optional[int] = None,
                 ) -> Multimodal_CESNET:
    """
    This model was used in the *"Fine-grained TLS services classification with reject option"* paper.

    Parameters:
        weights: If provided, the model will be initialized with these weights.
        model_dir: If weights are provided, this folder will be used to store the weights.
        num_classes: Number of classes.
        flowstats_input_size: Size of the flow statistics input.
        ppi_input_channels: Number of channels in the PPI input.
    """
    v1_model_configuration = {
        "conv_normalization": NormalizationEnum.BATCH_NORM,
        "linear_normalization": NormalizationEnum.BATCH_NORM,
        "cnn_ppi_num_blocks": 2,
        "cnn_ppi_channels1": 72,
        "cnn_ppi_channels2": 128,
        "cnn_ppi_channels3": 128,
        "cnn_ppi_use_pooling": False,
        "cnn_ppi_dropout_rate": 0.2,
        "mlp_flowstats_num_hidden": 2,
        "mlp_flowstats_size1": 64,
        "mlp_flowstats_size2": 32,
        "mlp_flowstats_dropout_rate": 0.2,
        "mlp_shared_num_hidden": 1,
        "mlp_shared_size": 480,
        "mlp_shared_dropout_rate": 0.2,
    }
    return _multimodal_cesnet(model_configuration=v1_model_configuration,
                              weights=weights,
                              model_dir=model_dir,
                              num_classes=num_classes,
                              flowstats_input_size=flowstats_input_size,
                              ppi_input_channels=ppi_input_channels)
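
To see the changes between the two versions concretely, the two configuration dictionaries from the source listings can be compared key by key. The sketch below copies the numeric and boolean entries from `v1_model_configuration` and `v2_model_configuration`; the two normalization entries are left out because they are identical in both and depend on a library enum.

```python
v1 = {
    "cnn_ppi_num_blocks": 2,
    "cnn_ppi_channels1": 72,
    "cnn_ppi_channels2": 128,
    "cnn_ppi_channels3": 128,
    "cnn_ppi_use_pooling": False,
    "cnn_ppi_dropout_rate": 0.2,
    "mlp_flowstats_num_hidden": 2,
    "mlp_flowstats_size1": 64,
    "mlp_flowstats_size2": 32,
    "mlp_flowstats_dropout_rate": 0.2,
    "mlp_shared_num_hidden": 1,
    "mlp_shared_size": 480,
    "mlp_shared_dropout_rate": 0.2,
}
v2 = {
    "cnn_ppi_num_blocks": 3,
    "cnn_ppi_channels1": 200,
    "cnn_ppi_channels2": 300,
    "cnn_ppi_channels3": 300,
    "cnn_ppi_use_pooling": True,
    "cnn_ppi_dropout_rate": 0.1,
    "mlp_flowstats_num_hidden": 2,
    "mlp_flowstats_size1": 225,
    "mlp_flowstats_size2": 225,
    "mlp_flowstats_dropout_rate": 0.1,
    "mlp_shared_num_hidden": 0,
    "mlp_shared_size": 600,
    "mlp_shared_dropout_rate": 0.2,
}

# Keep only the keys whose values differ, as (v1 value, v2 value) pairs.
changed = {key: (v1[key], v2[key]) for key in v1 if v1[key] != v2[key]}
for key, (old, new) in changed.items():
    print(f"{key}: {old} -> {new}")
```

In this comparison, `mlp_shared_num_hidden` drops from 1 to 0 while `mlp_shared_size` grows from 480 to 600, and `mlp_shared_dropout_rate` stays at 0.2.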