Encoding NetFlows for State-Machine Learning

07/08/2022
by Clinton Cao, et al.

NetFlow data is a well-known network log format used by many network analysts and researchers. Compared to pcap, this format contains less data, is less privacy-intrusive, and is easier to collect and process. Having less data, however, means that the format may fail to capture important network behaviour, because all information is summarised into statistics. Much research aims to overcome this disadvantage through machine learning, for instance to detect attacks within a network. Many approaches can be used to pre-process NetFlow data before it is used to train machine learning algorithms, but many of them simply apply existing methods without considering the specific properties of network data. We argue that for data originating from software systems, such as NetFlow or software logs, similarities in the frequency and context of feature values are more important than similarities in the values themselves. In this work, we therefore propose an encoding algorithm that directly takes the frequency and context of feature values into account when the data is processed. Different types of network behaviour can be clustered using this encoding, which aids the detection of anomalies within the network. From windows of these clusters, obtained by monitoring a clean system, we learn state-machine behavioural models for anomaly detection. These models are well suited to capturing the cyclic and repetitive patterns present in NetFlow data. We evaluate our encoding on a new dataset that we created for detecting problems in Kubernetes clusters and on two well-known public NetFlow datasets. The performance of the state-machine models is comparable to that of existing works that use many more features and require both clean and infected data as training input.
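To make the pipeline in the abstract concrete (encode feature values by frequency and context, cluster them, then learn a state machine from windows of cluster symbols on clean traffic), here is a minimal sketch under heavy simplifying assumptions; it is not the authors' algorithm. Assumed for illustration: a value's "encoding" is its log2 frequency bucket plus the set of frequency buckets of its immediate neighbours, and values with identical encodings are collapsed into one cluster (instead of a learned clustering); the state machine is a fixed-order n-gram automaton whose states are the last k-1 symbols (instead of a learned, state-merged model); values unseen during training map to -1. The names `fit_encoding`, `apply_encoding`, and `NGramAutomaton` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, FrozenSet, Hashable, List, Sequence, Set, Tuple

Flow = Tuple[Hashable, ...]


def frequency_bucket(count: int) -> int:
    """Coarse frequency class: counts 1, 2-3, 4-7, ... share a bucket."""
    return int(math.log2(count))


def encode_feature(values: Sequence[Hashable]) -> Dict[Hashable, int]:
    """Map each distinct value of one feature to a cluster id (hypothetical scheme).

    Signature of value v = (frequency bucket of v,
                            set of frequency buckets of the values directly
                            before and after v). Values with identical
    signatures share a cluster id, whatever their raw value is.
    """
    counts = Counter(values)
    buckets = {v: frequency_bucket(c) for v, c in counts.items()}
    context: Dict[Hashable, Set[int]] = defaultdict(set)
    for i, v in enumerate(values):
        if i > 0:
            context[v].add(buckets[values[i - 1]])
        if i + 1 < len(values):
            context[v].add(buckets[values[i + 1]])
    cluster_ids: Dict[Tuple[int, FrozenSet[int]], int] = {}
    mapping: Dict[Hashable, int] = {}
    for v in sorted(counts, key=str):  # deterministic id assignment
        sig = (buckets[v], frozenset(context[v]))
        if sig not in cluster_ids:
            cluster_ids[sig] = len(cluster_ids)
        mapping[v] = cluster_ids[sig]
    return mapping


def fit_encoding(flows: List[Flow]) -> List[Dict[Hashable, int]]:
    """Learn one value-to-cluster mapping per feature from clean flows."""
    return [encode_feature([f[j] for f in flows]) for j in range(len(flows[0]))]


def apply_encoding(mappings: List[Dict[Hashable, int]], flows: List[Flow]) -> List[Tuple[int, ...]]:
    """Turn flows into cluster symbols; unseen values become -1 (assumption)."""
    return [tuple(m.get(f[j], -1) for j, m in enumerate(mappings)) for f in flows]


class NGramAutomaton:
    """Fixed-order automaton: a state is the last k-1 symbols of a window."""

    def __init__(self, k: int = 3):
        self.k = k
        self.transitions: Set[Tuple[Tuple, Hashable]] = set()

    def fit(self, symbols: List[Hashable]) -> "NGramAutomaton":
        # Record every (state, next symbol) transition seen in clean windows.
        for i in range(len(symbols) - self.k + 1):
            window = symbols[i:i + self.k]
            self.transitions.add((tuple(window[:-1]), window[-1]))
        return self

    def anomaly_score(self, symbols: List[Hashable]) -> float:
        """Fraction of windows whose transition never occurred in training."""
        windows = [symbols[i:i + self.k] for i in range(len(symbols) - self.k + 1)]
        if not windows:
            return 0.0
        unseen = sum((tuple(w[:-1]), w[-1]) not in self.transitions for w in windows)
        return unseen / len(windows)


if __name__ == "__main__":
    # Hypothetical clean traffic: a repeating cycle of (protocol, dst_port) flows.
    cycle = [("udp", 53), ("tcp", 443), ("tcp", 443), ("tcp", 8080)]
    clean = cycle * 25
    mappings = fit_encoding(clean)
    model = NGramAutomaton(k=3).fit(apply_encoding(mappings, clean))
    scan = [("tcp", port) for port in range(20, 32)]  # hypothetical port scan
    print(model.anomaly_score(apply_encoding(mappings, cycle * 3)))  # 0.0
    print(model.anomaly_score(apply_encoding(mappings, scan)))  # 1.0
```

In this toy run, ports 53 and 8080 land in the same cluster because they occur equally often in the same kind of neighbourhood, even though their numeric values are far apart; this is the kind of frequency-and-context similarity the abstract argues for. The scan consists only of ports never seen in clean traffic, so every one of its transitions is unknown to the automaton.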
