Generating Labeled Flow Data from MAWILab Traces for Network Intrusion Detection

by   Jinoh Kim, et al.

A growing issue in the modern cyberspace world is the direct identification of malicious activity over network connections. The boom of the machine learning industry in the past few years has led to the increasing usage of machine learning technologies, which are especially prevalent in the network intrusion detection research community. When utilizing these fairly contemporary techniques, the community has realized that datasets are pivotal for identifying malicious packets and connections, particularly ones associated with information concerning labeling in order to construct learning models. However, there exists a shortage of publicly available, relevant datasets to researchers in the network intrusion detection community. Thus, in this paper, we introduce a method to construct labeled flow data by combining the packet meta-information with IDS logs to infer labels for intrusion detection research. Specifically, we designed a NetFlow-compatible format due to the capability of a a large body of network devices, such as routers and switches, to export NetFlow records from raw traffic. In doing so, the introduced method at hand would aid researchers to access relevant network flow datasets along with label information.


page 1

page 2

page 3

page 4


Data Curation and Quality Assurance for Machine Learning-based Cyber Intrusion Detection

Intrusion detection is an essential task in the cyber threat environment...

A flow-based IDS using Machine Learning in eBPF

eBPF is a new technology which allows dynamically loading pieces of code...

"Flow Size Difference" Can Make a Difference: Detecting Malicious TCP Network Flows Based on Benford's Law

Statistical characteristics of network traffic have attracted a signific...

A new method for flow-based network intrusion detection using inverse statistical physics

Network Intrusion Detection Systems (NIDS) play an important role as too...

Datasets are not Enough: Challenges in Labeling Network Traffic

In contrast to previous surveys, the present work is not focused on revi...

Graph-based Solutions with Residuals for Intrusion Detection: the Modified E-GraphSAGE and E-ResGAT Algorithms

The high volume of increasingly sophisticated cyber threats is drawing g...

A Public Network Trace of a Control and Automation System

The increasing number of attacks against automation systems such as SCAD...

Please sign up or login with your details

Forgot password? Click here to reset