A compression-based framework for the detection of anomalies in heterogeneous data sources

Nowadays, information and communications technology systems are fundamental assets of our social and economic model, and thus they should be properly protected against the malicious activity of cybercriminals. Defence mechanisms are generally built around tools that trace and store information in several ways, the simplest being the generation of plain-text files known as security logs. These log files are usually inspected, in a semi-automatic way, by security analysts to detect events that may affect system integrity. On this basis, we propose a parameter-free methodology to detect security incidents from structured text regardless of its nature. We use the Normalized Compression Distance to obtain a set of features that can be used by a Support Vector Machine to classify events from a heterogeneous cybersecurity environment. Specifically, we explore and validate the application of our methodology in four different cybersecurity domains: HTTP anomaly identification, spam detection, Domain Generation Algorithms tracking and sentiment analysis. The results obtained show the validity and flexibility of our approach in different security scenarios with a low configuration burden.
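
The Normalized Compression Distance between two strings x and y is commonly defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the compressed length of s and xy is the concatenation. The sketch below illustrates how such a distance can turn raw log lines into numeric features. It is a minimal, stdlib-only illustration under stated assumptions, not the authors' implementation. The abstract does not name the compressor, so zlib is assumed here. The helper names `compressed_size`, `ncd` and `ncd_features`, together with the idea of measuring each event against a fixed list of reference events, are hypothetical choices made for illustration.

```python
import zlib
from typing import List, Sequence


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed form of `data` (assumed compressor C)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: values near 0 mean x and y share most structure."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_features(sample: bytes, references: Sequence[bytes]) -> List[float]:
    """Hypothetical feature vector: the NCD between `sample` and each reference event."""
    return [ncd(sample, ref) for ref in references]


if __name__ == "__main__":
    # Illustrative, made-up log lines; not data from the paper.
    normal = b"GET /index.html HTTP/1.1 200 " * 20
    suspicious = b"GET /login.php?id=1' OR '1'='1 HTTP/1.1 500 " * 20
    event = b"GET /index.html HTTP/1.1 200 " * 10
    print(ncd_features(event, [normal, suspicious]))
```

Each event is mapped to a fixed-length vector of distances, which is the form a standard classifier expects. In a full pipeline these vectors could be passed to a Support Vector Machine, for example scikit-learn's `sklearn.svm.SVC`. That step is left out here so the sketch runs with the standard library alone.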
