WebEye - Automated Collection of Malicious HTTP Traffic

02/16/2018
by Johann Vierthaler, et al.

As malware detection increasingly adopts machine learning, precise training sets become ever more important. Large data sets of realistic web traffic, correctly classified as benign or malicious, are needed not only to train classic and deep learning algorithms but also to serve as evaluation benchmarks for existing malware detection products. Despite the vast number and variety of threats a user may encounter when browsing the web, actual malicious content is often hard to come by, because prerequisites such as browser and operating system type and version must be met before a malware-distributing server delivers its payload. Combined with privacy constraints on data sets of actual user traffic, this makes it difficult for researchers and product developers to evaluate anti-malware solutions against large-scale data sets of realistic web traffic. In this paper we present WebEye, a framework that autonomously creates realistic HTTP traffic, enriches the recorded traffic with additional information, and classifies records as malicious or benign using different classifiers. We use WebEye to collect malicious HTML and JavaScript and show how data sets created with WebEye can be used to train machine-learning-based malware detection algorithms. We regard WebEye and the data sets it creates as a tool for researchers and product developers to evaluate and improve their AI-based anti-malware solutions against large-scale benchmarks.


research
06/21/2019

Joint Detection of Malicious Domains and Infected Clients

Detection of malware-infected computers and detection of malicious web d...
research
05/29/2018

Limitless HTTP in an HTTPS World: Inferring the Semantics of the HTTPS Protocol without Decryption

We present new analytic techniques for inferring HTTP semantics from pas...
research
04/20/2021

On Generating and Labeling Network Traffic with Realistic, Self-Propagating Malware

Research and development of techniques which detect or remediate malicio...
research
09/07/2023

Detecting unknown HTTP-based malicious communication behavior via generated adversarial flows and hierarchical traffic features

Malicious communication behavior is the network communication behavior g...
research
04/29/2022

Symbolic analysis meets federated learning to enhance malware identifier

Over past years, the manually methods to create detection rules were no ...
research
06/01/2021

MalPhase: Fine-Grained Malware Detection Using Network Flow Data

Economic incentives encourage malware authors to constantly develop new,...
research
09/23/2020

Dataset Optimization Strategies for MalwareTraffic Detection

Machine learning is rapidly becoming one of the most important technolog...
