Transformer-Boosted Anomaly Detection with Fuzzy Hashes

08/24/2022
by Frieder Uhlig, et al.

Fuzzy hashes are an important tool in digital forensics and are used in approximate matching to determine the similarity between digital artifacts. They translate the byte code of files into computable strings, which makes them particularly interesting for intelligent machine processing. In this work, we propose deep learning approximate matching (DLAM), which achieves much higher accuracy in detecting anomalies in fuzzy hashes than conventional approaches. In addition to the well-known application for clustering malware, we show that fuzzy hashes and deep learning are indeed well-suited to classify files according to the presence of certain content, e.g., malware. DLAM relies on transformer-based models from the field of natural language processing and outperforms existing methods. Traditional fuzzy hashes like TLSH and ssdeep have a limited size and fail to detect file anomalies when the anomalies are small relative to the overall file size. DLAM, however, enables the detection of such file correlations in the computed fuzzy hashes of TLSH and ssdeep, even for anomaly sizes of less than 15%. It achieves comparable results to state-of-the-art fuzzy hashing algorithms while relying on more efficient hash computations and can, therefore, be used at a much larger scale.
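To make the idea of feeding fuzzy hashes to a transformer-based model more concrete, the following minimal sketch shows one plausible way to turn hash strings into fixed-length token-id sequences. It is an illustration under assumptions, not the DLAM pipeline: the abstract does not state how hashes are tokenized, so the character-level vocabulary, the helper names `build_vocab` and `encode`, the `max_len` value, and the sample strings (which only imitate the `blocksize:hash1:hash2` layout of ssdeep) are all hypothetical.

```python
from typing import Dict, List

# Special tokens; names and ids are an assumption for this sketch.
PAD, UNK = "<pad>", "<unk>"


def build_vocab(hashes: List[str]) -> Dict[str, int]:
    """Map every character seen in the hash strings to an integer id."""
    vocab = {PAD: 0, UNK: 1}
    for h in hashes:
        for ch in h:
            vocab.setdefault(ch, len(vocab))
    return vocab


def encode(hash_str: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Turn one hash string into a fixed-length list of token ids.

    Strings longer than max_len are truncated, shorter ones are padded.
    """
    ids = [vocab.get(ch, vocab[UNK]) for ch in hash_str[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


# Made-up strings that only imitate the layout of an ssdeep digest;
# they were not produced by any real hashing tool.
samples = [
    "96:abcDEF123+/xyz:abcDEF12",
    "96:abcDEF999+/xyz:abcDEF99",
]

vocab = build_vocab(samples)
encoded = [encode(s, vocab, max_len=32) for s in samples]
```

Sequences like `encoded` could then be passed, together with a label such as "anomaly present" or "anomaly absent", to any sequence classifier. Character-level tokens are used here only because they need no external tokenizer; other schemes are equally conceivable.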


research
12/17/2018

Fuzzy Hashing as Perturbation-Consistent Adversarial Kernel Embedding

Measuring the similarity of two files is an important task in malware an...
research
04/14/2020

Topology-Aware Hashing for Effective Control Flow Graph Similarity Analysis

Control Flow Graph (CFG) similarity analysis is an essential technique f...
research
11/27/2021

Assessing the Effectiveness of YARA Rules for Signature-Based Malware Detection and Classification

Malware often uses obfuscation techniques or is modified slightly to eva...
research
10/06/2021

Stegomalware: A Systematic Survey of Malware Hiding and Detection in Images, Machine Learning Models and Research Challenges

Malware distribution to the victim network is commonly performed through...
research
09/12/2018

Using Intuitionistic Fuzzy Set for Anomaly Detection of Network Traffic from Flow Interaction

We present a method to detect anomalies in a time series of flow interac...
research
12/05/2019

Deep Anomaly Detection in Packet Payload

With the widespread adoption of cloud services, especially the extensive...
research
02/02/2022

How to Improve Deep Learning for Software Analytics (a case study with code smell detection)

To reduce technical debt and make code more maintainable, it is importan...
