A Comparison of Word2Vec, HMM2Vec, and PCA2Vec for Malware Classification

03/07/2021
by Aniket Chandak, et al.

Word embeddings are often used in natural language processing to quantify relationships between words. More generally, the same embedding techniques can be used to quantify relationships between features. In this paper, we consider several word embedding techniques in the context of malware classification. We use hidden Markov models to obtain embedding vectors in an approach that we refer to as HMM2Vec, and we generate vector embeddings based on principal component analysis. We also consider the popular neural-network-based word embedding technique known as Word2Vec. In each case, we derive feature embeddings from opcode sequences for malware samples from a variety of families. We show that classification based on these feature embeddings achieves better accuracy than HMM experiments that use the opcode sequences directly, which serve as a baseline. These results show that word embeddings can be a useful feature engineering step in malware analysis.
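To make the idea of a PCA-based opcode embedding concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not specify how the PCA embeddings are built, so the co-occurrence construction, the window size, the embedding dimension, the toy opcode sequence, and the function names `cooccurrence_matrix` and `pca_embeddings` are all assumptions made for illustration.

```python
import numpy as np

def cooccurrence_matrix(opcodes, vocab, window=2):
    """Count how often each opcode pair appears within `window` positions.

    Assumption: a symmetric context window, as in typical word-embedding setups.
    """
    index = {op: i for i, op in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for pos, op in enumerate(opcodes):
        lo = max(0, pos - window)
        hi = min(len(opcodes), pos + window + 1)
        for ctx in range(lo, hi):
            if ctx != pos:
                counts[index[op], index[opcodes[ctx]]] += 1
    return counts

def pca_embeddings(counts, dim=2):
    """Project each opcode's co-occurrence row onto the top `dim` principal components."""
    # Center each column so the SVD yields principal directions.
    centered = counts - counts.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T

# Hypothetical toy opcode sequence; real inputs would come from disassembled samples.
opcodes = ["mov", "push", "call", "mov", "add", "mov", "push", "call", "pop", "ret"]
vocab = sorted(set(opcodes))
counts = cooccurrence_matrix(opcodes, vocab)
embeddings = pca_embeddings(counts, dim=2)  # one 2-dimensional vector per opcode
```

Each row of `embeddings` is a vector for one opcode in `vocab`. In a classification pipeline, such vectors (or a per-sample aggregate of them) could serve as features for a downstream classifier, which is the role the feature embeddings play in the comparison described above.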


research
03/03/2021

Malware Classification with Word Embedding Features

Malware classification is an important and challenging problem in inform...
research
07/09/2020

Principal Word Vectors

We generalize principal component analysis for embedding words into a ve...
research
03/07/2021

Word Embedding Techniques for Malware Evolution Detection

Malware detection is a critical aspect of information security. One diff...
research
08/18/2019

Scene Classification in Indoor Environments for Robots using Context Based Word Embeddings

Scene Classification has been addressed with numerous techniques in comp...
research
03/24/2016

Part-of-Speech Relevance Weights for Learning Word Embeddings

This paper proposes a model to learn word embeddings with weighted conte...
research
03/03/2021

Malware Classification with GMM-HMM Models

Discrete hidden Markov models (HMM) are often applied to malware detecti...
research
04/19/2017

Predicting Role Relevance with Minimal Domain Expertise in a Financial Domain

Word embeddings have made enormous inroads in recent years in a wide var...
