Towards Accurate Labeling of Android Apps for Reliable Malware Detection

07/01/2020
by Aleieldin Salem, et al.

In training their newly developed malware detection methods, researchers rely on threshold-based labeling strategies that interpret the scan reports provided by online platforms such as VirusTotal. The dynamicity of this platform renders those labeling strategies unsustainable over prolonged periods, which leads to inaccurate labels. Using inaccurately labeled apps to train and evaluate malware detection methods significantly undermines the reliability of their results, leading researchers either to dismiss otherwise promising detection approaches or to adopt intrinsically inadequate ones. The infeasibility of generating accurate labels via manual analysis and the lack of reliable alternatives force researchers to use VirusTotal to label apps. In this paper, we tackle this issue in two ways. First, we reveal the aspects of VirusTotal's dynamicity and how they impact threshold-based labeling strategies, and we provide actionable insights on how to use these labeling strategies reliably given that dynamicity. Second, we motivate the implementation of alternative platforms by (a) identifying VirusTotal limitations that such platforms should avoid, and (b) proposing an architecture for constructing such platforms so that they mitigate those limitations.
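
To make the notion of a threshold-based labeling strategy concrete, the following is a minimal sketch, not the authors' implementation. It assumes a scan report can be reduced to a mapping from scanner name to a boolean verdict; the function names, the scanner names, and the sample reports are all hypothetical and are not taken from the paper or from VirusTotal's API.

```python
from typing import Mapping


def count_positives(scan_report: Mapping[str, bool]) -> int:
    """Count how many scanners in a (hypothetical) report flag the app."""
    return sum(1 for flagged in scan_report.values() if flagged)


def label_app(scan_report: Mapping[str, bool], threshold: int) -> str:
    """Label an app 'malicious' if at least `threshold` scanners flag it."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    if count_positives(scan_report) >= threshold:
        return "malicious"
    return "benign"


# Hypothetical reports for the same app, scanned at two points in time.
# Verdicts change between scans, which is one way a platform's dynamicity
# could affect a fixed threshold.
first_scan = {"scanner_a": True, "scanner_b": False,
              "scanner_c": False, "scanner_d": False}
later_scan = {"scanner_a": True, "scanner_b": True,
              "scanner_c": True, "scanner_d": False}

if __name__ == "__main__":
    for threshold in (1, 2, 4):
        print(threshold,
              label_app(first_scan, threshold),
              label_app(later_scan, threshold))
```

In this illustrative data, a threshold of 2 labels the app benign on the first scan and malicious on the later one, while thresholds of 1 and 4 give the same label both times. This is the kind of label instability over time that the abstract attributes to VirusTotal's dynamicity.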

