NtMalDetect: A Machine Learning Approach to Malware Detection Using Native API System Calls
As computing systems become increasingly advanced and users engage more with technology, security has never been a greater concern. In malware detection, static analysis has been the prominent approach. This approach, however, quickly falls short as malicious programs become more advanced and gain the ability to obfuscate their binaries while executing the same malicious functions, making static analysis virtually inapplicable to newer variants. The approach assessed in this paper uses dynamic analysis of malware, which may generalize to variants better than static analysis. Widely used document classification techniques were assessed for detecting malware by applying them to system call traces, a form of dynamic analysis. Features are extracted from system call traces of benign and malicious programs, and classifying these traces is treated as a binary document classification task using sparse features. The system call traces were processed to remove parameters, leaving only the system call function names. The features were grouped into various n-grams and weighted with Term Frequency-Inverse Document Frequency. Support Vector Machines were used and optimized using a Stochastic Gradient Descent algorithm that implemented L1, L2, and Elastic-Net regularization terms, the best of which achieved a highest of 98 [...] identification of significant system call sequences that could be avenues for further research.
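The pipeline described above (strip parameters, form n-grams, weight with TF-IDF, fit a linear SVM by stochastic gradient descent) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The trace line format `Name(args)`, the four toy traces, every function name, and the hyperparameter values are assumptions made for illustration, and the regularization step uses a plain subgradient, which is only one of several ways to apply L1 or Elastic-Net penalties.

```python
import math
import re
from collections import Counter


def call_names(trace):
    """Keep only the function name of each call, dropping parameters.

    Assumes (hypothetically) one call per line, formatted 'Name(arg, ...)'.
    """
    names = []
    for line in trace.splitlines():
        match = re.match(r"\s*(\w+)", line)
        if match:
            names.append(match.group(1))
    return names


def ngrams(names, n):
    """Contiguous n-grams of call names, each joined into a single token."""
    return [" ".join(names[i:i + n]) for i in range(len(names) - n + 1)]


def tfidf(docs):
    """Sparse TF-IDF vectors (dicts), smoothed IDF, L2-normalized per document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
               for t, c in Counter(doc).items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({t: v / norm for t, v in vec.items()})
    return vectors


def train_svm_sgd(vectors, labels, epochs=50, lr=0.1, alpha=1e-3, l1_ratio=0.0):
    """Linear SVM (hinge loss) trained by SGD; labels must be +1 or -1.

    l1_ratio=0 is an L2 penalty, 1 is L1, values in between are Elastic-Net.
    """
    w, b = Counter(), 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            # Margin is computed with the weights before this step's update.
            margin = y * (sum(w[t] * v for t, v in x.items()) + b)
            # Subgradient step on the penalty term (naive, not truncated).
            for t in list(w):
                sign = (w[t] > 0) - (w[t] < 0)
                w[t] -= lr * alpha * ((1 - l1_ratio) * w[t] + l1_ratio * sign)
            # Hinge loss contributes a gradient only when the margin is violated.
            if margin < 1:
                for t, v in x.items():
                    w[t] += lr * y * v
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 (malicious) or -1 (benign) for one sparse vector."""
    return 1 if sum(w[t] * v for t, v in x.items()) + b >= 0 else -1


# Made-up toy traces for illustration only; not data from the paper.
traces = [
    "NtOpenFile(0x10, 0x1)\nNtReadFile(0x10, 512)\nNtClose(0x10)",
    "NtOpenProcess(0x4)\nNtAllocateVirtualMemory(0x4, 4096)\n"
    "NtWriteVirtualMemory(0x4, 0x0)\nNtCreateThreadEx(0x4)",
    "NtOpenFile(0x14, 0x1)\nNtQueryInformationFile(0x14)\n"
    "NtReadFile(0x14, 64)\nNtClose(0x14)",
    "NtOpenProcess(0x8)\nNtAllocateVirtualMemory(0x8, 8192)\n"
    "NtWriteVirtualMemory(0x8, 0x0)\nNtResumeThread(0x8)",
]
labels = [-1, 1, -1, 1]  # -1 = benign, +1 = malicious

docs = [ngrams(call_names(t), 2) for t in traces]  # bigrams of call names
vectors = tfidf(docs)
w, b = train_svm_sgd(vectors, labels)
preds = [predict(w, b, x) for x in vectors]
```

The sketch only fits and scores the same four toy traces, so it says nothing about accuracy; a real experiment would hold out test traces and reuse the training-set IDF values for them. Sorting the learned weights in `w` by magnitude is one simple way to look for n-grams that the classifier treats as significant.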