Using Lexical Features for Malicious URL Detection – A Machine Learning Approach

10/14/2019
by Apoorva Joshi, et al.

Malicious websites are responsible for a majority of today's cyber-attacks and scams. Malicious URLs are delivered to unsuspecting users via email, text messages, pop-ups or advertisements. Clicking on or crawling such URLs can result in compromised email accounts, the launch of phishing campaigns, the download of malware, spyware and ransomware, and severe monetary losses. A machine-learning-based ensemble classification approach is proposed to detect malicious URLs in emails, and it can be extended to other channels through which malicious URLs are delivered. The approach uses static lexical features extracted from the URL string, on the assumption that these features differ notably between malicious and benign URLs. Using such static features is safer and faster because it involves neither crawling the URLs nor blacklist lookups, both of which tend to add significant latency before a verdict is produced. The goal of the classification was high sensitivity, i.e., detecting as many malicious URLs as possible. URL strings tend to be highly unstructured and noisy, so bagging algorithms were found to be a good fit for the task: they average the outputs of multiple learners, each trained on a different bootstrap sample of the training data, which reduces variance. The classification model was tested on five different test sets and produced an average False Negative Rate (FNR) of 0.1% and an average accuracy of 92%. The model is used in the FireEye Advanced URL Detection Engine (which detects malicious URLs in emails) to generate fast, real-time verdicts on URLs. Malicious URL detections from the engine have gone up by 22% since the model was integrated into the engine workflow. These results provide noteworthy evidence that a purely lexical approach can be used to detect malicious URLs.
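To make the pipeline concrete (static lexical features, a bagging ensemble, and FNR as the sensitivity-oriented metric), here is a minimal, purely illustrative Python sketch. The eight features, the decision-stump base learner, `n_estimators=25`, the vote `threshold` and the toy URLs are all hypothetical choices made for this example. They are not the features, base learners or data used in the paper or in the FireEye engine. Only the standard library is used.

```python
import math
import random
import re
from collections import Counter
from typing import Callable
from urllib.parse import urlsplit


def lexical_features(url: str) -> list[float]:
    """Static features computed from the URL string alone (no crawling, no blacklist lookup).

    Hypothetical feature set for illustration; not the paper's actual features.
    """
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    counts = Counter(url)
    entropy = -sum((c / len(url)) * math.log2(c / len(url)) for c in counts.values()) if url else 0.0
    return [
        float(len(url)),                                   # total URL length
        float(len(host)),                                  # hostname length
        float(host.count(".")),                            # dots in host (subdomain depth)
        float(sum(ch.isdigit() for ch in url)),            # digit count
        float(sum(not ch.isalnum() for ch in url)),        # non-alphanumeric character count
        1.0 if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host) else 0.0,  # raw IPv4 address as host
        1.0 if "@" in url else 0.0,                        # '@' can disguise the real destination
        entropy,                                           # Shannon entropy of the characters
    ]


def train_stump(X: list[list[float]], y: list[int]) -> Callable[[list[float]], int]:
    """Hypothetical base learner: the one-feature threshold split with the fewest training errors."""
    best = (len(y) + 1, 0, 0.0, 1)  # (errors, feature index, threshold, polarity)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                errors = sum(
                    (1 if polarity * (row[j] - t) > 0 else 0) != label
                    for row, label in zip(X, y)
                )
                if errors < best[0]:
                    best = (errors, j, t, polarity)
    _, j, t, polarity = best
    return lambda row: 1 if polarity * (row[j] - t) > 0 else 0


def bagging_fit(X, y, n_estimators: int = 25, seed: int = 0):
    """Bagging: train each base learner on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, row: list[float], threshold: float = 0.5) -> int:
    """Aggregate by averaging the learners' 0/1 votes; 1 means 'malicious'."""
    score = sum(m(row) for m in models) / len(models)
    return 1 if score >= threshold else 0


def false_negative_rate(y_true: list[int], y_pred: list[int]) -> float:
    """FNR = FN / (FN + TP): the share of malicious URLs the model missed."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return fn / (fn + tp) if fn + tp else 0.0


if __name__ == "__main__":
    # Made-up toy data for illustration only: 0 = benign, 1 = malicious.
    urls = [
        "https://www.example.com/",
        "https://docs.example.org/guide/intro",
        "https://example.net/about",
        "http://192.0.2.7/bank.example/login.php?id=88213",
        "http://secure-update.bank.verify-account.example.net/@login",
        "http://a1b2c3d4e5.example.net/xk29/38fj2.exe",
    ]
    labels = [0, 0, 0, 1, 1, 1]
    X = [lexical_features(u) for u in urls]
    models = bagging_fit(X, labels)
    preds = [bagging_predict(models, row) for row in X]
    print("training FNR:", false_negative_rate(labels, preds))
```

Because each stump sees a different bootstrap sample, individual stumps may split on different features. Averaging their votes smooths out that variance, which is the property the abstract cites for noisy URL strings. Lowering `threshold` in `bagging_predict` flags a URL when fewer learners vote malicious. This is one simple way to trade extra false positives for a lower FNR, in line with the stated goal of high sensitivity.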
