PhishZip: A New Compression-based Algorithm for Detecting Phishing Websites

07/22/2020
by Rizka Purwanto, et al.

Phishing has grown significantly in the past few years and is predicted to increase further. The changing nature of phishing attacks makes it hard to build a robust detection system and to select features that still represent phishing as attacks evolve. In this paper, we propose PhishZip, a novel phishing detection approach that uses a compression algorithm to classify websites, and we demonstrate a systematic way to construct the word dictionaries for the compression models using word occurrence likelihood analysis. PhishZip outperforms the best-performing HTML-based features from past studies, with a true positive rate of 80.04%. We also propose the use of compression ratio as a novel machine learning feature, which significantly improves machine learning based phishing detection over previous studies. Using compression ratios as additional features, the true positive rate improves by 30.3% and the accuracy increases by 11.84%.
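
The idea of classifying a page by how well it compresses against class-specific word dictionaries can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes DEFLATE with a preset dictionary (Python's `zlib` with the `zdict` argument) as the compression model, and the word lists, the function names `compression_ratio` and `classify`, and the "lower ratio wins" decision rule are all hypothetical choices made for illustration. In the paper the dictionaries are built through word occurrence likelihood analysis, which this sketch does not attempt to reproduce.

```python
import zlib

# Hypothetical word dictionaries, hand-picked for illustration only.
# The paper derives its dictionaries from word occurrence likelihood analysis.
PHISH_DICT = b"verify account password login suspended confirm"
LEGIT_DICT = b"about contact products careers newsletter privacy"


def compression_ratio(text: str, dictionary: bytes) -> float:
    """Return compressed size divided by original size.

    The compressor is primed with `dictionary`, so text that shares
    words with the dictionary compresses to fewer bytes.
    """
    data = text.encode("utf-8")
    if not data:
        return 1.0  # assumption: treat empty input as incompressible
    comp = zlib.compressobj(level=9, zdict=dictionary)
    compressed = comp.compress(data) + comp.flush()
    return len(compressed) / len(data)


def classify(text: str) -> str:
    """Label text by whichever dictionary compresses it better (assumed rule)."""
    r_phish = compression_ratio(text, PHISH_DICT)
    r_legit = compression_ratio(text, LEGIT_DICT)
    return "phishing" if r_phish < r_legit else "legitimate"


if __name__ == "__main__":
    sample = "verify account password login suspended confirm"
    print(classify(sample))
    print(compression_ratio(sample, PHISH_DICT), compression_ratio(sample, LEGIT_DICT))
```

The two ratios returned by `compression_ratio` could also be appended to an existing feature vector instead of being compared directly, which is the second use of compression ratio the abstract describes.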


