Fraud Detection Using Optimized Machine Learning Tools Under Imbalance Classes

09/04/2022
by   Mary Isangediok, et al.

Fraud detection is a challenging task because fraud patterns change over time and few fraud examples are available from which to learn such sophisticated patterns. Thus, fraud detection with the aid of smart versions of machine learning (ML) tools is essential to assure safety. Fraud detection is primarily an ML classification task; however, the optimum performance of the corresponding ML tool relies on using the best hyperparameter values. Moreover, classification under imbalanced classes is quite challenging because most ML classification techniques ignore the minority classes and therefore perform poorly on them. Thus, we investigate four state-of-the-art ML techniques that are suitable for handling imbalanced classes, namely, logistic regression, decision trees, random forest, and extreme gradient boost, with the aim of maximizing precision and simultaneously reducing false positives. First, these classifiers are trained on two original benchmark unbalanced fraud detection datasets, namely, phishing website URLs and fraudulent credit card transactions. Then, three synthetically balanced datasets are produced for each original dataset by applying the sampling frameworks RandomUnderSampler, SMOTE, and SMOTEENN. The optimum hyperparameters for all 16 experiments are found using the RandomizedSearchCV method. The validity of the 16 approaches in the context of fraud detection is compared using two benchmark performance metrics, namely, the area under the receiver operating characteristic curve (AUC ROC) and the area under the precision-recall curve (AUC PR). For both the phishing website URLs and the credit card fraud transaction datasets, the results indicate that extreme gradient boost trained on the original data shows trustworthy performance on the imbalanced dataset and outperforms the other three methods in terms of both AUC ROC and AUC PR.
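To make the two evaluation metrics and the simplest of the three resampling ideas concrete, here is a minimal sketch in plain Python. It is not the authors' code: the function names `roc_auc`, `average_precision`, and `random_undersample` are hypothetical, the toy labels and scores are made up for illustration, and the undersampling helper is only a stand-in for the idea behind RandomUnderSampler, not that class itself. AUC PR is approximated here by average precision, one common step-wise estimate of the area under the precision-recall curve.

```python
import random


def roc_auc(y_true, scores):
    """AUC ROC as the probability that a random positive outscores a random
    negative (rank-sum formulation); tied scores count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise estimate of AUC PR: mean of the precision values measured at
    the rank of each positive example. Assumes no tied scores."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    n_pos = sum(y_true)
    true_pos = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            total += true_pos / rank
    return total / n_pos


def random_undersample(X, y, seed=0):
    """Hypothetical stand-in for random undersampling: keep every minority
    example and an equally sized random subset of the majority class."""
    rng = random.Random(seed)
    ones = [i for i, label in enumerate(y) if label == 1]
    zeros = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (ones, zeros) if len(ones) <= len(zeros) else (zeros, ones)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in keep], [y[i] for i in keep]


# Made-up toy data: 2 fraud cases (label 1) among 10 transactions.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.75, 0.1]
X = [[s] for s in scores]  # one illustrative feature per transaction

print(f"AUC ROC: {roc_auc(y_true, scores):.4f}")            # 0.9375
print(f"AUC PR : {average_precision(y_true, scores):.4f}")  # 0.8333

X_bal, y_bal = random_undersample(X, y_true)
print(f"balanced size: {len(y_bal)}, positives: {sum(y_bal)}")
```

On this toy example a single legitimate transaction ranked above one fraud case lowers AUC ROC only slightly but lowers average precision more, which illustrates why both metrics are worth reporting when the positive class is rare.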

research
04/24/2019

A Comparison Study of Credit Card Fraud Detection: Supervised versus Unsupervised

Credit card has become popular mode of payment for both online and offli...
research
03/11/2023

Credit Card Fraud Detection Using Enhanced Random Forest Classifier for Imbalanced Data

The credit card has become the most popular payment method for both onli...
research
08/25/2022

Credit card fraud detection - Classifier selection strategy

Machine learning has opened up new tools for financial fraud detection. ...
research
11/15/2020

Precision-Recall Curve (PRC) Classification Trees

The classification of imbalanced data has presented a significant challe...
research
08/25/2022

Empirical study of Machine Learning Classifier Evaluation Metrics behavior in Massively Imbalanced and Noisy data

With growing credit card transaction volumes, the fraud percentages are ...
research
09/09/2022

Shapley value-based approaches to explain the robustness of classifiers in machine learning

In machine learning, the use of algorithm-agnostic approaches is an emer...
research
07/22/2021

Benchmarking AutoML Frameworks for Disease Prediction Using Medical Claims

We ascertain and compare the performances of AutoML tools on large, high...
