An Enhanced Machine Learning Topic Classification Methodology for Cybersecurity

08/30/2021
by Elijah Pelofske, et al.

In this research, we use user-defined labels from three internet text sources (Reddit, StackExchange, arXiv) to train 21 different machine learning models for the topic classification task of detecting cybersecurity discussions in natural text. We analyze the false positive and false negative rates of each of the 21 models in a cross-validation experiment. We then present a Cybersecurity Topic Classification (CTC) tool, which uses the majority vote of the 21 trained machine learning models as its decision mechanism for detecting cybersecurity-related text. We also show that the majority vote mechanism of the CTC tool provides lower false negative and false positive rates on average than any of the 21 individual models. Finally, we show that the CTC tool scales to hundreds of thousands of documents with a wall clock time on the order of hours.
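The majority-vote decision and the two error rates described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`majority_vote`, `error_rates`, `keyword_model`), the three keyword-based stand-in classifiers, and the toy documents are all hypothetical, chosen only to show the mechanism. The sketch assumes each model returns a binary decision and that a strict majority is required; with an odd number of models, such as 21, ties cannot occur.

```python
from typing import Callable, Sequence, Tuple

# Assumption: a "model" is any function mapping a document to
# True (cybersecurity-related) or False. Real trained models would go here.
Classifier = Callable[[str], bool]


def majority_vote(models: Sequence[Classifier], text: str) -> bool:
    """Return True when strictly more than half of the models flag the text."""
    votes = sum(1 for model in models if model(text))
    return votes * 2 > len(models)


def error_rates(predictions: Sequence[bool],
                labels: Sequence[bool]) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate)."""
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y)
    negatives = sum(1 for y in labels if not y)
    positives = len(labels) - negatives
    # Guard against division by zero when a class is absent.
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


def keyword_model(keyword: str) -> Classifier:
    """Hypothetical stand-in classifier: flags text containing a keyword."""
    return lambda text: keyword in text.lower()


# Three toy models stand in for the 21 trained models.
toy_models = [keyword_model("malware"),
              keyword_model("exploit"),
              keyword_model("password")]

docs = [
    "New malware uses a browser exploit to steal a password",
    "Best hiking trails near Denver",
    "How do I reset my password?",
]
labels = [True, False, True]

preds = [majority_vote(toy_models, doc) for doc in docs]
fpr, fnr = error_rates(preds, labels)
print(preds, fpr, fnr)
```

In this toy run the third document is flagged by only one of the three stand-in models, so the vote rejects it and it counts as a false negative. The stand-ins are deliberately weak; the sketch shows how votes are counted, not the accuracy gain reported for the ensemble.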

