A Mixture Model Based Defense for Data Poisoning Attacks Against Naive Bayes Spam Filters

by David J. Miller, et al.

Naive Bayes spam filters are highly susceptible to data poisoning attacks. Here, known spam sources/blacklisted IPs exploit the fact that their received emails will be treated as (ground-truth) labeled spam examples and used for classifier training (or re-training). The attacking source thus generates emails that skew the spam model, potentially causing severe degradation in classifier accuracy. Such attacks succeed largely because of the limited representational power of the naive Bayes (NB) model, which uses only a single (component) density to represent spam (plus a possible attack). We propose a defense based on a mixture of NB models. We demonstrate that the learned mixture almost completely isolates the attack in a second NB component, leaving the original spam component essentially unchanged by the attack. Our approach addresses both the scenario where the classifier is being re-trained in light of new data and, significantly, the more challenging scenario where the attack is embedded in the original spam training set. Even for weak attack strengths, BIC-based model order selection chooses a two-component solution, which invokes the mixture-based defense. Promising results are presented on the TREC 2005 spam corpus.
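The core idea — fit a mixture of NB components to the spam class via EM and use BIC to decide whether a second (attack-absorbing) component is warranted — can be sketched as below. This is a minimal illustration under assumed choices (a Bernoulli event model over binary token features, Laplace smoothing, random soft initialization); it is not the authors' implementation, and the function name `fit_nb_mixture` is hypothetical.

```python
import numpy as np

def fit_nb_mixture(X, n_components, n_iter=50, seed=0):
    """EM for a mixture of Bernoulli naive Bayes components over binary
    token features X (n_docs x n_tokens). Returns mixing weights, per-
    component token probabilities, responsibilities, and the BIC score."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Random soft initialization of component responsibilities.
    resp = rng.dirichlet(np.ones(n_components), size=n)
    for _ in range(n_iter):
        # M-step: mixing weights and token probabilities (Laplace-smoothed).
        nk = resp.sum(axis=0)
        pi = nk / n
        theta = (resp.T @ X + 1) / (nk[:, None] + 2)
        # E-step: posterior responsibility of each component for each doc.
        log_p = (X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
                 + np.log(pi))
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
    # Data log-likelihood under the final parameters (log-sum-exp).
    log_p = (X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
             + np.log(pi))
    m = log_p.max(axis=1, keepdims=True)
    ll = (m + np.log(np.exp(log_p - m).sum(axis=1, keepdims=True))).sum()
    # Free parameters: K*d token probabilities plus K-1 mixing weights.
    n_params = n_components * d + (n_components - 1)
    bic = n_params * np.log(n) - 2 * ll
    return pi, theta, resp, bic
```

On synthetic "spam plus attack" data whose token usage is bimodal, the two-component fit attains a lower (better) BIC than the single-component fit, and the responsibilities assign the legitimate and attack emails to different components — mirroring the paper's observation that the attack is isolated in a second NB component.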




Related papers:

A BIC based Mixture Model Defense against Data Poisoning Attacks on Classifiers

A Bayesian Network Classifier that Combines a Finite Mixture Model and a Naive Bayes Model

Revealing Backdoors, Post-Training, in DNN Classifiers via Novel Inference on Optimized Perturbations Inducing Group Misclassification

Adversarial defenses via a mixture of generators

Adversarial Examples for Non-Parametric Methods: Attacks, Defenses and Large Sample Limits

Membership Inference Attacks and Defenses in Supervised Learning via Generalization Gap

Using BERT Encoding to Tackle the Mad-lib Attack in SMS Spam Detection
