Identifying a Training-Set Attack's Target Using Renormalized Influence Estimation

01/25/2022
by Zayd Hammoudeh, et al.

Targeted training-set attacks inject malicious instances into the training set to cause a trained model to mislabel one or more specific test instances. This work proposes the task of target identification, which determines whether a specific test instance is the target of a training-set attack. This can then be combined with adversarial-instance identification to find (and remove) the attack instances, mitigating the attack with minimal impact on other predictions. Rather than focusing on a single attack method or data modality, we build on influence estimation, which quantifies each training instance's contribution to a model's prediction. We show that existing influence estimators' poor practical performance often derives from their over-reliance on instances and iterations with large losses. Our renormalized influence estimators fix this weakness; they far outperform the original estimators at identifying influential groups of training examples in both adversarial and non-adversarial settings, even finding up to 100% of adversarial training instances with no clean-data false positives. Target identification then simplifies to detecting test instances with anomalous influence values. We demonstrate our method's generality on backdoor and poisoning attacks across data domains including text, vision, and speech. Our source code is available at https://github.com/ZaydH/target_identification.
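To make the renormalization idea concrete, the following NumPy sketch contrasts a TracIn-style influence estimator with a renormalized variant, then applies a toy anomaly rule for target identification. It is an illustration under stated assumptions, not the paper's implementation (see the linked repository for that): gradients are assumed to be precomputed at saved training checkpoints, renormalization is done here by rescaling gradients to unit norm (i.e., cosine similarity, one natural way to remove the large-loss dependence), and the top-k statistic and robust z-score threshold in flag_targets are hypothetical choices.

```python
# Minimal sketch of renormalized influence estimation over precomputed
# per-checkpoint gradients. Illustrative only; details may differ from
# the paper's estimators.
import numpy as np

def tracin_influence(train_grads, test_grad, lrs):
    """TracIn-style influence: learning-rate-weighted dot product between
    training and test gradients, summed over checkpoints. Instances and
    checkpoints with large losses (large gradient norms) dominate.

    train_grads: (num_checkpoints, num_train, dim)
    test_grad:   (num_checkpoints, dim)
    lrs:         (num_checkpoints,) learning rate at each checkpoint
    """
    dots = np.einsum("ctd,cd->ct", train_grads, test_grad)
    return (lrs[:, None] * dots).sum(axis=0)  # (num_train,)

def renormalized_influence(train_grads, test_grad, lrs, eps=1e-12):
    """Renormalized variant: replace the raw dot product with cosine
    similarity so gradient magnitude no longer dominates the estimate."""
    train_norm = np.linalg.norm(train_grads, axis=-1) + eps  # (c, t)
    test_norm = np.linalg.norm(test_grad, axis=-1) + eps     # (c,)
    dots = np.einsum("ctd,cd->ct", train_grads, test_grad)
    cos = dots / (train_norm * test_norm[:, None])
    return (lrs[:, None] * cos).sum(axis=0)

def flag_targets(influences, k=10, z_thresh=3.0):
    """Toy target-identification rule: flag test instances whose mean
    top-k training influence is an outlier (robust z-score) relative to
    the other test instances.

    influences: (num_test, num_train) renormalized influence matrix
    """
    top_k = np.sort(influences, axis=1)[:, -k:].mean(axis=1)  # (num_test,)
    med = np.median(top_k)
    mad = np.median(np.abs(top_k - med)) + 1e-12
    z = 0.6745 * (top_k - med) / mad
    return np.where(z > z_thresh)[0]

# Example usage with random data (shapes only; real use extracts
# gradients from saved checkpoints of the trained model):
rng = np.random.default_rng(0)
c, t, n_test, d = 5, 200, 50, 32
train_grads = rng.normal(size=(c, t, d))
lrs = np.full(c, 0.1)
test_grads = rng.normal(size=(n_test, c, d))
inf_matrix = np.stack(
    [renormalized_influence(train_grads, g, lrs) for g in test_grads]
)
print(flag_targets(inf_matrix))
```

The key design point is that replacing the raw gradient dot product with cosine similarity discards gradient magnitude, so training instances and checkpoints with large losses can no longer dominate the influence estimate; target identification then reduces to flagging test instances whose top influence scores are anomalous.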
