Editing a classifier by rewriting its prediction rules

12/02/2021
by Shibani Santurkar et al.

We present a methodology for modifying the behavior of a classifier by directly rewriting its prediction rules. Our approach requires virtually no additional data collection and can be applied to a variety of settings, including adapting a model to new environments and modifying it to ignore spurious features. Our code is available at https://github.com/MadryLab/EditingClassifiers.
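
As a rough illustration of what "rewriting a prediction rule" can mean at the weight level, the sketch below treats a single linear layer as a key-to-value map and installs a new rule via the minimum-norm rank-one update that forces the layer to send a chosen key vector k (features of the concept being rewritten) to a chosen target value v. This is only a simplified stand-in, not the authors' implementation: the names rank_one_edit, k, and v are illustrative, and the method in the repository above performs a more careful constrained optimization on a layer of the trained classifier.

    # Minimal sketch (assumed, not the paper's code): rank-one weight edit
    # that makes a linear layer map key k to value v while changing the
    # weights as little as possible in Frobenius norm.
    import torch

    def rank_one_edit(W: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        """Return W' = W + (v - W k) k^T / (k^T k), which satisfies W' @ k == v.

        W: (out_dim, in_dim) weight of the layer being edited
        k: (in_dim,)  key   -- feature vector of the concept to rewrite
        v: (out_dim,) value -- output the edited layer should produce for k
        """
        residual = v - W @ k                          # what the current layer gets "wrong" on k
        return W + torch.outer(residual, k) / k.dot(k)

    if __name__ == "__main__":
        torch.manual_seed(0)
        W = torch.randn(8, 16)
        k = torch.randn(16)        # hypothetical: features of the concept to remap
        v = W @ torch.randn(16)    # hypothetical: value the layer should now produce for k
        W_edited = rank_one_edit(W, k, v)
        assert torch.allclose(W_edited @ k, v, atol=1e-5)   # the new rule holds exactly
        print("weight change (Frobenius norm):", torch.norm(W_edited - W).item())

Because the update is rank one and minimal in norm, inputs whose features are roughly orthogonal to k are largely unaffected, which is the intuition behind editing a rule without retraining on new data.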
