RECAST: Interactive Auditing of Automatic Toxicity Detection Models

by Austin P. Wright et al.
Georgia Institute of Technology

As toxic language becomes nearly pervasive online, there has been increasing interest in leveraging advances in natural language processing (NLP), such as very large transformer models, to automatically detect and remove toxic comments. Yet despite fairness concerns, a lack of adversarial robustness, and limited prediction explainability in deep learning systems, there is currently little work on auditing these systems so that both developers and users can understand how they work. We present our ongoing work on RECAST, an interactive tool for examining toxicity detection models that visualizes explanations for predictions and suggests alternative wordings for detected toxic speech.
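To make the two capabilities concrete, here is a minimal, self-contained sketch of the kind of analysis such a tool surfaces: per-word importance via leave-one-out ablation against a toxicity classifier, plus a crude rewording suggestion that drops the highest-impact words. The keyword-based `toxicity_score` below is a hypothetical stand-in, not the model used in the paper, and the function names are illustrative only.

```python
def toxicity_score(text: str) -> float:
    """Hypothetical stand-in classifier: fraction of words in a toy toxic lexicon."""
    TOXIC_WORDS = {"stupid", "idiot", "trash"}
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in TOXIC_WORDS for w in words) / len(words)

def word_importances(text: str) -> list[tuple[str, float]]:
    """Score each word by how much the toxicity score drops when it is removed."""
    words = text.split()
    base = toxicity_score(text)
    importances = []
    for i, w in enumerate(words):
        ablated = " ".join(words[:i] + words[i + 1:])
        importances.append((w, base - toxicity_score(ablated)))
    return importances

def suggest_rewording(text: str, threshold: float = 0.0) -> str:
    """Keep only words whose removal would not lower the toxicity score."""
    kept = [w for w, imp in word_importances(text) if imp <= threshold]
    return " ".join(kept)
```

A tool like RECAST would render these per-word importances as a visual overlay on the comment and offer the less-toxic rewording as user recourse; real systems would compute importances against a learned model rather than a word list.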

