RECAST: Enabling User Recourse and Interpretability of Toxicity Detection Models with Interactive Visualization

by Austin P. Wright, et al.

With the widespread use of toxic language online, platforms increasingly deploy automated systems that leverage advances in natural language processing to flag and remove toxic comments. However, most automated systems, when detecting and moderating toxic language, do not provide feedback to their users, let alone an avenue of recourse for users to make actionable changes. We present RECAST, an interactive, open-source web tool that visualizes these models' toxicity predictions and suggests alternatives for flagged toxic language, giving users a new path of recourse when subject to automated moderation. RECAST highlights the text responsible for a toxicity classification and allows users to interactively substitute potentially toxic phrases with neutral alternatives. We examined the effect of RECAST via two large-scale user evaluations and found that RECAST was highly effective at helping users reduce toxicity as detected by the model. Users also gained a stronger understanding of the underlying toxicity criteria used by black-box models, enabling transparency and recourse. In addition, we found that when users optimize their language for these models instead of relying on their own judgement (the implied incentive and goal of deploying automated models), the models cease to be effective classifiers of toxicity compared to human annotations. This opens a discussion of how toxicity detection models work and should work, and of their effect on the future of online discourse.

