Civil Rephrases Of Toxic Texts With Self-Supervised Transformers

02/01/2021
by Leo Laugier, et al.

Platforms that support online commentary, from social networks to news sites, are increasingly leveraging machine learning to assist their moderation efforts. But this process does not typically provide feedback to the author that would help them contribute according to the community guidelines. Providing such feedback is prohibitively time-consuming for human moderators, and computational approaches are still nascent. This work focuses on models that can suggest more civil rephrasings of toxic comments. Inspired by recent progress in unpaired sequence-to-sequence tasks, a self-supervised learning model called CAE-T5 is introduced. CAE-T5 employs a pre-trained text-to-text transformer, which is fine-tuned with a denoising and cyclic auto-encoder loss. In experiments on the largest toxicity detection dataset to date (Civil Comments), our model generates sentences that are more fluent and better at preserving the initial content than earlier text style transfer systems, as measured by several scoring systems and human evaluation.
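The abstract names the two training signals but not their exact form, so the following is only a minimal sketch of how a denoising term and a cyclic auto-encoder term might be combined on unpaired data. Everything here is an assumption made for illustration: the names `corrupt`, `cae_loss`, `nll`, `generate`, and `lam`, the token-dropping noise, and the toy stand-ins at the bottom are hypothetical and are not taken from the paper or from any library.

```python
import random
from typing import Callable, Dict, List

Tokens = List[str]

def corrupt(tokens: Tokens, rng: random.Random, drop_prob: float = 0.15) -> Tokens:
    """Randomly drop tokens (an assumed noise function, not the paper's)."""
    kept = [t for t in tokens if rng.random() >= drop_prob]
    return kept if kept else tokens[:1]  # avoid returning an empty input

def cae_loss(
    x: Tokens,
    style: str,
    other_style: str,
    nll: Callable[[Tokens, str, Tokens], float],  # hypothetical: -log p(target | source, style)
    generate: Callable[[Tokens, str], Tokens],    # hypothetical: decode source into a style
    rng: random.Random,
    lam: float = 1.0,                             # assumed weight on the cycle term
    drop_prob: float = 0.15,
) -> float:
    # Denoising term: rebuild x from a corrupted copy, conditioned on its own style.
    dae = nll(corrupt(x, rng, drop_prob), style, x)
    # Cycle term: transfer x to the other style, then rebuild the original from it.
    transferred = generate(x, other_style)
    cycle = nll(transferred, style, x)
    return dae + lam * cycle

# Toy stand-ins so the sketch runs without a model (purely illustrative).
SOFTEN: Dict[str, str] = {"dumb": "unwise"}

def toy_generate(source: Tokens, style: str) -> Tokens:
    return [SOFTEN.get(t, t) for t in source] if style == "civil" else list(source)

def toy_nll(source: Tokens, style: str, target: Tokens) -> float:
    # Counts target tokens that cannot be copied from the source.
    return float(sum(1 for t in target if t not in source))

loss = cae_loss(["this", "is", "dumb"], "toxic", "civil",
                toy_nll, toy_generate, random.Random(0), drop_prob=0.0)
```

In a real setup, `nll` and `generate` would wrap a pre-trained text-to-text transformer. Neither term needs parallel toxic/civil pairs, which is what makes the objective self-supervised: the only target is the original comment itself.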

research
02/03/2021

Studying the Usage of Text-To-Text Transfer Transformer to Support Code-Related Tasks

Deep learning (DL) techniques are gaining more and more attention in the...
research
05/15/2018

Simplifying Sentences with Sequence to Sequence Models

We simplify sentences with an attentive neural network sequence to seque...
research
05/18/2022

Exploiting Social Media Content for Self-Supervised Style Transfer

Recent research on style transfer takes inspiration from unsupervised ne...
research
02/15/2021

MAPGN: MAsked Pointer-Generator Network for sequence-to-sequence pre-training

This paper presents a self-supervised learning method for pointer-genera...
research
11/01/2020

Towards A Friendly Online Community: An Unsupervised Style Transfer Framework for Profanity Redaction

Offensive and abusive language is a pressing problem on social media pla...
research
05/17/2022

HelixADMET: a robust and endpoint extensible ADMET system incorporating self-supervised knowledge transfer

Accurate ADMET (an abbreviation for "absorption, distribution, metabolis...
research
12/05/2019

Self-Supervised Contextual Language Representation of Radiology Reports to Improve the Identification of Communication Urgency

Machine learning methods have recently achieved high-performance in biom...
