Facilitating Fine-grained Detection of Chinese Toxic Language: Hierarchical Taxonomy, Resources, and Benchmarks

05/08/2023
by Junyu Lu, et al.

The widespread dissemination of toxic online posts is increasingly damaging to society. However, research on detecting toxic language in Chinese has lagged significantly. Existing datasets lack fine-grained annotation of toxic types and expressions and ignore samples with indirect toxicity. In addition, introducing lexical knowledge is crucial for detecting the toxicity of posts, yet doing so has remained a challenge for researchers. In this paper, we facilitate the fine-grained detection of Chinese toxic language. First, we build Monitor Toxic Frame, a hierarchical taxonomy for analyzing toxic types and expressions. Then, we present ToxiCN, a fine-grained dataset that includes both direct and indirect toxic samples. We also build an insult lexicon containing implicit profanity and propose Toxic Knowledge Enhancement (TKE) as a benchmark, which incorporates lexical features to detect toxic language. In the experimental stage, we demonstrate the effectiveness of TKE. Finally, we give a systematic quantitative and qualitative analysis of the findings.
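
The abstract does not spell out how TKE incorporates the lexical feature, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each token is looked up in an insult lexicon, mapped to a category id, and that a per-category vector is added to the token's representation. The lexicon entries, category ids, vectors, and the function names `lexicon_category_ids` and `enhance` are all hypothetical placeholders.

```python
from typing import Dict, List

# Hypothetical toy lexicon: term -> category id.
# Id 0 is reserved for "not in the lexicon". These entries are placeholders
# and are NOT taken from the paper's insult lexicon.
TOY_LEXICON: Dict[str, int] = {"badword": 1, "slur_x": 2}


def lexicon_category_ids(tokens: List[str], lexicon: Dict[str, int]) -> List[int]:
    """Map each token to its lexicon category id, or 0 when it is not listed."""
    return [lexicon.get(tok, 0) for tok in tokens]


def enhance(
    token_vecs: List[List[float]],
    cat_ids: List[int],
    cat_vecs: Dict[int, List[float]],
) -> List[List[float]]:
    """Add the category vector for each token to that token's vector (element-wise).

    Assumption: the lexical feature is injected by simple addition; the
    abstract only says the feature is "incorporated".
    """
    return [
        [t + c for t, c in zip(vec, cat_vecs[cid])]
        for vec, cid in zip(token_vecs, cat_ids)
    ]


if __name__ == "__main__":
    tokens = ["you", "badword"]
    ids = lexicon_category_ids(tokens, TOY_LEXICON)
    vecs = [[1.0, 2.0], [3.0, 4.0]]            # stand-ins for encoder embeddings
    cats = {0: [0.0, 0.0], 1: [0.5, 0.5], 2: [1.0, 1.0]}
    print(ids)
    print(enhance(vecs, ids, cats))
```

In a real model the category vectors would be learned parameters and the token vectors would come from a pre-trained encoder. Exact-match lookup at the token level is also a simplification, since lexicon terms in Chinese may span several characters.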


