Sex, drugs, and violence

08/11/2016
by Stefania Raimondo, et al.

Automatically detecting inappropriate content can be a difficult NLP task, since it requires understanding context and innuendo rather than just identifying specific keywords. Given the large quantity of online user-generated content, automatic detection is becoming increasingly necessary. We take a largely unsupervised approach using a large corpus of narratives from a community-based self-publishing website and a small segment of crowd-sourced annotations. We explore topic modelling using latent Dirichlet allocation (and a variation), and use the inferred topics to regress appropriateness ratings, effectively automating rating for suitability. The results suggest that certain inferred topics may be useful in detecting latent inappropriateness, yielding recall up to 96% and low regression errors.
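
The pipeline outlined in the abstract (topics learned without labels, then a regression onto ratings from a small annotated subset) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the use of scikit-learn, the choice of ridge regression, the function names, the toy documents, and the 0-to-1 rating scale are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Ridge


def fit_topic_regressor(unlabeled_docs, labeled_docs, ratings, n_topics=4, seed=0):
    """Learn topics from all text, then regress ratings on topic proportions."""
    vectorizer = CountVectorizer(stop_words="english")
    # Unsupervised step: topics are inferred from the whole corpus, no labels used.
    counts = vectorizer.fit_transform(list(unlabeled_docs) + list(labeled_docs))
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    lda.fit(counts)
    # Supervised step: only the small annotated subset is needed here.
    features = lda.transform(vectorizer.transform(labeled_docs))
    # Ridge regression is an assumption; the abstract does not name the regressor.
    regressor = Ridge(alpha=1.0).fit(features, ratings)
    return vectorizer, lda, regressor


def topic_proportions(model, docs):
    """Per-document topic distribution (each row sums to 1)."""
    vectorizer, lda, _ = model
    return lda.transform(vectorizer.transform(docs))


def predict_rating(model, docs):
    """Predicted appropriateness rating for each document."""
    _, _, regressor = model
    return regressor.predict(topic_proportions(model, docs))


# Invented toy data; ratings are on a hypothetical 0 (suitable) to 1 (unsuitable) scale.
unlabeled = [
    "the knight drew his sword and charged across the battlefield",
    "they baked bread and shared tea in the quiet garden",
    "soldiers fought a brutal battle with swords and spears",
    "the children planted flowers in the garden after school",
]
labeled = [
    "a bloody battle raged and the sword struck again",
    "grandmother served tea and bread in the garden",
]
ratings = np.array([0.9, 0.1])

model = fit_topic_regressor(unlabeled, labeled, ratings)
new_docs = ["the army marched to battle", "we drank tea among the flowers"]
proportions = topic_proportions(model, new_docs)
predictions = predict_rating(model, new_docs)
```

On a real corpus the number of topics and the regressor would need tuning against held-out annotations; the toy data here only shows how the pieces connect.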


