When silver glitters more than gold: Bootstrapping an Italian part-of-speech tagger for Twitter

11/09/2016
by Barbara Plank et al.

We bootstrap a state-of-the-art part-of-speech tagger to tag Italian Twitter data, in the context of the Evalita 2016 PoSTWITA shared task. We show that training the tagger on native Twitter data enriched with small amounts of specifically selected gold data and additional silver-labelled data scraped from Facebook yields better results than using large amounts of manually annotated data from a mix of genres.


