Comparing Performance of Different Linguistically-Backed Word Embeddings for Cyberbullying Detection

06/04/2022
by Juuso Eronen, et al.

In most cases, word embeddings are learned only from raw tokens or, less often, lemmas. This includes pre-trained language models like BERT. To investigate the potential of capturing deeper relations between lexical items and structures, and to filter out redundant information, we propose to preserve morphological, syntactic, and other types of linguistic information by combining them with the raw tokens or lemmas. This means, for example, including part-of-speech or dependency information within the lexical features used. The word embeddings can then be trained on these combinations instead of raw tokens alone. The method could later be applied to the pre-training of large language models as well, potentially improving their performance. This would help in tackling problems that place higher demands on linguistic representation, such as the detection of cyberbullying.
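As a concrete illustration of the combination step, below is a minimal sketch of training embeddings on lemma plus part-of-speech features instead of raw tokens. It assumes spaCy (with the en_core_web_sm model) for tagging and gensim for embedding training; the lemma_POS feature format and the toy corpus are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch: train word embeddings on lemma+POS combinations
# rather than raw tokens. Assumes spaCy and gensim are installed and
# that "en_core_web_sm" has been downloaded; the "lemma_POS" format
# and the toy corpus are illustrative, not the paper's exact setup.
import spacy
from gensim.models import Word2Vec

nlp = spacy.load("en_core_web_sm")

corpus = [
    "You are such a loser",
    "Nobody likes you at school",
]

def to_linguistic_tokens(text):
    """Combine each lemma with its part-of-speech tag, e.g. 'loser_NOUN'."""
    doc = nlp(text)
    return [f"{tok.lemma_}_{tok.pos_}" for tok in doc if not tok.is_space]

sentences = [to_linguistic_tokens(t) for t in corpus]

# Embeddings are learned over the combined features, so homographs with
# different parts of speech get separate vectors.
model = Word2Vec(sentences=sentences, vector_size=100, window=5, min_count=1)
print(model.wv.most_similar("loser_NOUN"))
```

The same pattern extends to dependency information by appending tok.dep_ to the combined feature, at the cost of a larger and sparser vocabulary.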


Related research

10/23/2020
GiBERT: Introducing Linguistic Knowledge into BERT through a Lightweight Gated Injection Method
Large pre-trained language models such as BERT have been the driving for...

04/07/2021
Combining Pre-trained Word Embeddings and Linguistic Features for Sequential Metaphor Identification
We tackle the problem of identifying metaphors in text, treated as a seq...

09/05/2015
Take and Took, Gaggle and Goose, Book and Read: Evaluating the Utility of Vector Differences for Lexical Relation Learning
Recent work on word embeddings has shown that simple vector subtraction ...

03/07/2020
Discovering linguistic (ir)regularities in word embeddings through max-margin separating hyperplanes
We experiment with new methods for learning how related words are positi...

02/29/2016
Representation of linguistic form and function in recurrent neural networks
We present novel methods for analyzing the activation patterns of RNNs f...

09/19/2021
Conditional probing: measuring usable information beyond a baseline
Probing experiments investigate the extent to which neural representatio...

09/17/2022
Unsupervised Lexical Substitution with Decontextualised Embeddings
We propose a new unsupervised method for lexical substitution using pre-...
