NTULM: Enriching Social Media Text Representations with Non-Textual Units

10/29/2022
by Jinning Li, et al.

On social media, additional context is often present in the form of annotations and metadata such as the post's author, mentions, hashtags, and hyperlinks. We refer to these annotations as Non-Textual Units (NTUs). We posit that NTUs provide social context beyond their textual semantics and that leveraging these units can enrich social media text representations. In this work, we construct an NTU-centric social heterogeneous network to co-embed NTUs. We then integrate these NTU embeddings, in a principled way, into a large pretrained language model by fine-tuning with these additional units. This adds context to noisy, short-text social media. Experiments show that NTU-augmented text representations significantly outperform existing text-only baselines by 2-5% relative points on many downstream tasks, highlighting the importance of context to social media NLP. We also show that including NTU context in the initial layers of the language model, alongside the text, is better than using it after the text embedding is generated. Our work leads to the generation of holistic, general-purpose social media content embeddings.
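The contrast between feeding NTU context into the model's input and attaching it after the text embedding is computed can be illustrated with a toy sketch. This is not the authors' implementation: `toy_encoder` is a hypothetical stand-in for a pretrained language model, `proj` is an assumed projection from NTU space into token space, and all dimensions and random values are made up for illustration.

```python
import random

def linear(vec, weight):
    """Multiply a vector (length d_in) by a weight matrix (d_in x d_out)."""
    return [sum(v * w for v, w in zip(vec, col)) for col in zip(*weight)]

def mean_pool(seq):
    """Average a sequence of equal-length vectors, dimension by dimension."""
    n = len(seq)
    return [sum(col) / n for col in zip(*seq)]

def toy_encoder(seq):
    """Hypothetical stand-in for a language model: mix each position with the sequence mean."""
    ctx = mean_pool(seq)
    return [[0.5 * x + 0.5 * c for x, c in zip(vec, ctx)] for vec in seq]

def early_fusion(token_embs, ntu_embs, proj):
    """Project NTU embeddings into token space and append them to the input sequence.

    The first position plays the role of a [CLS]-style summary, so it can
    absorb NTU context inside the encoder.
    """
    ntu_as_tokens = [linear(e, proj) for e in ntu_embs]
    return toy_encoder(token_embs + ntu_as_tokens)[0]

def late_fusion(token_embs, ntu_embs):
    """Encode the text alone, then concatenate the pooled NTU embedding."""
    text_vec = toy_encoder(token_embs)[0]
    return text_vec + mean_pool(ntu_embs)

random.seed(0)
D_TEXT, D_NTU = 4, 3  # assumed toy dimensions
tokens = [[random.uniform(-1, 1) for _ in range(D_TEXT)] for _ in range(5)]
ntus = [[random.uniform(-1, 1) for _ in range(D_NTU)] for _ in range(2)]  # e.g. author, hashtag
proj = [[random.uniform(-1, 1) for _ in range(D_TEXT)] for _ in range(D_NTU)]

early_vec = early_fusion(tokens, ntus, proj)
late_vec = late_fusion(tokens, ntus)
print(len(early_vec), len(late_vec))
```

In the early variant, the text summary itself changes when NTUs are present, because the encoder sees them as extra input positions. In the late variant, the text part of the vector is identical to the text-only encoding and the NTU information is only appended alongside it.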
