The future is different: Large pre-trained language models fail in prediction tasks

11/01/2022
by   Kostadin Cvejoski, et al.

Large pre-trained language models (LPLM) have shown spectacular success when fine-tuned on downstream supervised tasks. Yet, it is known that their performance can drastically drop when there is a distribution shift between the data used during training and that used at inference time. In this paper we focus on data distributions that naturally change over time and introduce four new REDDIT datasets, namely the WALLSTREETBETS, ASKSCIENCE, THE DONALD, and POLITICS sub-reddits. First, we empirically demonstrate that LPLM can display average performance drops of about 88% when predicting the popularity of future posts from sub-reddits whose topic distribution changes with time. We then introduce a simple methodology that leverages neural variational dynamic topic models and attention mechanisms to infer temporal language model representations for regression tasks. Our models display performance drops of only about 40% when predicting the popularity of future posts, while using only about 7% of the total number of parameters of LPLM and providing interpretable representations that offer insight into real-world events, like the GameStop short squeeze of 2021.
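
Below is a minimal, hypothetical sketch of the kind of model the abstract describes: a temporal topic vector (e.g. produced by a neural variational dynamic topic model for a given time slice) attends over a post's token embeddings, and the pooled representation feeds a small regression head that predicts popularity. The module names, dimensions, and single-head attention choice are assumptions for illustration, not the paper's exact architecture.

```python
# Sketch: topic-conditioned attention pooling for popularity regression.
# Assumes an upstream dynamic topic model supplies a topic vector per time slice.
import torch
import torch.nn as nn


class TopicConditionedPopularityRegressor(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 128, topic_dim: int = 50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Project the dynamic topic vector into the token-embedding space so it
        # can act as an attention query over the post's tokens.
        self.topic_query = nn.Linear(topic_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=1, batch_first=True)
        self.head = nn.Sequential(nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, token_ids: torch.Tensor, topic_vec: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len); topic_vec: (batch, topic_dim)
        tokens = self.embed(token_ids)                    # (batch, seq_len, embed_dim)
        query = self.topic_query(topic_vec).unsqueeze(1)  # (batch, 1, embed_dim)
        pooled, _ = self.attn(query, tokens, tokens)      # attend over post tokens
        return self.head(pooled.squeeze(1)).squeeze(-1)   # predicted (log-)popularity


# Toy usage: a batch of 4 posts, 20 tokens each, with assumed 50-dim topic vectors.
model = TopicConditionedPopularityRegressor(vocab_size=10_000)
scores = model(torch.randint(0, 10_000, (4, 20)), torch.randn(4, 50))
print(scores.shape)  # torch.Size([4])
```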

Related research

04/08/2022: Towards Understanding Large-Scale Discourse Structures in Pre-Trained and Fine-Tuned Language Models
With a growing number of BERTology work analyzing different components o...

05/28/2022: Few-shot Subgoal Planning with Language Models
Pre-trained large language models have shown successful progress in many...

09/03/2020: Attention Flows: Analyzing and Comparing Attention Mechanisms in Language Models
Advances in language modeling have led to the development of deep attent...

01/27/2023: Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning
In recent years, pre-trained large language models have demonstrated rem...

12/31/2019: oLMpics – On what Language Model Pre-training Captures
Recent success of pre-trained language models (LMs) has spurred widespre...

10/17/2022: Pseudo-OOD training for robust language models
While pre-trained large-scale deep models have garnered attention as an ...

08/04/2023: Tweet Insights: A Visualization Platform to Extract Temporal Insights from Twitter
This paper introduces a large collection of time series data derived fro...
