How Emotionally Stable is ALBERT? Testing Robustness with Stochastic Weight Averaging on a Sentiment Analysis Task

11/18/2021
by Urja Khurana, et al.

Despite their success, modern language models are fragile. Even small changes in their training pipeline can lead to unexpected results. We study this phenomenon by examining the robustness of ALBERT (arXiv:1909.11942) combined with Stochastic Weight Averaging (SWA) (arXiv:1803.05407), a cheap form of ensembling, on a sentiment analysis task (SST-2). In particular, we analyze SWA's stability via CheckList criteria (arXiv:2005.04118), examining the agreement on errors made by models that differ only in their random seed. We hypothesize that SWA is more stable because it ensembles model snapshots taken along the gradient descent trajectory. We quantify stability by comparing the models' mistakes with Fleiss' Kappa (Fleiss, 1971) and overlap ratio scores. We find that SWA reduces error rates in general; yet, according to CheckList, the models still exhibit their own distinct biases.
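To make the error-agreement idea concrete, here is a minimal Python sketch of the two stability scores. It assumes each seed's model is a "rater" that labels every test example as an error (1) or correct (0). Fleiss' Kappa then measures how far the seeds agree beyond chance. The overlap ratio is defined here, as an assumption, as the size of the intersection of the seeds' error sets divided by the size of their union. The paper's exact definition may differ. The function names `fleiss_kappa` and `overlap_ratio` are hypothetical and are not taken from the authors' code.

```python
from typing import Sequence


def fleiss_kappa(errors: Sequence[Sequence[int]]) -> float:
    """Fleiss' Kappa over binary error labels.

    errors[m][i] is 1 if model m (one random seed) misclassified example i, else 0.
    Categories are {correct, error}; raters are models; subjects are test examples.
    """
    n_raters = len(errors)
    n_items = len(errors[0])
    if n_raters < 2:
        raise ValueError("need at least two models to measure agreement")

    # Per example: how many models were correct (c0) and how many erred (c1).
    counts = []
    for i in range(n_items):
        wrong = sum(errors[m][i] for m in range(n_raters))
        counts.append((n_raters - wrong, wrong))

    # Observed agreement P_i for each example, then its mean P_bar.
    p_items = [
        (c0 * c0 + c1 * c1 - n_raters) / (n_raters * (n_raters - 1))
        for c0, c1 in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement P_e from the marginal category proportions.
    total = n_items * n_raters
    p_correct = sum(c0 for c0, _ in counts) / total
    p_wrong = 1.0 - p_correct
    p_e = p_correct ** 2 + p_wrong ** 2
    if p_e == 1.0:
        # Every label falls in one category: kappa is undefined (0/0).
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


def overlap_ratio(errors: Sequence[Sequence[int]]) -> float:
    """Assumed definition: |errors shared by all models| / |errors made by any model|."""
    n_items = len(errors[0])
    any_wrong = [i for i in range(n_items) if any(e[i] for e in errors)]
    if not any_wrong:
        return float("nan")  # no model made any error
    all_wrong = [i for i in any_wrong if all(e[i] for e in errors)]
    return len(all_wrong) / len(any_wrong)


if __name__ == "__main__":
    # Toy error vectors for three seeds over six test examples (illustrative only).
    seeds = [
        [1, 1, 0, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 0, 0, 1],
    ]
    print(round(fleiss_kappa(seeds), 4))  # 0.2987 (= 23/77)
    print(overlap_ratio(seeds))           # 0.25: only example 0 is shared by all seeds
```

Under these assumptions, a more stable training procedure would push both scores up. Seeds that err on the same examples yield a higher Kappa and a larger shared fraction of errors. A lower overall error rate alone does not guarantee this.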
