Human vs. Muppet: A Conservative Estimate of Human Performance on the GLUE Benchmark

05/24/2019
by Nikita Nangia, et al.

The GLUE benchmark (Wang et al., 2019b) is a suite of language understanding tasks which has seen dramatic progress in the past year, with average performance moving from 70.0 at launch to 83.9, state of the art at the time of writing (May 24, 2019). Here, we measure human performance on the benchmark, in order to learn whether significant headroom remains for further progress. We provide a conservative estimate of human performance on the benchmark through crowdsourcing: Our annotators are non-experts who must learn each task from a brief set of instructions and 20 examples. In spite of limited training, these annotators robustly outperform the state of the art on six of the nine GLUE tasks and achieve an average score of 87.1. Given the fast pace of progress, however, the headroom we observe is quite limited. To reproduce the data-poor setting that our annotators must learn in, we also train the BERT model (Devlin et al., 2019) in limited-data regimes, and conclude that low-resource sentence classification remains a challenge for modern neural network approaches to text understanding.
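The limited-data regimes mentioned above amount to fine-tuning on fixed-size random subsets of each task's training data. A minimal sketch of that subsampling step is below; the subset sizes, the `subsample` helper, and the seeding scheme are illustrative assumptions, not the paper's exact experimental setup.

```python
import random

def subsample(examples, n, seed=0):
    """Draw a fixed-size, reproducible training subset for a
    limited-data regime. If n exceeds the dataset size, the full
    dataset is returned unchanged.

    NOTE: illustrative sketch only; the paper's exact subset sizes
    and sampling procedure are not specified here.
    """
    rng = random.Random(seed)
    if n >= len(examples):
        return list(examples)
    return rng.sample(examples, n)

# Simulate shrinking one task's training set across several regimes;
# each subset would then be used to fine-tune BERT separately.
full_train = [f"example_{i}" for i in range(10_000)]
for n in (100, 1_000, 5_000):
    subset = subsample(full_train, n)
    print(f"regime n={n}: {len(subset)} training examples")
```

Fixing the seed per regime keeps runs comparable across tasks, since each model variant then sees an identical training subset.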


