When BERT Plays the Lottery, All Tickets Are Winning

05/01/2020
by Sai Prasanna, et al.

Much of the recent success in NLP is due to large Transformer-based models such as BERT (Devlin et al., 2019). However, these models have been shown to be reducible to a smaller number of self-attention heads and layers. We consider this phenomenon from the perspective of the lottery ticket hypothesis. For fine-tuned BERT, we show that (a) it is possible to find a subnetwork of elements that achieves performance comparable with that of the full model, and (b) similarly-sized subnetworks sampled from the rest of the model perform worse. However, the "bad" subnetworks can be fine-tuned separately to achieve only slightly worse performance than the "good" ones, indicating that most weights in pre-trained BERT are potentially useful. We also show that the "good" subnetworks vary considerably across GLUE tasks, opening up the possibility of learning what knowledge BERT actually uses at inference time.
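To make the idea of a pruned BERT subnetwork concrete, the following is a minimal sketch, not the authors' code, of removing self-attention heads from a BERT model using the Hugging Face transformers library. The set of heads kept per layer is a hypothetical placeholder; in the paper, the surviving elements (heads and MLPs) would instead be chosen by an importance measure on the fine-tuned model.

import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Load a BERT model with a classification head (as used for GLUE tasks).
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Hypothetical pruning mask: keep heads 0-5 and prune heads 6-11 in each of
# the 12 layers of bert-base. A real "good" subnetwork would be derived from
# head importance scores, not chosen arbitrarily like this.
heads_to_prune = {layer: list(range(6, 12)) for layer in range(12)}
model.prune_heads(heads_to_prune)

# The pruned subnetwork can then be evaluated or re-fine-tuned on a GLUE task.
inputs = tokenizer("The movie was surprisingly good.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.shape)  # (1, num_labels)

Note that this sketch only prunes attention heads; the paper also considers entire MLP blocks and layers as prunable elements.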


Related research

04/08/2020 - Improving BERT with Self-Supervised Attention
One of the most popular paradigms of applying large, pre-trained NLP mod...

09/29/2020 - Gender prediction using limited Twitter Data
Transformer models have shown impressive performance on a variety of NLP...

08/21/2019 - Revealing the Dark Secrets of BERT
BERT-based architectures currently give state-of-the-art performance on ...

12/20/2022 - Transformers Go for the LOLs: Generating (Humourous) Titles from Scientific Abstracts End-to-End
We consider the end-to-end abstract-to-title generation problem, explori...

09/14/2023 - Revisiting Supertagging for HPSG
We present new supertaggers trained on HPSG-based treebanks. These treeb...

02/19/2023 - Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT
Recently, ChatGPT has attracted great attention, as it can generate flue...

08/15/2023 - Finding Stakeholder-Material Information from 10-K Reports using Fine-Tuned BERT and LSTM Models
All public companies are required by federal securities law to disclose ...
