Using Selective Masking as a Bridge between Pre-training and Fine-tuning

11/24/2022
by Tanish Lad, et al.

Pre-training a language model and then fine-tuning it for downstream tasks has demonstrated state-of-the-art results on various NLP benchmarks. Pre-training is usually independent of the downstream task, however, and previous work has shown that pre-training alone may not be sufficient to capture task-specific nuances. We propose a way to tailor a pre-trained BERT model to the downstream task via task-specific masking before the standard supervised fine-tuning. For this, a word list specific to the task is first collected. For example, if the task is sentiment classification, we collect a small sample of words representing both positive and negative sentiments. Next, a word's importance for the task, called its task score, is measured using the word list. Each word is then assigned a masking probability based on its task score; we experiment with different masking functions for this mapping. The BERT model is further trained with the masked language modeling (MLM) objective, where masking follows the above strategy, and standard supervised fine-tuning is then performed on the downstream tasks. Results on these tasks show that the selective masking strategy outperforms random masking, indicating its effectiveness.
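To make the pipeline concrete, here is a minimal sketch of the selective-masking step. The task word list, the binary task score, the linear masking function, and all function names below are illustrative assumptions, not the authors' exact formulation; the sketch only shows how a per-word task score can be turned into a masking probability for continued MLM training.

```python
# Sketch of selective masking: words with higher task scores are masked
# more often before continued MLM training. All constants and the simple
# membership-based task score are hypothetical choices for illustration.

import random

# Hypothetical task word list for sentiment classification.
TASK_WORDS = {"good", "great", "excellent", "bad", "terrible", "awful"}

MASK_TOKEN = "[MASK]"
BASE_PROB = 0.05   # floor probability so non-task words are still masked occasionally
MAX_PROB = 0.50    # cap for highly task-relevant words


def task_score(word: str, task_words: set) -> float:
    """Toy task score: 1.0 if the word appears in the task word list, else 0.0.
    A real implementation could instead use similarity to the word list."""
    return 1.0 if word.lower() in task_words else 0.0


def masking_probability(score: float) -> float:
    """One possible masking function: interpolate linearly between
    BASE_PROB and MAX_PROB according to the word's task score."""
    return BASE_PROB + (MAX_PROB - BASE_PROB) * score


def selective_mask(tokens: list, task_words: set, seed: int = 0) -> list:
    """Replace each token with [MASK] with probability given by its task score."""
    rng = random.Random(seed)
    masked = []
    for tok in tokens:
        p = masking_probability(task_score(tok, task_words))
        masked.append(MASK_TOKEN if rng.random() < p else tok)
    return masked


if __name__ == "__main__":
    sentence = "the movie was great but the ending felt terrible".split()
    print(selective_mask(sentence, TASK_WORDS))
    # Task-relevant words ("great", "terrible") are masked far more often than
    # the rest, so continued MLM training focuses on reconstructing them from context.
```

Any monotonic mapping from task score to masking probability fits this scheme; the linear interpolation above is just one candidate among the masking functions one could compare.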


