DiPair: Fast and Accurate Distillation for Trillion-Scale Text Matching and Pair Modeling

10/07/2020
by Jiecao Chen, et al.

Pre-trained models like BERT (Devlin et al., 2018) have dominated NLP and IR applications such as single-sentence classification, text pair classification, and question answering. However, deploying these models in real systems is highly non-trivial due to their exorbitant computational costs. A common remedy is knowledge distillation (Hinton et al., 2015), which yields faster inference. However, as we show here, existing works are not optimized for dealing with pairs (or tuples) of texts. Consequently, they are either not scalable or demonstrate subpar performance. In this work, we propose DiPair, a novel framework for distilling fast and accurate models on text pair tasks. Coupled with an end-to-end training strategy, DiPair is both highly scalable and offers improved quality-speed tradeoffs. Empirical studies conducted on both academic and real-world e-commerce benchmarks demonstrate the efficacy of the proposed approach, with speedups of over 350x and minimal quality drop relative to the cross-attention teacher BERT model.
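To make the pair-distillation setup concrete, below is a minimal PyTorch sketch in the spirit of the abstract: a student encodes each text independently (so per-text representations can be precomputed and cached for trillion-scale scoring), keeps only the first few output vectors from each side, and runs a small transformer head over the truncated pair, trained to regress the cross-attention teacher's score. All names (StudentPairModel, distill_step), dimensions, truncation lengths, and the MSE objective are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class StudentPairModel(nn.Module):
    """Hypothetical student: two light encoders + a tiny cross-text head.

    Each side is encoded independently, then only the first few output
    vectors per side are kept and fed to a small transformer head that
    models the interaction between the two texts.
    """

    def __init__(self, dim=128, keep_left=4, keep_right=4, n_heads=4):
        super().__init__()
        self.keep_left, self.keep_right = keep_left, keep_right
        # Stand-ins for two lightweight (possibly shared) text encoders.
        self.enc_left = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.enc_right = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        # Small interaction head over the concatenated truncated outputs.
        self.head = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.scorer = nn.Linear(dim, 1)

    def forward(self, left_emb, right_emb):
        l = self.enc_left(left_emb)[:, : self.keep_left]      # (B, k_l, dim)
        r = self.enc_right(right_emb)[:, : self.keep_right]   # (B, k_r, dim)
        pair = self.head(torch.cat([l, r], dim=1))            # cross-text attention
        return self.scorer(pair.mean(dim=1)).squeeze(-1)      # (B,) matching score

def distill_step(student, teacher_scores, left_emb, right_emb, opt):
    """One distillation step: regress student scores onto teacher scores."""
    opt.zero_grad()
    loss = nn.functional.mse_loss(student(left_emb, right_emb), teacher_scores)
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage: random tensors stand in for real token embeddings and for
# the cross-encoder teacher's logits.
student = StudentPairModel()
opt = torch.optim.Adam(student.parameters(), lr=1e-4)
left = torch.randn(8, 32, 128)    # 8 pairs, 32 tokens per side, dim 128
right = torch.randn(8, 32, 128)
teacher_scores = torch.randn(8)   # placeholder teacher logits
print(distill_step(student, teacher_scores, left, right, opt))
```

The design choice this illustrates is the one the abstract trades on: because the expensive per-text encoding is independent of the pairing, it can be done once offline, leaving only the small truncated-pair head to run at pairing time.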

Related research

05/08/2020
Distilling Knowledge from Pre-trained Language Models via Text Smoothing
This paper studies compressing pre-trained language models, like BERT (D...

10/18/2019
Model Compression with Two-stage Multi-teacher Knowledge Distillation for Web Question Answering System
Deep pre-training and fine-tuning models (such as BERT and OpenAI GPT) h...

09/22/2021
K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering
Knowledge enhanced pre-trained language models (K-PLMs) are shown to be ...

08/14/2019
Scalable Attentive Sentence-Pair Modeling via Distilled Sentence Embedding
Attention based models have become the new state-of-the-art in natural l...

09/26/2021
Improving Question Answering Performance Using Knowledge Distillation and Active Learning
Contemporary question answering (QA) systems, including transformer-base...

10/11/2022
Once is Enough: A Light-Weight Cross-Attention for Fast Sentence Pair Modeling
Transformer-based models have achieved great success on sentence pair mo...

07/26/2021
Text is Text, No Matter What: Unifying Text Recognition using Knowledge Distillation
Text recognition remains a fundamental and extensively researched topic ...
