BabyBear: Cheap inference triage for expensive language models

05/24/2022
by Leila Khalili, et al.

Transformer language models provide superior accuracy over previous models, but they are computationally and environmentally expensive. Borrowing the concept of model cascading from computer vision, we introduce BabyBear, a framework for cascading models for natural language processing (NLP) tasks to minimize cost. The core strategy is inference triage: exiting early when the least expensive model in the cascade achieves a sufficiently high-confidence prediction. We test BabyBear on several open-source datasets related to document classification and entity recognition. We find that for common NLP tasks a high proportion of the inference load can be accomplished with cheap, fast models that have learned by observing a deep learning model. This allows us to reduce the compute cost of large-scale classification jobs by more than 50% of the deep learning compute while maintaining an F1 score higher than 95% on the CoNLL benchmark.
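The inference-triage idea in the abstract can be sketched as a simple confidence-gated cascade. The code below is a minimal illustration, not the paper's implementation; the stand-in models, the threshold value, and the function names are all hypothetical.

```python
# Sketch of inference triage: a cheap model answers when it is confident,
# and only low-confidence inputs are routed to the expensive model.
# All models here are toy stand-ins for illustration.

def cascade_predict(text, cheap_model, expensive_model, threshold=0.9):
    """Return (label, which_model_answered).

    Calls the expensive model only when the cheap model's confidence
    falls below `threshold` (the early-exit criterion).
    """
    label, confidence = cheap_model(text)
    if confidence >= threshold:
        return label, "cheap"            # early exit: triage succeeded
    return expensive_model(text), "expensive"

def cheap_model(text):
    # Stand-in for a fast classifier trained on the deep model's outputs.
    label = "positive" if "good" in text else "negative"
    confidence = 0.95 if ("good" in text or "bad" in text) else 0.5
    return label, confidence

def expensive_model(text):
    # Stand-in for a transformer, invoked only on hard inputs.
    return "positive" if "good" in text else "negative"

print(cascade_predict("a good movie", cheap_model, expensive_model))
# -> ('positive', 'cheap')
print(cascade_predict("an ambiguous film", cheap_model, expensive_model))
# -> ('negative', 'expensive')
```

In practice the threshold trades cost against accuracy: raising it sends more traffic to the expensive model, lowering it saves more compute at some risk to F1.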


