Knowledge Distillation of Large Language Models

06/14/2023
by Yuxian Gu, et al.

Knowledge Distillation (KD) is a promising technique for reducing the high computational demand of large language models (LLMs). However, previous KD methods are primarily applied to white-box classification models, or train small models to imitate black-box model APIs like ChatGPT. How to effectively distill knowledge from white-box generative LLMs is still under-explored, and is becoming increasingly important with the prosperity of LLMs. In this work, we propose MiniLLM, which distills smaller language models from generative larger language models. We first replace the forward Kullback-Leibler divergence (KLD) objective in the standard KD approaches with reverse KLD, which is more suitable for KD on generative language models, to prevent the student model from overestimating the low-probability regions of the teacher distribution. Then, we derive an effective optimization approach to learn this objective. Extensive experiments in the instruction-following setting show that MiniLLM models generate more precise responses with higher overall quality, lower exposure bias, better calibration, and better long-text generation performance. Our method also scales across model families with 120M to 13B parameters. We will release our code and model checkpoints at https://aka.ms/MiniLLM.
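The forward-versus-reverse KLD distinction the abstract draws can be illustrated with a toy example (a minimal sketch; the distributions below are illustrative assumptions, not the paper's data). Forward KLD, KL(p_teacher || q_student), blows up when the student assigns near-zero mass where the teacher has mass, pushing the student to "cover" even the teacher's low-probability tail; reverse KLD, KL(q_student || p_teacher), instead penalizes the student for placing mass where the teacher has little, so a student that concentrates on the teacher's major modes is penalized less:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i); assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy teacher distribution: one dominant mode plus a low-probability tail.
teacher = [0.70, 0.20, 0.05, 0.03, 0.02]
# A "mode-seeking" student that concentrates mass on the teacher's main modes
# and nearly ignores the tail.
student = [0.75, 0.24, 0.005, 0.0025, 0.0025]

forward_kl = kl_divergence(teacher, student)  # standard KD objective
reverse_kl = kl_divergence(student, teacher)  # MiniLLM-style objective

# The tail terms dominate the forward KLD, so this student looks much worse
# under forward KLD than under reverse KLD.
print(f"forward KL(p||q) = {forward_kl:.4f}")
print(f"reverse KL(q||p) = {reverse_kl:.4f}")
```

In the actual method, these distributions would be the teacher's and student's next-token distributions, and the reverse-KLD objective is optimized with the policy-gradient-style approach the paper derives rather than computed in closed form as here.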


