Adaptive Knowledge Distillation between Text and Speech Pre-trained Models

03/07/2023
by Jinjie Ni et al.

Learning from massive amounts of speech data underlies the recent success of many self-supervised speech models. Through knowledge distillation, these models may also benefit from the knowledge encoded by language models pre-trained on rich sources of text. The distillation process, however, is challenging because of the modal disparity between the textual and speech embedding spaces. This paper studies metric-based distillation that aligns the embedding spaces of text and speech with only a small amount of data and without modifying the model structure. The semantic and granularity gaps between text and speech have largely been overlooked in the literature, which impairs distillation; we therefore propose Prior-informed Adaptive knowledge Distillation (PAD), which adaptively leverages text/speech units of variable granularity and prior distributions to achieve better global and local alignment between text and speech pre-trained models. Evaluations on three spoken language understanding benchmarks show that PAD transfers linguistic knowledge more effectively than other metric-based distillation approaches.
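As a point of reference, the sketch below shows one generic form of metric-based cross-modal distillation: a contrastive objective that pulls each speech embedding toward its paired text embedding from a frozen teacher. The function name, the InfoNCE-style loss, and the batch-of-paired-utterances setup are illustrative assumptions; they do not reproduce PAD's prior-informed, variable-granularity alignment.

```python
import torch
import torch.nn.functional as F

def metric_distillation_loss(speech_emb: torch.Tensor,
                             text_emb: torch.Tensor,
                             temperature: float = 0.1) -> torch.Tensor:
    """Contrastive alignment of a speech student with a frozen text teacher.

    speech_emb, text_emb: (batch, dim) utterance-level embeddings for paired
    speech/transcript inputs. Matched pairs are treated as positives; all
    other pairs in the batch serve as in-batch negatives.
    """
    speech_emb = F.normalize(speech_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # Pairwise cosine similarities, scaled by a temperature.
    logits = speech_emb @ text_emb.t() / temperature
    targets = torch.arange(speech_emb.size(0), device=speech_emb.device)
    # Cross-entropy over each similarity row aligns a speech embedding with
    # its own transcript while pushing it away from the others in the batch.
    return F.cross_entropy(logits, targets)
```

In practice such an alignment loss is added alongside the task loss while the text teacher stays frozen; PAD additionally weights text/speech units by prior distributions and aligns them at both global (utterance) and local (token/frame) granularity.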


