Application of Knowledge Distillation to Multi-task Speech Representation Learning

10/29/2022
by Mine Kerpicci, et al.

Model architectures such as wav2vec 2.0 and HuBERT have been proposed to learn speech representations from audio waveforms in a self-supervised manner. When combined with downstream tasks such as speech recognition, these models have been shown to provide state-of-the-art performance. However, they use a large number of parameters, the smallest version of which has about 95 million. This constitutes a challenge for edge AI device deployments. In this paper, we use knowledge distillation to reduce the original model size by about 75%. Moreover, we use both wav2vec 2.0 and HuBERT models for distillation and present a comprehensive performance analysis through experiments in which we fine-tune the distilled models in single-task and multi-task frameworks separately. In particular, our experiments show that fine-tuning the distilled models on keyword spotting and speaker verification tasks results in only 0.1% and 0.9% performance degradation, respectively.
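To make the compression setup concrete, below is a minimal PyTorch sketch of representation-level knowledge distillation from a frozen teacher into a smaller student. The TinyEncoder placeholder, the projection layer, the L1-plus-cosine loss, and the feature dimensions are illustrative assumptions for this sketch, not the paper's actual teacher (wav2vec 2.0 or HuBERT), student architecture, or training objective.

```python
# Sketch of representation-level knowledge distillation (assumed setup, not the
# paper's exact method): a frozen "teacher" encoder supervises a smaller student.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    """Stand-in encoder: a few 1-D convolutions producing frame-level features."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, dim, kernel_size=10, stride=5), nn.GELU(),
            nn.Conv1d(dim, dim, kernel_size=3, stride=2), nn.GELU(),
        )

    def forward(self, wav):  # wav: (batch, samples)
        return self.net(wav.unsqueeze(1)).transpose(1, 2)  # (batch, frames, dim)

teacher = TinyEncoder(dim=768)   # placeholder for a wav2vec 2.0 / HuBERT base teacher
student = TinyEncoder(dim=256)   # smaller student (the paper reports ~75% size reduction)
proj = nn.Linear(256, 768)       # map student features into the teacher's feature space

teacher.eval()
for p in teacher.parameters():
    p.requires_grad_(False)      # teacher stays frozen during distillation

opt = torch.optim.Adam(list(student.parameters()) + list(proj.parameters()), lr=1e-4)

def distill_loss(s, t):
    # L1 distance plus a cosine-similarity term between student and teacher frames,
    # a common choice in SSL distillation work; the paper may use a different objective.
    return F.l1_loss(s, t) - F.cosine_similarity(s, t, dim=-1).mean()

wav = torch.randn(4, 16000)      # four random 1-second waveforms at 16 kHz
with torch.no_grad():
    t_feat = teacher(wav)        # teacher targets, no gradients
s_feat = proj(student(wav))      # student predictions projected to teacher dimension

loss = distill_loss(s_feat, t_feat)
loss.backward()
opt.step()
print(f"distillation loss: {loss.item():.4f}")
```

After distillation, the student (without the teacher) would be fine-tuned on the downstream tasks, e.g. keyword spotting and speaker verification, either one task at a time or in a multi-task setup as described in the abstract.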


Related research

05/27/2023
One-Step Knowledge Distillation and Fine-Tuning in Using Large Pre-Trained Self-Supervised Learning Models for Speaker Verification
The application of speech self-supervised learning (SSL) models has achi...

09/18/2023
Distilling HuBERT with LSTMs via Decoupled Knowledge Distillation
Much research effort is being applied to the task of compressing the kno...

07/06/2023
On-Device Constrained Self-Supervised Speech Representation Learning for Keyword Spotting via Knowledge Distillation
Large self-supervised models are effective feature extractors, but their...

07/14/2022
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models
Self-supervised learning (SSL) is seen as a very promising approach with...

10/27/2021
Temporal Knowledge Distillation for On-device Audio Classification
Improving the performance of on-device audio classification models remai...

05/17/2023
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
In this paper, we introduce self-distillation and online clustering for ...
