Can Whisper perform speech-based in-context learning?

09/13/2023
by Siyin Wang et al.

This paper investigates the in-context learning abilities of the Whisper automatic speech recognition (ASR) models released by OpenAI. A novel speech-based in-context learning (SICL) approach is proposed for test-time adaptation. It can reduce word error rates (WERs) using only a small number of labelled speech samples and without gradient descent. Language-level adaptation experiments on Chinese dialects showed that, when SICL is applied to isolated word ASR, consistent and considerable relative WER reductions can be achieved with Whisper models of any size on two dialects, on average 32.3% [...] be applied to further improve the efficiency of SICL, which can increase the average relative WER reduction to 36.4% [...] speaker adaptation or continuous speech recognition tasks, and both achieved considerable relative WER reductions. Detailed quantitative analyses are also provided to shed light on SICL's adaptability to phonological variances and dialect-specific lexical nuances.
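The abstract relies on two ideas that a short sketch can make concrete. First, SICL adapts Whisper at test time by conditioning it on a few labelled speech samples rather than by updating weights. Second, results are reported as relative WER reductions. The Python below is a minimal, hypothetical sketch and not the authors' implementation. It assumes the in-context examples' audio is concatenated in front of the test utterance, and that their transcriptions are joined into a text prefix for the decoder. The names `Example`, `build_sicl_input`, `gap_samples`, `word_error_rate` and `relative_wer_reduction` are illustrative inventions. Passing the result to an actual Whisper model is not shown. WER is computed here with whitespace tokenisation, which is also an assumption: Chinese text would first need word segmentation or character-level scoring.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a waveform is a list of float samples at a fixed
# sampling rate, and a labelled in-context example is a (waveform, transcription) pair.
Example = Tuple[List[float], str]


def build_sicl_input(examples: Sequence[Example], test_audio: List[float],
                     gap_samples: int = 0) -> Tuple[List[float], str]:
    """Hypothetical SICL input layout (an assumption, not the paper's code):
    concatenate the example audio in front of the test audio, optionally
    separated by silence, and join the example transcriptions into a text
    prefix for the decoder. No model weights are touched."""
    audio: List[float] = []
    for wav, _ in examples:
        audio.extend(wav)
        audio.extend([0.0] * gap_samples)  # optional silence between segments
    audio.extend(test_audio)
    prompt = " ".join(text for _, text in examples)
    return audio, prompt


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference
    words, using a word-level Levenshtein distance on whitespace tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,              # deletion of reference word
                          curr[j - 1] + 1,          # insertion of hypothesis word
                          prev[j - 1] + (r != h))   # substitution, or match at cost 0
        prev = curr
    return prev[-1] / max(len(ref), 1)


def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Relative reduction in percent: 100 * (baseline - adapted) / baseline.
    baseline_wer must be greater than zero."""
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


if __name__ == "__main__":
    examples = [([0.1, 0.2], "hello"), ([0.3], "world")]
    audio, prompt = build_sicl_input(examples, [0.5, 0.6], gap_samples=1)
    print(audio, prompt)
    print(word_error_rate("the cat sat", "the cat sit"))
    print(relative_wer_reduction(0.30, 0.20))
```

For instance, a baseline WER of 0.30 that falls to 0.20 after adaptation is a relative reduction of about 33.3%. The abstract's 32.3% and 36.4% figures are averages of this relative quantity, not absolute WER differences.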



Related research
11/17/2022

Unsupervised Model-based speaker adaptation of end-to-end lattice-free MMI model for speech recognition

Modeling the speaker variability is a key challenge for automatic speech...
02/21/2022

Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition

Despite the rapid progress of automatic speech recognition (ASR) technol...
04/08/2021

Exploring Machine Speech Chain for Domain Adaptation and Few-Shot Speaker Adaptation

Machine Speech Chain, which integrates both end-to-end (E2E) automatic s...
09/17/2023

Enhancing Quantised End-to-End ASR Models via Personalisation

Recent end-to-end automatic speech recognition (ASR) models have become ...
07/08/2022

Tandem Multitask Training of Speaker Diarisation and Speech Recognition for Meeting Transcription

Self-supervised-learning-based pre-trained models for speech data, such ...
07/13/2023

Exploring the Integration of Large Language Models into Automatic Speech Recognition Systems: An Empirical Study

This paper explores the integration of Large Language Models (LLMs) into...
06/03/2023

SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy Minimization

Automatic speech recognition (ASR) models are frequently exposed to data...