Multi-user VoiceFilter-Lite via Attentive Speaker Embedding

07/02/2021
by   Rajeev Rikhye, et al.
0

In this paper, we propose a solution to allow speaker conditioned speech models, such as VoiceFilter-Lite, to support an arbitrary number of enrolled users in a single pass. This is achieved by using an attention mechanism on multiple speaker embeddings to compute a single attentive embedding, which is then used as a side input to the model. We implemented multi-user VoiceFilter-Lite and evaluated it for three tasks: (1) a streaming automatic speech recognition (ASR) task; (2) a text-independent speaker verification task; and (3) a personalized keyphrase detection task, where ASR has to detect keyphrases from multiple enrolled users in a noisy environment. Our experiments show that, with up to four enrolled users, multi-user VoiceFilter-Lite is able to significantly reduce speech recognition and speaker verification errors when there is overlapping speech, without affecting performance under other acoustic conditions. This attentive speaker embedding approach can also be easily applied to other speaker-conditioned models such as personal VAD and personalized ASR.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/24/2022

Closing the Gap between Single-User and Multi-User VoiceFilter-Lite

VoiceFilter-Lite is a speaker-conditioned voice separation model that pl...
research
04/28/2021

Personalized Keyphrase Detection using Speaker and Environment Information

In this paper, we introduce a streaming keyphrase detection system that ...
research
11/07/2021

Retrieving Speaker Information from Personalized Acoustic Models for Speech Recognition

The widespread of powerful personal devices capable of collecting voice ...
research
05/16/2019

Articulatory and bottleneck features for speaker-independent ASR of dysarthric speech

The rapid population aging has stimulated the development of assistive d...
research
04/08/2022

Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition

Personalization of on-device speech recognition (ASR) has seen explosive...
research
05/10/2022

Best of Both Worlds: Multi-task Audio-Visual Automatic Speech Recognition and Active Speaker Detection

Under noisy conditions, automatic speech recognition (ASR) can greatly b...
research
10/31/2022

An analysis of degenerating speech due to progressive dysarthria on ASR performance

Although personalized automatic speech recognition (ASR) models have rec...

Please sign up or login with your details

Forgot password? Click here to reset