Self-Distillation Network with Ensemble Prototypes: Learning Robust Speaker Representations without Supervision

08/05/2023
by Yafeng Chen, et al.

Training speaker-discriminative and robust speaker verification systems without speaker labels remains challenging and is worth exploring. Previous studies have noted a substantial performance gap between self-supervised and fully supervised approaches. In this paper, we propose an effective Self-Distillation network with Ensemble Prototypes (SDEP) to facilitate self-supervised speaker representation learning. A range of experiments conducted on the VoxCeleb datasets demonstrates the superiority of the SDEP framework in speaker verification. Without using any speaker labels in the training phase, SDEP achieves a new state-of-the-art (SOTA) result on the VoxCeleb1 speaker verification benchmark, with equal error rates (EER) of 1.94%, 1.99%, and 3.77% on the Vox1-O, Vox1-E, and Vox1-H trials, respectively. Code will be publicly available at https://github.com/alibaba-damo-academy/3D-Speaker.
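The results above are reported as equal error rate (EER): the point on the detection trade-off where the false acceptance rate (non-target trials accepted) equals the false rejection rate (target trials rejected). The sketch below illustrates how an EER can be computed from per-trial similarity scores. It is a minimal, hypothetical helper (`compute_eer` is an illustrative name), not the authors' evaluation code; the 3D-Speaker toolkit may compute EER differently, for example by interpolating along the ROC curve or handling tied scores explicitly.

```python
from typing import Sequence


def compute_eer(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate the equal error rate (EER) as a fraction in [0, 1].

    Hypothetical illustration, not the SDEP/3D-Speaker implementation.
    scores: similarity score per trial (higher = more likely same speaker).
    labels: 1 for a target (same-speaker) trial, 0 for a non-target trial.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_target = sum(1 for y in labels if y == 1)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target trial")

    # Sort trials by descending score; lowering the decision threshold
    # past each score accepts one more trial.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    false_accepts = 0      # non-target trials accepted so far
    misses = n_target      # target trials still rejected so far
    best_gap = float("inf")
    eer = 1.0
    for i in order:
        if labels[i] == 1:
            misses -= 1
        else:
            false_accepts += 1
        far = false_accepts / n_nontarget   # false acceptance rate
        frr = misses / n_target             # false rejection rate
        gap = abs(far - frr)
        # Keep the threshold where FAR and FRR are closest; report their mean.
        if gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
    return eer
```

Because the paper reports EER as a percentage, a returned value of 0.0194 would correspond to the 1.94% figure quoted for Vox1-O. This simple sweep treats tied scores one trial at a time, which is adequate for illustration but can differ slightly from ROC-interpolated EER on large trial lists.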
