Self-Distillation Network with Ensemble Prototypes: Learning Robust Speaker Representations without Supervision

by Yafeng Chen, et al.
Alibaba Group

Training speaker-discriminative and robust speaker verification systems without speaker labels remains challenging and worthwhile to explore. Previous studies have noted a substantial performance gap between self-supervised and fully supervised approaches. In this paper, we propose an effective Self-Distillation network with Ensemble Prototypes (SDEP) to facilitate self-supervised speaker representation learning. A range of experiments conducted on the VoxCeleb datasets demonstrates the superiority of the SDEP framework in speaker verification. SDEP achieves a new state of the art on the VoxCeleb1 speaker verification evaluation benchmark (i.e., equal error rates of 1.94%, 1.99%, and 3.77% on the Vox1-O, Vox1-E, and Vox1-H trials, respectively) without using any speaker labels in the training phase. Code will be publicly available at
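The results above are reported as equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how EER can be estimated from trial scores. It is not the authors' evaluation code; the function name `equal_error_rate` and the toy scores and labels are hypothetical and chosen only for illustration.

```python
def equal_error_rate(scores, labels):
    """Estimate the EER from similarity scores and binary labels.

    labels: 1 for a target trial (same speaker), 0 for a non-target trial.
    A trial is accepted when its score is >= the threshold.
    This helper is an illustrative assumption, not part of SDEP.
    """
    targets = [s for s, y in zip(scores, labels) if y == 1]
    nontargets = [s for s, y in zip(scores, labels) if y == 0]
    if not targets or not nontargets:
        raise ValueError("need at least one target and one non-target trial")

    best_gap, eer = float("inf"), None
    # Sweep every observed score as a candidate decision threshold.
    for threshold in sorted(set(scores)):
        far = sum(s >= threshold for s in nontargets) / len(nontargets)
        frr = sum(s < threshold for s in targets) / len(targets)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy trial list (hypothetical scores, not VoxCeleb data).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 1, 0, 0, 0]
toy_eer = equal_error_rate(toy_scores, toy_labels)
print(f"EER = {100 * toy_eer:.2f}%")
```

On a real trial list such as Vox1-O, the scores would typically be cosine similarities between pairs of speaker embeddings, and an interpolated estimate is often used instead of this threshold sweep.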




