Self-Supervised Learning with Cluster-Aware-DINO for High-Performance Robust Speaker Verification

by Bing Han, et al.

Automatic speaker verification has made great progress using deep learning approaches trained on large-scale, manually annotated datasets. However, collecting a large amount of well-labeled data for system building is difficult and expensive. In this paper, we propose a novel self-supervised learning framework that can construct a high-performance speaker verification system without using any labeled data. To avoid the impact of false negative pairs, we adopt the self-distillation with no labels (DINO) framework as the initial model, which can be trained without exploiting negative pairs. Then, we introduce a cluster-aware training strategy for DINO to improve the diversity of data. In the iterative learning stage, clustering produces a large number of unreliable labels, so the quality of the pseudo labels is important for system training. This motivates us to propose dynamic loss-gate and label correction (DLG-LC) methods to alleviate the performance degradation caused by unreliable labels. More specifically, we model the loss distribution with a Gaussian mixture model (GMM) and obtain the loss-gate threshold dynamically to distinguish reliable from unreliable labels. In addition, we use the model predictions to correct the unreliable labels, making better use of the unreliable data rather than dropping it directly. Moreover, we extend DLG-LC to multi-modality to further improve performance. The experiments are performed on the commonly used VoxCeleb dataset. Compared to the best-known self-supervised speaker verification system, our proposed method obtains 22.17 27.94 even with fewer iterations, smaller models, and simpler clustering methods. More importantly, the newly proposed system achieves results comparable to the fully supervised system without using any human-labeled data.
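
The dynamic loss-gate and label correction idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract gives no details, so the function names (`fit_two_gaussians`, `dynamic_threshold`, `gate_and_correct`), the choice of the equal-density point between the two fitted Gaussians as the threshold, and the `min_confidence` cutoff for accepting a model prediction are all assumptions made for illustration. The sketch uses only the Python standard library.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a 1-D Gaussian."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(losses, iters=100):
    """EM for a 1-D, two-component Gaussian mixture (hypothetical helper)."""
    xs = sorted(losses)
    half = len(xs) // 2
    # Initialise the means from the lower and upper halves of the sorted losses.
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    stds = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of component 0 for every sample.
        resp = []
        for x in xs:
            p0 = weights[0] * normal_pdf(x, means[0], stds[0])
            p1 = weights[1] * normal_pdf(x, means[1], stds[1])
            resp.append(p0 / (p0 + p1 + 1e-300))
        # M-step: re-estimate weight, mean and std of each component.
        for k in (0, 1):
            r = resp if k == 0 else [1.0 - v for v in resp]
            total = sum(r) + 1e-12
            means[k] = sum(ri * x for ri, x in zip(r, xs)) / total
            var = sum(ri * (x - means[k]) ** 2 for ri, x in zip(r, xs)) / total
            stds[k] = max(math.sqrt(var), 1e-3)
            weights[k] = total / len(xs)
    return weights, means, stds


def dynamic_threshold(losses):
    """Loss value where the low-loss and high-loss components are equally likely.

    Assumption: the low-loss component models reliable pseudo labels.
    """
    weights, means, stds = fit_two_gaussians(losses)
    lo_k, hi_k = (0, 1) if means[0] <= means[1] else (1, 0)

    def gap(x):
        return (weights[lo_k] * normal_pdf(x, means[lo_k], stds[lo_k])
                - weights[hi_k] * normal_pdf(x, means[hi_k], stds[hi_k]))

    lo, hi = means[lo_k], means[hi_k]
    if gap(lo) <= 0 or gap(hi) >= 0:
        # No clean crossing between the means: fall back to the midpoint.
        return 0.5 * (lo + hi)
    for _ in range(60):  # bisection on the sign change of gap()
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gate_and_correct(losses, pseudo_labels, predictions, threshold,
                     min_confidence=0.5):
    """Keep reliable pseudo labels; replace unreliable ones with predictions.

    `predictions` holds one probability list per sample. Samples whose best
    prediction is below `min_confidence` are dropped (None); this cutoff is
    an assumption, not something stated in the abstract.
    """
    corrected = []
    for loss, label, probs in zip(losses, pseudo_labels, predictions):
        if loss < threshold:
            corrected.append(label)
        else:
            best = max(range(len(probs)), key=probs.__getitem__)
            corrected.append(best if probs[best] >= min_confidence else None)
    return corrected
```

Because the threshold is re-estimated from the current loss distribution, it can be recomputed at every epoch or iteration round rather than fixed by hand, which is the sense in which the gate is dynamic.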


Self-Supervised Speaker Verification Using Dynamic Loss-Gate and Label Correction

For self-supervised speaker verification, the quality of pseudo labels d...

Pushing the limits of self-supervised speaker verification using regularized distillation framework

Training robust speaker verification systems without speaker labels has ...

OR-Gate: A Noisy Label Filtering Method for Speaker Verification

The deep learning models used for speaker verification are heavily depen...

Label-Efficient Self-Supervised Speaker Verification With Information Maximization and Contrastive Learning

State-of-the-art speaker verification systems are inherently dependent o...

Self-supervised Speaker Diarization

Over the last few years, deep learning has grown in popularity for speak...

Labels, Information, and Computation: Efficient, Privacy-Preserving Learning Using Sufficient Labels

In supervised learning, obtaining a large set of fully-labeled training ...

Self-Supervised Training of Speaker Encoder with Multi-Modal Diverse Positive Pairs

We study a novel neural architecture and its training strategies of spea...
