Adapting End-to-End Neural Speaker Verification to New Languages and Recording Conditions with Adversarial Training

11/07/2018
by   Gautam Bhattacharya, et al.

In this article we propose a novel approach for adapting speaker embeddings to new domains based on adversarial training of neural networks. We apply our embeddings to the task of text-independent speaker verification, a challenging, real-world problem in biometric security. We further the development of end-to-end speaker embedding models by combining a novel 1-dimensional, self-attentive residual network, an angular margin loss function, and an adversarial training strategy. Our model is able to learn extremely compact, 64-dimensional speaker embeddings that deliver competitive performance on a number of popular datasets using simple cosine distance scoring. On the NIST-SRE 2016 task we are able to beat a strong i-vector baseline, while on the Speakers in the Wild task our model outperforms both i-vector and x-vector baselines, showing an absolute improvement of 2.19%. Additionally, we show that the integration of adversarial training consistently leads to a significant improvement over an unadapted model.
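The cosine distance scoring mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cosine_score` and `same_speaker`, the toy embeddings, and the decision threshold of 0.5 are all assumptions made for illustration. In practice the threshold would be tuned on held-out trials, and the embeddings would come from the trained network.

```python
import math
from typing import Sequence


def cosine_score(enroll: Sequence[float], test: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings (higher means more similar)."""
    if len(enroll) != len(test):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(enroll, test))
    norm_enroll = math.sqrt(sum(a * a for a in enroll))
    norm_test = math.sqrt(sum(b * b for b in test))
    if norm_enroll == 0.0 or norm_test == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_enroll * norm_test)


def same_speaker(enroll: Sequence[float], test: Sequence[float],
                 threshold: float = 0.5) -> bool:
    """Accept the trial if the score reaches the threshold.

    The default threshold is a hypothetical placeholder, not a value
    taken from the paper.
    """
    return cosine_score(enroll, test) >= threshold


# Toy 64-dimensional embeddings standing in for real network outputs.
emb_a = [1.0] * 64
emb_b = [1.0] * 32 + [-1.0] * 32  # orthogonal to emb_a by construction

score_same = cosine_score(emb_a, emb_a)
score_diff = cosine_score(emb_a, emb_b)
```

Because the score depends only on the angle between the two vectors, no separate back-end model is needed at test time, which is what makes this kind of scoring simple.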

research
11/07/2018

Generative Adversarial Speaker Embedding Networks for Domain Robust End-to-End Speaker Verification

This article presents a novel approach for learning domain-invariant spe...
research
04/05/2021

Dr-Vectors: Decision Residual Networks and an Improved Loss for Speaker Recognition

Many neural network speaker recognition systems model each speaker using...
research
10/25/2019

Channel adversarial training for speaker verification and diarization

Previous work has encouraged domain-invariance in deep speaker embedding...
research
08/09/2020

Cosine-Distance Virtual Adversarial Training for Semi-Supervised Speaker-Discriminative Acoustic Embeddings

In this paper, we propose a semi-supervised learning (SSL) technique for...
research
11/23/2018

Training Multi-Task Adversarial Network For Extracting Noise-Robust Speaker Embedding

Under noisy environments, to achieve the robust performance of speaker r...
research
10/24/2019

Delving into VoxCeleb: environment invariant speaker recognition

Research in speaker recognition has recently seen significant progress d...
research
06/22/2023

Vec2Vec: A Compact Neural Network Approach for Transforming Text Embeddings with High Fidelity

Vector embeddings have become ubiquitous tools for many language-related...
