Latent space representation for multi-target speaker detection and identification with a sparse dataset using Triplet neural networks

10/01/2019
by Kin Wai Cheuk, et al.

We present an approach to the speaker recognition problem using Triplet Neural Networks (TNNs). Currently, the i-vector representation with probabilistic linear discriminant analysis (PLDA) is the most commonly used technique for this problem, owing to its high classification accuracy and relatively short computation time. In this paper, we explore a neural network approach, namely TNNs, to build a latent space on which different classifiers can solve the Multi-Target Speaker Detection and Identification Challenge Evaluation 2018 (MCE 2018) dataset. The training set contains i-vectors from 3,631 speakers, with only 3 samples for each speaker, which makes speaker recognition a challenging task. When using the train and development sets to train both the TNN and the baseline model (i.e., similarity evaluation directly on the i-vector representation), our proposed model outperforms the baseline by 23%. When using only the train set, our method results in 309 confusions for the multi-target speaker identification task, which is 46% better than the baseline model. These results show that the representational power of TNNs is especially evident when training on small datasets with few instances available per class.
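
As a rough illustration of the idea, a triplet network is typically trained so that an anchor sample lies closer to a sample from the same speaker (the positive) than to a sample from a different speaker (the negative) by some margin. The sketch below assumes a squared Euclidean distance and a hinge-style loss with a margin of 1.0; the abstract does not state the distance, margin, or network used, so these choices and all function names are hypothetical. The cosine similarity function stands in for a similarity evaluation computed directly on i-vectors, as the baseline is described.

```python
import math


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss for one (anchor, positive, negative) triplet.

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin` (an assumed value, not from the paper).
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


def cosine_similarity(u, v):
    """Cosine similarity, e.g. for scoring two i-vectors directly."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy 2-dimensional vectors standing in for embedded i-vectors.
    anchor = [0.0, 0.0]
    same_speaker = [0.0, 1.0]
    other_speaker = [3.0, 0.0]
    print(triplet_loss(anchor, same_speaker, other_speaker))
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

In a full system, the three inputs would be the outputs of a shared network applied to i-vectors, and the loss would be averaged over many sampled triplets. With only 3 samples per speaker, the anchor and positive can be drawn from the same speaker and the negative from any other speaker.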

