U-vectors: Generating clusterable speaker embedding from unlabeled data

02/07/2021
by M. F. Mridha, et al.

Speaker recognition is the task of identifying speakers from their speech. Approaches may exploit timbre, accent, speech patterns, and similar properties. Supervised speaker recognition has been investigated extensively. However, our survey of the literature found that unsupervised speaker recognition systems mostly depend on domain adaptation. This paper introduces a speaker recognition strategy for unlabeled data that generates clusterable embedding vectors from small fixed-size speech frames. The unsupervised training strategy assumes that a small speech segment contains a single speaker. Based on this assumption, we construct pairwise constraints to train twin deep learning architectures, with noise augmentation policies, that generate speaker embeddings. Without relying on domain adaptation, the process produces clusterable speaker embeddings in an unsupervised manner, which we name unsupervised vectors (u-vectors). The evaluation is conducted on two popular English speaker recognition datasets, TIMIT and LibriSpeech. We also include a Bengali dataset, Bengali ASR, to illustrate the diversity of domain shifts for speaker recognition systems. Finally, we conclude that the proposed approach achieves remarkable performance using pairwise architectures.
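
The single-speaker assumption can be illustrated with a minimal sketch of how pairwise constraints might be built from one unlabeled segment. This is not the authors' implementation: the helper names (split_frames, add_noise, make_positive_pairs), the Gaussian noise model, the frame length, and the choice to augment only one side of each pair are all assumptions made for illustration. The abstract does not say how pairs are sampled or whether negative pairs are used, so only "same speaker" pairs are shown.

```python
import random

def split_frames(signal, frame_len):
    """Cut a 1-D signal into non-overlapping fixed-size frames, dropping any remainder."""
    last_start = len(signal) - frame_len
    return [signal[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]

def add_noise(frame, scale, rng):
    """Hypothetical augmentation: add zero-mean Gaussian noise to every sample."""
    return [x + rng.gauss(0.0, scale) for x in frame]

def make_positive_pairs(segment, frame_len, noise_scale=0.01, seed=0):
    """Pair every two frames of one short segment as 'same speaker'.

    Assumption taken from the abstract: a small segment contains a single
    speaker, so any two of its frames can be linked without labels.
    The second frame of each pair is noise-augmented (illustrative choice).
    """
    rng = random.Random(seed)
    frames = split_frames(segment, frame_len)
    pairs = []
    for i in range(len(frames)):
        for j in range(i + 1, len(frames)):
            pairs.append((frames[i], add_noise(frames[j], noise_scale, rng)))
    return pairs

# Toy stand-in for a short speech segment: 40 samples, 4 frames of 10 samples.
segment = [float(i % 7) for i in range(40)]
pairs = make_positive_pairs(segment, frame_len=10)
print(len(pairs))  # 4 frames give 4 * 3 / 2 = 6 pairs
```

In a twin-network setup, each element of a pair would be passed through one of two weight-sharing encoders, and a pairwise loss would pull the two embeddings together. That training step is omitted here because the abstract does not specify the architecture or the loss.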

research
04/06/2019

Self-supervised speaker embeddings

Contrary to i-vectors, speaker embeddings such as x-vectors are incapabl...
research
10/21/2020

Learning Speaker Embedding from Text-to-Speech

Zero-shot multi-speaker Text-to-Speech (TTS) generates target speaker vo...
research
10/24/2016

UTD-CRSS Systems for 2016 NIST Speaker Recognition Evaluation

This document briefly describes the systems submitted by the Center for ...
research
09/30/2019

Embeddings for DNN speaker adaptive training

In this work, we investigate the use of embeddings for speaker-adaptive ...
research
12/01/2021

STEM: Unsupervised STructural EMbedding for Stance Detection

Stance detection is an important task, supporting many downstream tasks ...
research
04/16/2019

I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences

The I4U consortium was established to facilitate a joint entry to NIST s...
research
02/22/2018

Neural Predictive Coding using Convolutional Neural Networks towards Unsupervised Learning of Speaker Characteristics

Learning speaker-specific features is vital in many applications like sp...
