Speaker Recognition in the Wild

05/05/2022
by Neeraj Chhimwal, et al.

In this paper, we propose a pipeline that finds the number of speakers in a source of audio data, along with the audio clips belonging to each identified speaker, when neither the number of speakers nor the speaker labels are known a priori. We used this approach as part of our Data Preparation pipeline for Speech Recognition in Indic Languages (https://github.com/Open-Speech-EkStep/vakyansh-wav2vec2-experimentation). To evaluate the accuracy of the proposed pipeline, we introduce two metrics: Cluster Purity and Cluster Uniqueness. Cluster Purity quantifies how "pure" a cluster is. Cluster Uniqueness, on the other hand, quantifies what percentage of clusters belong only to a single dominant speaker. We discuss these metrics further in section <ref>. We developed this utility to help identify data by speaker ID before training an Automatic Speech Recognition (ASR) model, and most of this data takes considerable effort to scrape. We therefore also report that, in the chosen test set, 98% of the data is mapped to the top 80% of clusters. The top clusters are computed by removing any cluster with fewer than a fixed number of utterances; we use a threshold of 30 to discard very small clusters.
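
The two metrics can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so the definitions here are assumptions rather than the authors' exact method: purity is taken as the share of a cluster's utterances that come from its most frequent (dominant) speaker, and uniqueness as the share of clusters whose dominant speaker is not the dominant speaker of any other cluster. The function name `cluster_stats` and its parameters are hypothetical; only the small-cluster threshold of 30 comes from the text.

```python
from collections import Counter, defaultdict


def cluster_stats(cluster_ids, speaker_ids, min_utterances=30):
    """Return (purity_per_cluster, uniqueness) under the assumed definitions.

    cluster_ids[i] is the predicted cluster of utterance i and
    speaker_ids[i] is its ground-truth speaker label.
    """
    # Group the true speaker labels by predicted cluster.
    clusters = defaultdict(list)
    for cluster, speaker in zip(cluster_ids, speaker_ids):
        clusters[cluster].append(speaker)

    # Drop very small clusters (the text uses a threshold of 30 utterances).
    kept = {c: s for c, s in clusters.items() if len(s) >= min_utterances}

    purity = {}
    dominant = {}
    for cluster, speakers in kept.items():
        speaker, count = Counter(speakers).most_common(1)[0]
        # Assumed purity: fraction of utterances from the dominant speaker.
        purity[cluster] = count / len(speakers)
        dominant[cluster] = speaker

    # Assumed uniqueness: fraction of kept clusters whose dominant speaker
    # does not dominate any other kept cluster.
    times_dominant = Counter(dominant.values())
    unique = sum(1 for c in kept if times_dominant[dominant[c]] == 1)
    uniqueness = unique / len(kept) if kept else 0.0
    return purity, uniqueness
```

Under these assumptions, a speaker whose utterances are split across two clusters lowers uniqueness without lowering purity, which is why the two numbers are reported separately.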
