Unit selection synthesis based data augmentation for fixed phrase speaker verification

02/19/2021
by Houjun Huang, et al.

Data augmentation is commonly used to help build a robust speaker verification system, especially in limited-resource cases. However, conventional data augmentation methods usually focus on the diversity of the acoustic environment and neglect lexical variation. For text-dependent speaker verification tasks, it is well known that preparing training data with the target transcript is the most effective way to build a well-performing system, but collecting such data is time-consuming and expensive. In this work, we propose a unit selection synthesis based data augmentation method to leverage the abundant text-independent data resources. In this approach, the text-independent speech of each speaker is first broken up into speech segments, each containing one phone unit. Segments that contain phones in the target transcript are then selected and concatenated in order to produce speech with the target transcript. Experiments are carried out on the AISHELL Speaker Verification Challenge 2019 database. The results and analysis show that the proposed method boosts system performance significantly.
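The segment-select-concatenate procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes phone boundaries are already available (for example from a forced aligner, which is not shown), it picks among candidate segments at random because the abstract does not specify a selection criterion, and the names `Segment`, `build_phone_inventory`, and `synthesize_phrase` are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical layout: (phone label, waveform samples) for one phone-sized segment.
Segment = Tuple[str, List[float]]


def build_phone_inventory(segments: Sequence[Segment]) -> Dict[str, List[List[float]]]:
    """Group one speaker's phone-level segments by phone label."""
    inventory: Dict[str, List[List[float]]] = defaultdict(list)
    for phone, samples in segments:
        inventory[phone].append(samples)
    return dict(inventory)


def synthesize_phrase(
    inventory: Dict[str, List[List[float]]],
    target_phones: Sequence[str],
    rng: random.Random,
) -> Optional[List[float]]:
    """Concatenate one segment per target phone, in transcript order.

    Random choice among candidates is an assumption of this sketch.
    Returns None when the speaker has no segment for some target phone.
    """
    utterance: List[float] = []
    for phone in target_phones:
        candidates = inventory.get(phone)
        if not candidates:
            return None
        utterance.extend(rng.choice(candidates))
    return utterance


# Toy data standing in for one speaker's aligned text-independent speech.
segments = [
    ("n", [0.1, 0.2]),
    ("i", [0.3]),
    ("h", [0.4, 0.5]),
    ("ao", [0.6]),
    ("i", [0.7, 0.8]),
]
inventory = build_phone_inventory(segments)
augmented = synthesize_phrase(inventory, ["n", "i", "h", "ao"], random.Random(0))
missing = synthesize_phrase(inventory, ["m", "i"], random.Random(0))
```

Calling `synthesize_phrase` repeatedly with different random states yields several distinct utterances of the same fixed phrase per speaker, which is the sense in which the method augments the training data. A speaker whose data lacks a required phone is skipped here by returning `None`.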


research
07/12/2020

Data augmentation enhanced speaker enrollment for text-dependent speaker verification

Data augmentation is commonly used for generating additional data from t...
research
11/21/2020

Exploring Voice Conversion based Data Augmentation in Text-Dependent Speaker Verification

In this paper, we focus on improving the performance of the text-depende...
research
04/21/2022

Cross-Speaker Emotion Transfer for Low-Resource Text-to-Speech Using Non-Parallel Voice Conversion with Pitch-Shift Data Augmentation

Data augmentation via voice conversion (VC) has been successfully applie...
research
06/03/2021

Speaker verification-derived loss and data augmentation for DNN-based multispeaker speech synthesis

Building multispeaker neural network-based text-to-speech synthesis syst...
research
07/20/2023

PAS: Partial Additive Speech Data Augmentation Method for Noise Robust Speaker Verification

Background noise reduces speech intelligibility and quality, making spea...
research
06/16/2023

Low-Resource Text-to-Speech Using Specific Data and Noise Augmentation

Many neural text-to-speech architectures can synthesize nearly natural s...
research
04/05/2021

SpeakerStew: Scaling to Many Languages with a Triaged Multilingual Text-Dependent and Text-Independent Speaker Verification System

In this paper, we describe SpeakerStew - a hybrid system to perform spea...
