A Unified Deep Speaker Embedding Framework for Mixed-Bandwidth Speech Data

12/01/2020
by Weicheng Cai, et al.

This paper proposes a unified deep speaker embedding framework for modeling speech data with different sampling rates. Treating the narrowband spectrogram as a sub-image of the wideband spectrogram, we cast the joint modeling of mixed-bandwidth data as an image classification problem. From this perspective, we develop several mixed-bandwidth joint training strategies for different training and test data scenarios. The proposed systems flexibly handle mixed-bandwidth speech data within a single speaker embedding model, without any additional downsampling, upsampling, bandwidth extension, or padding operations. We conduct extensive experimental studies on the VoxCeleb1 dataset and further validate the effectiveness of the proposed approach on the SITW and NIST SRE 2016 datasets.
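The "sub-image" view can be checked numerically. The sketch below is a minimal illustration under assumptions of its own (32 ms non-overlapping rectangular frames, FFT sizes of 512 at 16 kHz and 256 at 8 kHz, a synthetic test signal with no energy above 4 kHz). It is not the authors' feature extraction pipeline, and the helper name magnitude_spectrogram is hypothetical. When both sampling rates share the same frame duration, the frequency-bin spacing is identical (31.25 Hz), so the 8 kHz spectrogram lines up with the lowest 129 rows of the 16 kHz spectrogram. The 2:1 decimation only manufactures a narrowband copy for the comparison; the framework itself is described as requiring no downsampling.

```python
import numpy as np

WB_RATE = 16_000   # wideband sampling rate in Hz
NB_RATE = 8_000    # narrowband sampling rate in Hz
FRAME_SEC = 0.032  # assumed analysis frame length (32 ms), shared by both rates


def magnitude_spectrogram(signal: np.ndarray, rate: int, frame_sec: float = FRAME_SEC) -> np.ndarray:
    """Hypothetical helper: (freq_bins, frames) magnitude spectrogram.

    Uses non-overlapping rectangular frames and divides by the FFT size so that
    magnitudes are comparable across sampling rates. The bin spacing is
    rate / n_fft = 1 / frame_sec Hz, i.e. identical for both rates.
    """
    n_fft = int(round(frame_sec * rate))
    n_frames = len(signal) // n_fft
    frames = signal[: n_frames * n_fft].reshape(n_frames, n_fft)
    return (np.abs(np.fft.rfft(frames, axis=1)) / n_fft).T


# One second of synthetic wideband audio whose content lies entirely below 4 kHz.
t = np.arange(WB_RATE) / WB_RATE
wb_signal = np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * 2500 * t)

# Naive 2:1 decimation is alias-free here ONLY because the test signal has no
# energy above 4 kHz; real audio would need a low-pass filter first.
nb_signal = wb_signal[:: WB_RATE // NB_RATE]

wb_spec = magnitude_spectrogram(wb_signal, WB_RATE)  # 257 bins covering 0-8 kHz
nb_spec = magnitude_spectrogram(nb_signal, NB_RATE)  # 129 bins covering 0-4 kHz

print(wb_spec.shape, nb_spec.shape)  # (257, 31) (129, 31)
# The narrowband spectrogram equals the lower 129 rows of the wideband one.
print(np.allclose(nb_spec, wb_spec[: nb_spec.shape[0]]))  # True
```

The remaining 128 rows of the wideband spectrogram (above 4 kHz) have no narrowband counterpart; that missing band is exactly the part of the "image" a narrowband input lacks.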



research
09/05/2019

Bandwidth Embeddings for Mixed-bandwidth Speech Recognition

In this paper, we tackle the problem of handling narrowband and wideband...
research
02/24/2022

On the relevance of bandwidth extension for speaker identification

In this paper we discuss the relevance of bandwidth extension for speake...
research
03/30/2022

Joint domain adaptation and speech bandwidth extension using time-domain GANs for speaker verification

Speech systems developed for a particular choice of acoustic domain and ...
research
07/19/2019

DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis

This paper proposes novel algorithms for speaker embedding using subject...
research
04/05/2022

On the Relevance of Bandwidth Extension for Speaker Verification

In this paper, we consider the effect of a bandwidth extension of narrow...
research
04/12/2019

Building a mixed-lingual neural TTS system with only monolingual data

When deploying a Chinese neural text-to-speech (TTS) synthesis system, o...
research
05/09/2022

Bandwidth-Scalable Fully Mask-Based Deep FCRN Acoustic Echo Cancellation and Postfiltering

Although today's speech communication systems support various bandwidths...
