3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement

06/27/2023
by Siqi Zheng, et al.

Disentangling uncorrelated information in speech utterances is a crucial research topic within the speech community. Different speech-related tasks focus on extracting distinct speech representations while minimizing the effects of other, uncorrelated information. We present a large-scale speech corpus to facilitate research on speech representation disentanglement. 3D-Speaker contains over 10,000 speakers, each of whom is simultaneously recorded by multiple Devices positioned at different Distances, and some of whom speak multiple Dialects. These controlled combinations of conditions yield a matrix of diversely entangled speech representations, motivating new methods to untangle them. The multi-domain nature of 3D-Speaker also makes it a suitable resource for evaluating large universal speech models and for experimenting with out-of-domain and self-supervised learning methods. https://3dspeaker.github.io/
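
The matrix of controlled conditions is what makes the corpus useful for disentanglement: holding a speaker fixed while varying exactly one of device, distance, or dialect isolates a single nuisance factor. The sketch below illustrates that pairing logic in Python; the metadata fields (speaker_id, device, distance, dialect, wav_path) are illustrative assumptions for this example, not the corpus's published schema.

    # Minimal sketch: index recordings by speaker and pair up
    # same-speaker utterances that differ in exactly one recording
    # condition. Field names are hypothetical, not the actual
    # 3D-Speaker metadata format.
    from dataclasses import dataclass
    from collections import defaultdict

    @dataclass(frozen=True)
    class Utterance:
        speaker_id: str   # over 10,000 speakers in the corpus
        device: str       # recording device label
        distance: str     # speaker-to-device distance label
        dialect: str      # dialect spoken in this utterance
        wav_path: str     # path to the audio file

    def single_factor_pairs(utterances):
        """Return same-speaker utterance pairs that differ in exactly
        one of (device, distance, dialect). Each such pair isolates a
        single nuisance factor, which is the property a disentanglement
        method can exploit."""
        by_speaker = defaultdict(list)
        for utt in utterances:
            by_speaker[utt.speaker_id].append(utt)

        pairs = []
        for utts in by_speaker.values():
            for i, a in enumerate(utts):
                for b in utts[i + 1:]:
                    diffs = sum(
                        x != y
                        for x, y in [(a.device, b.device),
                                     (a.distance, b.distance),
                                     (a.dialect, b.dialect)]
                    )
                    if diffs == 1:
                        pairs.append((a, b))
        return pairs

Pairs selected this way could serve, for instance, as positive pairs when training speaker embeddings that are invariant to device, distance, or dialect.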
