FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection

by Zhenyu Zhang, et al.

With the rapid development of text-to-speech (TTS) and voice conversion (VC) technologies, detecting synthetic speech has become considerably more difficult. To promote the development of synthetic speech detection models against Mandarin TTS and VC technologies, we have constructed a challenging Mandarin dataset and organized the accompanying audio track of the first Fake Media Forensic Challenge of the China Society of Image and Graphics (FMFCC-A). The FMFCC-A dataset is by far the largest publicly available Mandarin dataset for synthetic speech detection. It contains 40,000 synthesized Mandarin utterances generated by 11 Mandarin TTS systems and two Mandarin VC systems, and 10,000 genuine Mandarin utterances collected from 58 speakers. The dataset is divided into training, development, and evaluation sets, which support research on detecting synthesized Mandarin speech under previously unknown speech synthesis systems or audio post-processing operations. In addition to describing the construction of the FMFCC-A dataset, we provide a detailed analysis of two baseline methods and the top-performing submissions to the FMFCC-A challenge, which illustrates both the usefulness and the difficulty of the dataset. We hope that the FMFCC-A dataset can fill the gap left by the lack of Mandarin datasets for synthetic speech detection.




SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection

Previous databases have been designed to further the development of fake...

The Sound of Silence: Efficiency of First Digit Features in Synthetic Audio Detection

The recent integration of generative neural strategies and audio process...

Detecting Synthetic Speech Manipulation in Real Audio Recordings

Recent advances in artificial speech and audio technologies have improve...

SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis

In this work, we present the SOMOS dataset, the first large-scale mean o...

SingFake: Singing Voice Deepfake Detection

The rise of singing voice synthesis presents critical challenges to arti...

Open Challenges in Synthetic Speech Detection

In this paper the current status and open challenges of synthetic speech...

Syn-Att: Synthetic Speech Attribution via Semi-Supervised Unknown Multi-Class Ensemble of CNNs

With the huge technological advances introduced by deep learning in audi...
