Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama's voice using GAN, WaveNet and low-quality found data

03/02/2018
by Jaime Lorenzo-Trueba, et al.

Thanks to the growing availability of spoofing databases and rapid advances in using them, systems for detecting voice spoofing attacks are becoming increasingly capable, and error rates close to zero are being reached on the ASVspoof2015 database. However, speech synthesis and voice conversion paradigms that are not covered by the ASVspoof2015 database are appearing, including direct waveform modelling and generative adversarial networks. We also need to investigate the feasibility of training spoofing systems using only low-quality found data. For that purpose, we developed a generative adversarial network-based speech enhancement system that improves the quality of speech data found in publicly available sources. Using the enhanced data, we trained state-of-the-art text-to-speech and voice conversion models and evaluated them in terms of perceptual speech quality and speaker similarity. The results show that the enhancement models significantly improved the SNR of the low-quality degraded data and significantly improved the perceptual cleanliness of the source speech without significantly degrading the naturalness of the voice. However, the results also show limitations when generating speech with the low-quality found data.
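The abstract reports an SNR improvement but does not say how SNR was measured. As a rough illustration only, the sketch below computes a reference-based SNR in decibels, treating the difference between a clean signal and its degraded version as noise. The function name `snr_db` and the availability of a clean reference are assumptions made for this example. Found data usually has no clean reference, so the authors most likely relied on an estimation method that is not described here.

```python
import math


def snr_db(clean, degraded):
    """Return the SNR in dB of `degraded` measured against `clean`.

    Hypothetical helper for illustration: it assumes a sample-aligned
    clean reference exists, which is generally NOT true for found data.
    """
    if len(clean) != len(degraded):
        raise ValueError("signals must have the same number of samples")

    # Power of the reference signal (sum of squared samples).
    signal_power = sum(s * s for s in clean)
    # Anything that differs from the reference is counted as noise.
    noise_power = sum((d - s) ** 2 for s, d in zip(clean, degraded))

    if noise_power == 0:
        return math.inf  # identical signals: no measurable noise
    if signal_power == 0:
        return -math.inf  # silent reference: everything is noise
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    clean = [1.0, -1.0, 1.0, -1.0]
    degraded = [1.1, -0.9, 1.1, -0.9]  # constant offset of 0.1 as "noise"
    print(f"SNR: {snr_db(clean, degraded):.2f} dB")
```

With this toy input the signal power is 100 times the noise power, so the result is roughly 20 dB. Comparing the value before and after an enhancement step would give the kind of SNR gain the abstract refers to, under the stated assumptions.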

research
06/18/2022

Identifying Source Speakers for Voice Conversion based Spoofing Attacks on Speaker Verification Systems

An automatic speaker verification system aims to verify the speaker iden...
research
07/18/2023

SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs

In recent years, large-scale pre-trained speech language models (SLMs) h...
research
08/31/2018

Whispered-to-voiced Alaryngeal Speech Conversion with Generative Adversarial Networks

Most methods of voice restoration for patients suffering from aphonia ei...
research
07/09/2020

The Road Not Taken: Re-thinking the Feasibility of Voice Calling Over Tor

Anonymous VoIP calls over the Internet holds great significance for priv...
research
04/06/2019

Taco-VC: A Single Speaker Tacotron based Voice Conversion with Limited Data

This paper introduces Taco-VC, a novel architecture for voice conversion...
research
09/12/2018

Transforming acoustic characteristics to deceive playback spoofing countermeasures of speaker verification systems

Automatic speaker verification (ASV) systems use a playback detector to ...
research
06/09/2019

rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method

This paper presents an unsupervised segment-based method for robust voic...
