VSMask: Defending Against Voice Synthesis Attack via Real-Time Predictive Perturbation

by Yuanda Wang, et al.

Deep learning-based voice synthesis technology generates artificial, human-like speech, which has been used in deepfake and identity theft attacks. Existing defense mechanisms inject subtle adversarial perturbations into raw speech audio to mislead voice synthesis models. However, optimizing the adversarial perturbation not only consumes substantial computation time but also requires the entire speech to be available in advance. These schemes are therefore unsuitable for protecting live speech streams, such as voice messages or online meetings. In this paper, we propose VSMask, a real-time protection mechanism against voice synthesis attacks. Unlike offline protection schemes, VSMask leverages a predictive neural network to forecast the most effective perturbation for the upcoming streaming speech. VSMask also introduces a universal perturbation, tailored to arbitrary speech input, to shield a real-time speech stream in its entirety. To minimize audio distortion in the protected speech, we implement a weight-based perturbation constraint that reduces the perceptibility of the added perturbation. We comprehensively evaluate VSMask's protection performance under different scenarios. The experimental results indicate that VSMask effectively defends against three popular voice synthesis models: with VSMask protection, none of the synthesized voices could deceive speaker verification models or human ears. In a physical-world experiment, we demonstrate that VSMask successfully safeguards real-time speech by injecting the perturbation over the air.
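The streaming pipeline the abstract describes can be sketched roughly as follows. This is a minimal illustration, not VSMask's actual implementation: `predict_perturbation`, `perceptual_weights`, `CHUNK`, and `EPS` are all hypothetical stand-ins (the real system uses a trained predictive network and a learned perceptual weighting). The key idea shown is that the perturbation for each chunk is computed from audio already seen, so protection adds no look-ahead latency.

```python
import numpy as np

CHUNK = 1024   # samples per streaming chunk (illustrative)
EPS = 0.01     # L-infinity perturbation budget (illustrative)

def predict_perturbation(history: np.ndarray) -> np.ndarray:
    """Stand-in for the predictive network: maps past audio to a
    perturbation for the *next* chunk. Here: deterministic noise
    seeded from the history, purely for demonstration."""
    seed = abs(hash(history.tobytes())) % (2**32)
    rng = np.random.default_rng(seed)
    return rng.standard_normal(CHUNK)

def perceptual_weights(chunk: np.ndarray) -> np.ndarray:
    """Toy weight-based constraint: allow larger perturbation where
    the speech itself is louder (a crude energy-masking proxy)."""
    energy = np.abs(chunk) + 1e-8
    return energy / energy.max()

def protect_stream(stream: np.ndarray) -> np.ndarray:
    """Perturb each chunk using a prediction made from prior chunks,
    so the defense never needs the full speech in advance."""
    out = []
    history = np.zeros(CHUNK)
    for start in range(0, len(stream) - CHUNK + 1, CHUNK):
        chunk = stream[start:start + CHUNK]
        delta = predict_perturbation(history)      # forecast for this chunk
        delta = np.clip(delta, -1.0, 1.0) * EPS    # hard magnitude budget
        delta *= perceptual_weights(chunk)         # reduce audibility
        out.append(chunk + delta)
        history = chunk                            # slide the context window
    return np.concatenate(out)
```

In a real deployment the predictor would be trained adversarially against the target voice synthesis models, and the weighting would follow a psychoacoustic model rather than raw energy; the structure of the loop, however, matches the paper's streaming setting.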


Related papers:

- Stop Bugging Me! Evading Modern-Day Wiretapping Using Adversarial Perturbations
- "Hello, It's Me": Deep Learning-based Speech Synthesis Attacks in the Real World
- Training Strategies for Own Voice Reconstruction in Hearing Protection Devices using an In-ear Microphone
- Real-Time Neural Voice Camouflage
- Real-time, Universal, and Robust Adversarial Attacks Against Speaker Recognition Systems
- Beyond Neural-on-Neural Approaches to Speaker Gender Protection
- Attack on practical speaker verification system using universal adversarial perturbations
