Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features

08/02/2022
by   Jun Xue, et al.

Recent work has proposed a large number of acoustic features (log power spectrogram, linear frequency cepstral coefficients, constant-Q cepstral coefficients, etc.) for audio deepfake detection. These features achieve good performance and show that different subbands contribute differently to detection. However, this work does not explain what specific information each subband carries, and these features also discard information such as phase. Speech synthesis systems use fundamental frequency (F0) information to improve the quality of synthetic speech, yet the F0 of synthetic speech is still too average and differs significantly from that of real speech. F0 is therefore expected to be an important cue for discriminating between bonafide and fake speech, but it cannot be used directly because its distribution is irregular. Instead, the frequency band containing most of the F0 is selected as the input feature. Meanwhile, to make full use of phase and full-band information, we also propose to use real and imaginary spectrogram features as complementary input features and to model the disjoint subbands separately. Finally, the results from the F0, real spectrogram, and imaginary spectrogram features are fused. Experimental results on the ASVspoof 2019 LA dataset show that the proposed system is very effective for the audio deepfake detection task, achieving an equal error rate (EER) of 0.43%, which surpasses almost all systems.
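The three inputs described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the 512-point FFT, the 160-sample hop, the 0 to 400 Hz F0 band, the four equal-width subbands, and the plain score averaging used for fusion are all assumptions chosen for illustration, since the abstract does not state these details.

```python
import numpy as np


def stft(signal, n_fft=512, hop=160):
    """Complex STFT with a Hann window; shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1).T


def band_rows(n_fft, sample_rate, f_lo, f_hi):
    """Indices of STFT rows whose centre frequency lies in [f_lo, f_hi)."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    return np.where((freqs >= f_lo) & (freqs < f_hi))[0]


def extract_features(signal, sample_rate=16000, n_fft=512, hop=160,
                     f0_max_hz=400.0, n_subbands=4):
    """Return (F0-band log power, real subbands, imaginary subbands).

    f0_max_hz and n_subbands are illustrative assumptions, not values
    taken from the abstract.
    """
    spec = stft(signal, n_fft, hop)

    # Low-frequency band assumed to contain most of the F0 energy.
    rows = band_rows(n_fft, sample_rate, 0.0, f0_max_hz)
    f0_band = np.log(np.abs(spec[rows]) ** 2 + 1e-10)

    # Real and imaginary parts keep the phase information that a
    # magnitude-only spectrogram discards; split them into disjoint
    # subbands along the frequency axis so each can be modelled separately.
    real_subbands = np.array_split(spec.real, n_subbands, axis=0)
    imag_subbands = np.array_split(spec.imag, n_subbands, axis=0)
    return f0_band, real_subbands, imag_subbands


def fuse_scores(*scores):
    """Assumed fusion rule: unweighted mean of per-branch scores."""
    return float(np.mean(scores))


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 250.0 * t)  # synthetic 250 Hz test tone
    f0_band, real_sb, imag_sb = extract_features(tone, sample_rate=sr)
    print(f0_band.shape, [b.shape[0] for b in real_sb])
    print(fuse_scores(0.2, 0.4, 0.6))
```

In a full system each feature would feed its own classifier, and `fuse_scores` would combine the resulting per-utterance scores; the averaging here only stands in for whatever fusion rule the paper actually uses.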

research
08/19/2023

Spatial Reconstructed Local Attention Res2Net with F0 Subband for Fake Speech Detection

The rhythm of synthetic speech is usually too smooth, which causes that ...
research
10/21/2022

Adaptive re-calibration of channel-wise features for Adversarial Audio Classification

DeepFake Audio, unlike DeepFake images and videos, has been relatively l...
research
09/23/2019

Speech Replay Detection with x-Vector Attack Embeddings and Spectral Features

We present our system submission to the ASVspoof 2019 Challenge Physical...
research
10/06/2022

The Sound of Silence: Efficiency of First Digit Features in Synthetic Audio Detection

The recent integration of generative neural strategies and audio process...
research
03/02/2020

Identification of primary and collateral tracks in stuttered speech

Disfluent speech has been previously addressed from two main perspective...
research
05/23/2023

Detection of Cross-Dataset Fake Audio Based on Prosodic and Pronunciation Features

Existing fake audio detection systems perform well in in-domain testing,...
research
10/06/2021

An Investigation of the Effectiveness of Phase for Audio Classification

While log-amplitude mel-spectrogram has widely been used as the feature ...
