End-to-End Spectro-Temporal Graph Attention Networks for Speaker Verification Anti-Spoofing and Speech Deepfake Detection

07/27/2021
by Hemlata Tak, et al.

Artefacts that serve to distinguish bona fide speech from spoofed or deepfake speech are known to reside in specific sub-bands and temporal segments. Various approaches can be used to capture and model such artefacts; however, none works well across a spectrum of diverse spoofing attacks. Reliable detection then often depends upon the fusion of multiple detection systems, each tuned to detect different forms of attack. In this paper we show that better performance can be achieved when the fusion is performed within the model itself and when the representation is learned automatically from raw waveform inputs. The principal contribution is a spectro-temporal graph attention network (GAT) which learns the relationship between cues spanning different sub-bands and temporal intervals. Using a model-level graph fusion of spectral (S) and temporal (T) sub-graphs and a graph pooling strategy to improve discrimination, the proposed RawGAT-ST model achieves an equal error rate of 1.06% for the ASVspoof 2019 logical access database. This is one of the best results reported to date and is reproducible using an open source implementation.
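The equal error rate quoted above is the operating point at which the false acceptance rate (spoofed speech accepted as bona fide) equals the false rejection rate (bona fide speech rejected). The following is a minimal sketch of how such a figure could be computed from detection scores, assuming that higher scores indicate bona fide speech. The function name `equal_error_rate` and the toy scores are hypothetical, and this is not the scoring tool used by the authors or by the ASVspoof challenge.

```python
def equal_error_rate(bona_fide_scores, spoof_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumption: a trial is accepted as bona fide when score >= threshold.
    Returns the mean of the two error rates at the threshold where they
    are closest, as a fraction in [0, 1].
    """
    thresholds = sorted(set(bona_fide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: bona fide trials scoring below the threshold.
        frr = sum(s < t for s in bona_fide_scores) / len(bona_fide_scores)
        # False acceptance: spoofed trials scoring at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores, invented purely for illustration.
bona_fide = [0.9, 0.8, 0.7, 0.4]
spoofed = [0.1, 0.2, 0.3, 0.6]
print(f"EER = {100 * equal_error_rate(bona_fide, spoofed):.2f}%")
```

With real evaluation data the score lists would hold one detector output per trial; interpolating between adjacent thresholds would give a smoother estimate than this discrete sweep.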

research
04/08/2021

Graph Attention Networks for Anti-Spoofing

The cues needed to detect spoofing attacks against automatic speaker ver...
research
09/14/2022

ConvNext Based Neural Network for Anti-Spoofing

Automatic speaker verification (ASV) has been widely used in the real li...
research
10/04/2021

AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks

Artefacts that differentiate spoofed from bona-fide utterances can resid...
research
04/14/2020

An explainability study of the constant Q cepstral coefficient spoofing countermeasure for automatic speaker verification

Anti-spoofing for automatic speaker verification is now a well establish...
research
11/08/2021

RawBoost: A Raw Data Boosting and Augmentation Method applied to Automatic Speaker Verification Anti-Spoofing

This paper introduces RawBoost, a data boosting and augmentation method ...
research
07/26/2020

End-to-end spoofing detection with raw waveform CLDNNs

Albeit recent progress in speaker verification generates powerful models...
research
09/15/2023

Improving Short Utterance Anti-Spoofing with AASIST2

The wav2vec 2.0 and integrated spectro-temporal graph attention network ...