Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection

01/18/2022
by   Alexandros Haliassos, et al.
3

One of the most pressing challenges for the detection of face-manipulated videos is generalising to forgery methods not seen during training while remaining effective under common corruptions such as compression. In this paper, we question whether we can tackle this issue by harnessing videos of real talking faces, which contain rich information on natural facial appearance and behaviour and are readily available in large quantities online. Our method, termed RealForensics, consists of two stages. First, we exploit the natural correspondence between the visual and auditory modalities in real videos to learn, in a self-supervised cross-modal manner, temporally dense video representations that capture factors such as facial movements, expression, and identity. Second, we use these learned representations as targets to be predicted by our forgery detector along with the usual binary forgery classification task; this encourages it to base its real/fake decision on said factors. We show that our method achieves state-of-the-art performance on cross-manipulation generalisation and robustness experiments, and examine the factors that contribute to its performance. Our results suggest that leveraging natural and unlabelled videos is a promising direction for the development of more robust face forgery detectors.

READ FULL TEXT

page 2

page 14

page 17

research
12/01/2022

FakeOut: Leveraging Out-of-domain Self-supervision for Multi-modal Video Deepfake Detection

Video synthesis methods rapidly improved in recent years, allowing easy ...
research
12/14/2020

Lips Don't Lie: A Generalisable and Robust Approach to Face Forgery Detection

Although current deep learning-based face forgery detectors achieve impr...
research
02/20/2020

Disentangled Speech Embeddings using Cross-modal Self-supervision

The objective of this paper is to learn representations of speaker ident...
research
11/19/2020

Watch and Learn: Mapping Language and Noisy Real-world Videos with Self-supervision

In this paper, we teach machines to understand visuals and natural langu...
research
10/20/2021

Look at What I'm Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos

We introduce the task of spatially localizing narrated interactions in v...
research
04/28/2021

Robust Face-Swap Detection Based on 3D Facial Shape Information

Maliciously-manipulated images or videos - so-called deep fakes - especi...
research
12/04/2020

ID-Reveal: Identity-aware DeepFake Video Detection

State-of-the-art DeepFake forgery detectors are trained in a supervised ...

Please sign up or login with your details

Forgot password? Click here to reset