TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning

09/21/2023
by Chaeyoung Jung, et al.

This work addresses Active Speaker Detection (ASD), the task of determining whether a person is speaking in a series of video frames. Previous works have approached the task mainly by exploring network architectures, while learning effective representations has received less attention. In this work, we propose TalkNCE, a novel talk-aware contrastive loss. The loss is applied only to the parts of each segment in which a person on the screen is actually speaking. This encourages the model to learn effective representations through the natural correspondence between speech and facial movements. Our loss can be jointly optimized with the existing objectives for training ASD models, without the need for additional supervision or training data. The experiments demonstrate that our loss can be easily integrated into existing ASD frameworks and improves their performance. Our method achieves state-of-the-art performance on the AVA-ActiveSpeaker and ASW datasets.
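
The abstract does not give the loss formula, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a standard InfoNCE-style objective over per-frame audio and visual embeddings, computed only on frames labeled as speaking. The function name `talk_aware_info_nce`, the `temperature` value, the visual-to-audio direction, and the use of NumPy are all illustrative assumptions.

```python
import numpy as np


def talk_aware_info_nce(audio_emb, visual_emb, speaking_mask, temperature=0.1):
    """InfoNCE-style loss restricted to frames flagged as speaking.

    Hypothetical helper: the name, signature, and temperature are assumptions.
    audio_emb, visual_emb: arrays of shape (T, D), one embedding per frame.
    speaking_mask: boolean array of shape (T,), True where the person speaks.
    Returns a float; 0.0 when fewer than two speaking frames are available.
    """
    audio_emb = np.asarray(audio_emb, dtype=np.float64)
    visual_emb = np.asarray(visual_emb, dtype=np.float64)
    mask = np.asarray(speaking_mask, dtype=bool)

    # Keep only the speaking frames; non-speaking frames do not contribute.
    a = audio_emb[mask]
    v = visual_emb[mask]
    if a.shape[0] < 2:
        return 0.0  # no negatives to contrast against

    # L2-normalize so that dot products are cosine similarities.
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
    v = v / (np.linalg.norm(v, axis=1, keepdims=True) + 1e-8)

    # Row i holds similarities of visual frame i to every audio frame.
    logits = (v @ a.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability

    # Log-softmax over audio frames; the positive is the same-time frame.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))
```

Under these assumptions, joint optimization would amount to adding this term, scaled by some weight, to whatever loss the ASD model already uses; the weighting is likewise an assumption and is not specified in the abstract.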
