Visual Sound Localization in the Wild by Cross-Modal Interference Erasing

02/13/2022
by Xian Liu, et al.

The task of audio-visual sound source localization has been well studied in constrained scenes where the audio recordings are clean. In real-world scenarios, however, audio is usually contaminated by off-screen sound and background noise. This interference disrupts both the identification of the desired sources and the building of visual-sound connections, making previous methods inapplicable. In this work, we propose the Interference Eraser (IEr) framework, which tackles the problem of audio-visual sound source localization in the wild. The key idea is to eliminate the interference by redefining and carving discriminative audio representations. Specifically, we observe that the previous practice of learning only a single audio representation is insufficient because audio signals are additive. We therefore extend the audio representation with our Audio-Instance-Identifier module, which clearly distinguishes sounding instances even when audio signals of different volumes are unevenly mixed. We then erase the influence of audible but off-screen sounds and of silent but visible objects with a Cross-modal Referrer module that uses cross-modality distillation. Quantitative and qualitative evaluations demonstrate that the proposed framework achieves superior results on sound localization tasks, especially in real-world scenarios. Code is available at https://github.com/alvinliu0/Visual-Sound-Localization-in-the-Wild.
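The abstract's remark about the additive nature of audio signals can be illustrated with a small toy example. The sketch below is not the authors' code and does not reproduce the Audio-Instance-Identifier module; it only assumes two hypothetical pure-tone sources (the names `loud` and `quiet`, the frequency bins, and the amplitudes are all made up for illustration) and shows that an unevenly mixed waveform still contains both sources as separate spectral components, with the quieter one at a much smaller magnitude.

```python
import cmath
import math


def tone(freq_bin, amplitude, n):
    """Sine wave completing `freq_bin` whole cycles over `n` samples."""
    return [amplitude * math.sin(2 * math.pi * freq_bin * t / n) for t in range(n)]


def dft_magnitude(signal, k):
    """Magnitude of the k-th DFT bin of `signal`, normalised by its length."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
    return abs(acc) / n


N = 256
loud = tone(10, 1.0, N)    # hypothetical loud source
quiet = tone(40, 0.1, N)   # hypothetical source mixed in 10x quieter

# Audio mixing is additive: the recorded waveform is the sample-wise sum.
mixture = [a + b for a, b in zip(loud, quiet)]

# Both sources survive in the mixture, each in its own frequency bin.
loud_bin = dft_magnitude(mixture, 10)
quiet_bin = dft_magnitude(mixture, 40)
print(loud_bin, quiet_bin)
```

In this toy setting the quiet source contributes a component ten times smaller than the loud one. That imbalance is one plausible reading of why the abstract argues that a single audio representation may be insufficient for unevenly mixed sources.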


research
09/19/2023

Sound Source Localization is All about Cross-Modal Alignment

Humans can easily perceive the direction of sound sources in a visual sc...
research
08/09/2023

Induction Network: Audio-Visual Modality Gap-Bridging for Self-Supervised Sound Source Localization

Self-supervised sound source localization is usually challenged by the m...
research
03/23/2023

Egocentric Audio-Visual Object Localization

Humans naturally perceive surrounding scenes by unifying sound and sight...
research
04/04/2022

Dual Quaternion Ambisonics Array for Six-Degree-of-Freedom Acoustic Representation

Spatial audio methods are gaining a growing interest due to the spread o...
research
09/05/2023

Generating Realistic Images from In-the-wild Sounds

Representing wild sounds as images is an important but challenging task ...
research
04/11/2022

How to Listen? Rethinking Visual Sound Localization

Localizing visual sounds consists of locating the position of objects th...
research
08/11/2023

Audio-Visual Spatial Integration and Recursive Attention for Robust Sound Source Localization

The objective of the sound source localization task is to enable machine...
