AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis

02/04/2023
by Susan Liang, et al.

Human perception of the complex world relies on a comprehensive analysis of multi-modal signals, and the co-occurrence of audio and video signals provides humans with rich cues. This paper focuses on novel audio-visual scene synthesis in the real world. Given a video recording of an audio-visual scene, the task is to synthesize new videos with spatial audio along arbitrary novel camera trajectories in that scene. Directly using a NeRF-based model for audio synthesis is insufficient because it lacks prior knowledge and acoustic supervision. To tackle these challenges, we first propose an acoustic-aware audio generation module that integrates our prior knowledge of audio propagation into NeRF, associating audio generation with the 3D geometry of the visual environment. In addition, we propose a coordinate transformation module that expresses a viewing direction relative to the sound source. This direction transformation helps the model learn sound source-centric acoustic fields. Moreover, we use a head-related impulse response function to synthesize pseudo binaural audio for data augmentation, which strengthens training. We demonstrate the advantage of our model qualitatively and quantitatively on real-world audio-visual scenes. We refer interested readers to our video results for comparisons.
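To make the idea of a source-relative viewing direction concrete, the following is a minimal 2D sketch, not the paper's implementation. The abstract does not give the exact formulation, so the function name `relative_source_direction`, the 2D simplification, and the choice of a signed angle plus a distance as the output are all assumptions made for illustration.

```python
import math


def relative_source_direction(camera_pos, view_dir, source_pos):
    """Describe the sound source relative to the camera (hypothetical helper).

    Returns (angle, distance):
      angle    -- signed angle in radians from the viewing direction to the
                  camera-to-source vector, positive counter-clockwise
      distance -- Euclidean distance from the camera to the source
    """
    # Vector pointing from the camera to the sound source.
    dx = source_pos[0] - camera_pos[0]
    dy = source_pos[1] - camera_pos[1]
    distance = math.hypot(dx, dy)

    # The 2D cross and dot products give the sine and cosine of the angle
    # (scaled by the vector lengths); atan2 recovers the signed angle.
    cross = view_dir[0] * dy - view_dir[1] * dx
    dot = view_dir[0] * dx + view_dir[1] * dy
    angle = math.atan2(cross, dot)
    return angle, distance


# Camera at the origin looking along +x, source straight to its left.
angle, distance = relative_source_direction((0.0, 0.0), (1.0, 0.0), (0.0, 2.0))
```

Because the output depends only on where the source lies relative to the camera, shifting the camera and the source by the same offset leaves it unchanged, which is the property a source-centric representation relies on.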

research
01/20/2023

Novel-View Acoustic Synthesis

We introduce the novel-view acoustic synthesis (NVAS) task: given the si...
research
10/26/2022

Deep Learning Based Audio-Visual Multi-Speaker DOA Estimation Using Permutation-Free Loss Function

In this paper, we propose a deep learning based multi-speaker direction ...
research
10/05/2021

Manifold learning-supported estimation of relative transfer functions for spatial filtering

Many spatial filtering algorithms used for voice capture in, e.g., telec...
research
04/04/2022

Learning Neural Acoustic Fields

Our environment is filled with rich and dynamic acoustic information. Wh...
research
01/04/2023

Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations

Can conversational videos captured from multiple egocentric viewpoints r...
research
07/27/2023

Self-Supervised Visual Acoustic Matching

Acoustic matching aims to re-synthesize an audio clip to sound as if it ...
research
03/30/2019

Static Visual Spatial Priors for DoA Estimation

As we interact with the world, for example when we communicate with our ...
