Learning Multi-Object Dynamics with Compositional Neural Radiance Fields

02/24/2022
by   Danny Driess, et al.
0

We present a method to learn compositional predictive models from image observations based on implicit object encoders, Neural Radiance Fields (NeRFs), and graph neural networks. A central question in learning dynamic models from sensor observations is on which representations predictions should be performed. NeRFs have become a popular choice for representing scenes due to their strong 3D prior. However, most NeRF approaches are trained on a single scene, representing the whole scene with a global model, making generalization to novel scenes, containing different numbers of objects, challenging. Instead, we present a compositional, object-centric auto-encoder framework that maps multiple views of the scene to a set of latent vectors representing each object separately. The latent vectors parameterize individual NeRF models from which the scene can be reconstructed and rendered from novel viewpoints. We train a graph neural network dynamics model in the latent space to achieve compositionality for dynamics prediction. A key feature of our approach is that the learned 3D information of the scene through the NeRF model enables us to incorporate structural priors in learning the dynamics models, making long-term predictions more stable. The model can further be used to synthesize new scenes from individual object observations. For planning, we utilize RRTs in the learned latent space, where we can exploit our model and the implicit object encoder to make sampling the latent space informative and more efficient. In the experiments, we show that the model outperforms several baselines on a pushing task containing many objects. Video: https://dannydriess.github.io/compnerfdyn/

READ FULL TEXT

page 1

page 6

page 9

page 10

research
06/03/2022

Reinforcement Learning with Neural Radiance Fields

It is a long-standing problem to find effective representations for trai...
research
03/31/2020

Cross Scene Prediction via Modeling Dynamic Correlation using Latent Space Shared Auto-Encoders

This work addresses on the following problem: given a set of unsynchroni...
research
07/16/2021

Towards an Interpretable Latent Space in Structured Models for Video Prediction

We focus on the task of future frame prediction in video governed by und...
research
11/12/2020

3D-OES: Viewpoint-Invariant Object-Factorized Environment Simulators

We propose an action-conditioned dynamics model that predicts scene chan...
research
06/17/2020

CoSE: Compositional Stroke Embeddings

We present a generative model for stroke-based drawing tasks which is ab...
research
04/30/2023

Object-Centric Voxelization of Dynamic Scenes via Inverse Neural Rendering

Understanding the compositional dynamics of the world in unsupervised 3D...
research
03/23/2023

Learning and generalization of compositional representations of visual scenes

Complex visual scenes that are composed of multiple objects, each with a...

Please sign up or login with your details

Forgot password? Click here to reset