Compositional Human-Scene Interaction Synthesis with Semantic Control

07/26/2022
by   Kaifeng Zhao, et al.

Synthesizing natural interactions between virtual humans and their 3D environments is critical for numerous applications, such as computer games and AR/VR experiences. Our goal is to synthesize humans interacting with a given 3D scene, controlled by high-level semantic specifications given as pairs of action categories and object instances, e.g., "sit on the chair". The key challenge of incorporating interaction semantics into the generation framework is to learn a joint representation that effectively captures heterogeneous information, including human body articulation, 3D object geometry, and the intent of the interaction. To address this challenge, we design a novel transformer-based generative model, in which the articulated 3D human body surface points and 3D objects are jointly encoded in a unified latent space, and the semantics of the interaction between the human and objects are embedded via positional encoding. Furthermore, inspired by the compositional nature of interactions, namely that humans can simultaneously interact with multiple objects, we define interaction semantics as the composition of varying numbers of atomic action-object pairs. Our proposed generative model can naturally incorporate varying numbers of atomic interactions, which enables synthesizing compositional human-scene interactions without requiring composite interaction data. We extend the PROX dataset with interaction semantic labels and scene instance segmentation to evaluate our method, and demonstrate that our method can generate realistic human-scene interactions with semantic control. Our perceptual study shows that our synthesized virtual humans can naturally interact with 3D scenes, considerably outperforming existing methods. We name our method COINS, for COmpositional INteraction Synthesis with Semantic Control. Code and data are available at https://github.com/zkf1997/COINS.
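To make the compositional idea concrete, the following is a minimal, hypothetical sketch (not the authors' code) of how a variable number of atomic action-object pairs could be turned into a token sequence for a transformer encoder, with the action embedding added to each object token in the spirit of a positional encoding. All names, dimensions, and the toy object encoder are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only: composing atomic (action, object) pairs into
# transformer input tokens. Vocabulary, dimensions, and encoders are toy
# stand-ins; in the paper these would be learned components.

rng = np.random.default_rng(0)

D = 16                                # token embedding dimension (toy)
ACTIONS = {"sit on": 0, "touch": 1}   # toy atomic action vocabulary
action_table = rng.normal(size=(len(ACTIONS), D))  # learned in practice
W_obj = rng.normal(size=(6, D))                    # toy object projection

def encode_object(points):
    """Toy object encoder: centroid + extent of a point cloud, projected to D dims."""
    feat = np.concatenate([points.mean(axis=0), np.ptp(points, axis=0)])  # (6,)
    return feat @ W_obj                                                   # (D,)

def compose_interaction(pairs):
    """Stack one token per atomic (action, object) pair.

    Adding the action embedding to the object token mimics injecting
    interaction semantics via positional encoding; the resulting
    variable-length sequence is what a transformer can consume.
    """
    tokens = [action_table[ACTIONS[a]] + encode_object(pts) for a, pts in pairs]
    return np.stack(tokens)  # shape (num_pairs, D)

# Example: "sit on the chair" composed with "touch the table".
chair = rng.uniform(size=(100, 3))
table = rng.uniform(size=(100, 3))
tokens = compose_interaction([("sit on", chair), ("touch", table)])
print(tokens.shape)  # (2, 16)
```

Because each atomic pair contributes one token, the same model can accept one pair at training time and several at test time, which is what allows composite interactions without composite training data.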

Related research

08/12/2020 - Generating Person-Scene Interactions in 3D Scenes
High fidelity digital 3D environments have been proposed in recent years...

04/27/2023 - Compositional 3D Human-Object Neural Animation
Human-object interactions (HOIs) are crucial for human-centric scene und...

05/01/2022 - COUCH: Towards Controllable Human-Chair Interactions
Humans interact with an object in many different ways by making contact ...

03/16/2023 - Narrator: Towards Natural Control of Human-Scene Interaction Generation via Relationship Reasoning
Naturally controllable human-scene interaction (HSI) generation has an i...

05/23/2023 - NCHO: Unsupervised Learning for Neural 3D Composition of Humans and Objects
Deep generative models have been recently extended to synthesizing 3D di...

09/14/2023 - Unified Human-Scene Interaction via Prompted Chain-of-Contacts
Human-Scene Interaction (HSI) is a vital component of fields like embodi...

05/27/2023 - Self-Supervised Learning of Action Affordances as Interaction Modes
When humans perform a task with an articulated object, they interact wit...
