Language-free Compositional Action Generation via Decoupling Refinement

07/07/2023
by Xiao Liu, et al.

Composing simple elements into complex concepts is crucial yet challenging, especially for 3D action generation. Existing methods largely rely on extensive natural language annotations to discern composable latent semantics, a process that is often costly and labor-intensive. In this study, we introduce a novel framework that generates compositional actions without relying on language auxiliaries. Our approach consists of three main components: Action Coupling, Conditional Action Generation, and Decoupling Refinement. Action Coupling uses an energy model to extract an attention mask for each sub-action, then combines two actions according to these masks to generate pseudo-training examples. Next, we employ a conditional variational autoencoder (CVAE) to learn a latent space, which enables diverse generation. Finally, we propose Decoupling Refinement, which leverages a self-supervised pre-trained masked autoencoder (MAE) to ensure semantic consistency between the sub-actions and the compositional actions. This refinement process renders the generated 3D actions into 2D space, decouples the rendered images into two sub-segments, uses the MAE to restore the complete image from each sub-segment, and constrains the recovered images to match images rendered from the raw sub-actions. Because no existing dataset contains both sub-actions and compositional actions, we create two new datasets, HumanAct-C and UESTC-C, and propose a corresponding evaluation metric. Both qualitative and quantitative assessments are conducted to demonstrate the efficacy of our approach.
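
To make the Action Coupling step concrete, here is a minimal sketch of how two sub-actions might be blended joint by joint using per-joint attention scores. It is an illustration under stated assumptions, not the authors' implementation: the function name `couple_actions`, the list-of-frames skeleton layout, the hand-written masks, and the weighted-average blending rule are all hypothetical, and the energy model that would produce the masks is not shown.

```python
def couple_actions(action_a, action_b, mask_a, mask_b):
    """Blend two skeleton sequences joint by joint (hypothetical helper).

    action_a, action_b: lists of frames; each frame is a list of (x, y, z) joints.
    mask_a, mask_b: non-negative per-joint attention scores. These are assumed
    stand-ins for the masks an energy model would extract.
    """
    composed = []
    for frame_a, frame_b in zip(action_a, action_b):
        frame = []
        for joint_a, joint_b, m_a, m_b in zip(frame_a, frame_b, mask_a, mask_b):
            total = m_a + m_b
            # Fall back to an even blend when neither action attends to the joint.
            w_a = m_a / total if total > 0 else 0.5
            frame.append(tuple(w_a * a + (1.0 - w_a) * b
                               for a, b in zip(joint_a, joint_b)))
        composed.append(frame)
    return composed


# Toy example: one frame, three joints.
# Joint 0 is attended only by action A, joint 1 only by action B, joint 2 by both.
action_a = [[(2.0, 4.0, 6.0), (2.0, 4.0, 6.0), (2.0, 4.0, 6.0)]]
action_b = [[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]
mask_a = [1.0, 0.0, 1.0]
mask_b = [0.0, 1.0, 1.0]
pseudo_example = couple_actions(action_a, action_b, mask_a, mask_b)
```

In this sketch, joints attended by only one sub-action are copied from that sub-action, while joints attended by both are averaged. The resulting sequence plays the role of a pseudo-training example for the conditional generator.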
