T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations

01/15/2023
by Jianrong Zhang, et al.

In this work, we investigate a simple and well-known conditional generative framework based on the Vector Quantised-Variational AutoEncoder (VQ-VAE) and the Generative Pre-trained Transformer (GPT) for human motion generation from textual descriptions. We show that a simple CNN-based VQ-VAE with commonly used training recipes (EMA and Code Reset) allows us to obtain high-quality discrete representations. For GPT, we incorporate a simple corruption strategy during training to alleviate the training-testing discrepancy. Despite its simplicity, our T2M-GPT shows better performance than competitive approaches, including recent diffusion-based approaches. For example, on HumanML3D, which is currently the largest dataset, we achieve comparable performance on the consistency between text and generated motion (R-Precision), while our FID of 0.116 largely outperforms MotionDiffuse's 0.630. Additionally, we conduct analyses on HumanML3D and observe that dataset size is a limitation of our approach. Our work suggests that VQ-VAE still remains a competitive approach for human motion generation.
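
The corruption strategy mentioned above replaces part of the ground-truth motion-token sequence with random tokens during teacher-forced GPT training, so the model learns to condition on imperfect context, as it must at inference time when it consumes its own predictions. Below is a minimal sketch of what such a step might look like, assuming PyTorch; the function name `corrupt_motion_tokens`, the per-token Bernoulli scheme, and the `corrupt_prob` value are illustrative assumptions, not the paper's exact recipe.

```python
import torch

def corrupt_motion_tokens(tokens: torch.Tensor,
                          codebook_size: int,
                          corrupt_prob: float = 0.5) -> torch.Tensor:
    """Randomly replace ground-truth motion tokens with random codebook indices.

    NOTE: illustrative sketch only; the probability and schedule used by
    T2M-GPT may differ from this simple per-token Bernoulli corruption.
    """
    # Boolean mask of positions to corrupt (independent draw per token).
    mask = torch.rand_like(tokens, dtype=torch.float) < corrupt_prob
    # Replacement indices sampled uniformly from the VQ-VAE codebook.
    random_tokens = torch.randint_like(tokens, low=0, high=codebook_size)
    return torch.where(mask, random_tokens, tokens)

# Example: corrupt a batch of token sequences before a training step.
tokens = torch.randint(0, 512, (4, 50))  # (batch, sequence length), codebook of 512
corrupted = corrupt_motion_tokens(tokens, codebook_size=512, corrupt_prob=0.5)
```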

Related research

04/25/2022
TEMOS: Generating diverse human motions from textual descriptions
We address the problem of generating diverse 3D human motions from textu...

12/03/2021
Global Context with Discrete Diffusion in Vector Quantised Modelling for Image Generation
The integration of Vector Quantised Variational AutoEncoder (VQ-VAE) wit...

07/12/2023
Reconstructing Spatiotemporal Data with C-VAEs
The continuous representation of spatiotemporal data commonly relies on ...

04/20/2021
VideoGPT: Video Generation using VQ-VAE and Transformers
We present VideoGPT: a conceptually simple architecture for scaling like...

08/28/2023
Priority-Centric Human Motion Generation in Discrete Latent Space
Text-to-motion generation is a formidable task, aiming to produce human ...

11/24/2021
Hierarchical Graph-Convolutional Variational AutoEncoding for Generative Modelling of Human Motion
Models of human motion commonly focus either on trajectory prediction or...
