TransFusion: A Practical and Effective Transformer-based Diffusion Model for 3D Human Motion Prediction

by Sibo Tian, et al.

Predicting human motion plays a crucial role in ensuring safe and effective close human-robot collaboration in the intelligent remanufacturing systems of the future. Existing works fall into two groups: those that focus on accuracy by predicting a single future motion, and those that generate diverse predictions from the observations. The former fail to address the uncertainty and multi-modal nature of human motion, while the latter often produce motion sequences that deviate too far from the ground truth or become unrealistic given the historical context. To tackle these issues, we propose TransFusion, an innovative and practical diffusion-based model for 3D human motion prediction that generates samples that are more likely to occur while maintaining a certain level of diversity. Our model uses a Transformer as the backbone, with long skip connections between shallow and deep layers. Additionally, we employ the discrete cosine transform (DCT) to model motion sequences in the frequency domain, which improves performance. In contrast to prior diffusion-based models that use extra modules such as cross-attention and adaptive layer normalization to condition the prediction on past observed motion, we treat all inputs, including the conditions, as tokens, which yields a more lightweight model than existing approaches. Extensive experiments on benchmark datasets validate the effectiveness of our human motion prediction model.
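
To make the frequency-domain idea concrete, here is a minimal sketch of moving a motion sequence into DCT coefficients and back. The abstract does not give the transform details, so this assumes an orthonormal DCT-II applied independently to each joint coordinate along the time axis, with optional truncation to the first few low-frequency coefficients (a common choice in DCT-based motion prediction, not confirmed for TransFusion). The function names dct_matrix, to_frequency and to_time are hypothetical illustrations, not the authors' code.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = frames (or coefficients), columns = joint coordinates


def dct_matrix(n: int) -> Matrix:
    """Orthonormal DCT-II basis (assumed variant): row k is the k-th cosine basis vector."""
    basis = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        basis.append([scale * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n)])
    return basis


def to_frequency(motion: Matrix, num_coeffs: int) -> Matrix:
    """Map T frames x D coordinates to num_coeffs x D DCT coefficients (low frequencies first)."""
    n = len(motion)
    basis = dct_matrix(n)
    dims = len(motion[0])
    return [[sum(basis[k][t] * motion[t][d] for t in range(n)) for d in range(dims)]
            for k in range(num_coeffs)]


def to_time(coeffs: Matrix, num_frames: int) -> Matrix:
    """Inverse transform via the transposed orthonormal basis.

    Any high-frequency coefficients that were dropped are treated as zero,
    so a truncated input gives a smoothed reconstruction.
    """
    basis = dct_matrix(num_frames)
    dims = len(coeffs[0])
    return [[sum(basis[k][t] * coeffs[k][d] for k in range(len(coeffs))) for d in range(dims)]
            for t in range(num_frames)]


# Toy usage with hypothetical data: 10 frames of a 2-D "joint" trajectory.
motion = [[math.sin(0.3 * t), 0.1 * t] for t in range(10)]
low_freq = to_frequency(motion, 4)   # keep only 4 of 10 coefficients
smoothed = to_time(low_freq, 10)     # back to 10 frames, high frequencies removed
```

Because the basis is orthonormal, its transpose is its inverse, so keeping all coefficients reconstructs the input exactly, while keeping only the leading ones acts as a smoothing prior on the predicted motion.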


Towards Globally Consistent Stochastic Human Motion Prediction via Motion Diffusion

Stochastic human motion prediction aims to predict multiple possible upc...

Learning to Predict Diverse Human Motions from a Single Image via Mixture Density Networks

Human motion prediction, which plays a key role in computer vision, gene...

Can We Use Diffusion Probabilistic Models for 3D Motion Prediction?

After many researchers observed fruitfulness from the recent diffusion p...

DeFeeNet: Consecutive 3D Human Motion Prediction with Deviation Feedback

Let us rethink the real-world scenarios that require human motion predic...

Robust Human Motion Forecasting using Transformer-based Model

Comprehending human motion is a fundamental challenge for developing Hum...

TAMFormer: Multi-Modal Transformer with Learned Attention Mask for Early Intent Prediction

Human intention prediction is a growing area of research where an activi...

MCM: Multi-condition Motion Synthesis Framework for Multi-scenario

The objective of the multi-condition human motion synthesis task is to i...
