R2-Diff: Denoising by diffusion as a refinement of retrieved motion for image-based motion prediction

by Takeru Oba, et al.

Image-based motion prediction is an essential technique for robot manipulation. Among the available prediction models, we focus on diffusion models because they have achieved state-of-the-art performance in a wide range of applications. In image-based motion prediction, a diffusion model stochastically predicts a contextually appropriate motion by gradually denoising random Gaussian noise, conditioned on the image context. Although diffusion models can predict diverse motions by varying the random noise, they sometimes fail to predict a motion that fits the image context, because the noise is sampled independently of that context. To solve this problem, we propose R2-Diff. In R2-Diff, a motion retrieved from a dataset based on image similarity is fed into the diffusion model in place of random noise, and the retrieved motion is then refined through the model's denoising process. Because the retrieved motion is already close to one that suits the context, predicting a contextually appropriate motion becomes easier. However, conventional diffusion models are not optimized for refining a retrieved motion. We therefore propose a method for tuning the hyperparameters based on the distance to the nearest-neighbor motion in the dataset, which optimizes the diffusion model for refinement. Furthermore, we propose an image-based retrieval method for retrieving the nearest-neighbor motion at inference time. The proposed retrieval efficiently computes similarity from the image features along the motion trajectory. We demonstrate that R2-Diff accurately predicts appropriate motions and achieves high task success rates compared with recent state-of-the-art models in robot manipulation.
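
The retrieve-then-refine idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, and every name in it (`retrieve_motion`, `mean_nearest_neighbor_distance`, `refine`, `toy_denoise_step`) is hypothetical. It assumes one global feature vector per image, whereas the abstract describes similarity computed from image features along the motion trajectory. It also uses a toy denoising step in place of a trained diffusion model. The mean nearest-neighbor distance is shown only as one plausible statistic that hyperparameter tuning could draw on; the abstract does not specify how the distance is used.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def retrieve_motion(query_feature, dataset):
    """Return a copy of the motion whose image feature best matches the query.

    Assumption: each dataset item is {"feature": [...], "motion": [[...], ...]}.
    """
    best = max(
        dataset,
        key=lambda item: cosine_similarity(query_feature, item["feature"]),
    )
    return [list(point) for point in best["motion"]]


def motion_distance(m1, m2):
    """Euclidean distance between two motions with the same shape."""
    return math.sqrt(
        sum((a - b) ** 2 for p, q in zip(m1, m2) for a, b in zip(p, q))
    )


def mean_nearest_neighbor_distance(motions):
    """Average distance from each motion to its nearest neighbor.

    Assumption: a statistic like this could inform how much noise the
    refinement has to remove; the actual tuning rule is not given here.
    """
    total = 0.0
    for i, motion in enumerate(motions):
        total += min(
            motion_distance(motion, other)
            for j, other in enumerate(motions)
            if j != i
        )
    return total / len(motions)


def refine(motion, denoise_step, num_steps):
    """Start from the retrieved motion instead of Gaussian noise and denoise."""
    for t in reversed(range(num_steps)):
        motion = denoise_step(motion, t)
    return motion


def make_toy_denoise_step(target, rate=0.5):
    """Hypothetical stand-in for a trained denoiser: pull toward a target."""
    def toy_denoise_step(motion, t):
        return [
            [a + rate * (b - a) for a, b in zip(p, q)]
            for p, q in zip(motion, target)
        ]
    return toy_denoise_step


# Toy data: two stored (image feature, motion) pairs.
dataset = [
    {"feature": [1.0, 0.0], "motion": [[0.0, 0.0], [1.0, 1.0]]},
    {"feature": [0.0, 1.0], "motion": [[5.0, 5.0], [6.0, 6.0]]},
]
retrieved = retrieve_motion([0.9, 0.1], dataset)
nn_scale = mean_nearest_neighbor_distance([d["motion"] for d in dataset])
target = [[0.2, 0.1], [1.1, 0.9]]
refined = refine(retrieved, make_toy_denoise_step(target), num_steps=3)
```

The point of the sketch is the control flow: the denoising loop starts from a motion that is already near a plausible answer, so it only needs to make a small correction rather than generate a motion from pure noise.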




Can We Use Diffusion Probabilistic Models for 3D Motion Prediction?

After many researchers observed fruitfulness from the recent diffusion p...

FLAME: Free-form Language-based Motion Synthesis Editing

Text-based motion generation models are drawing a surge of interest for ...

Motion Planning Diffusion: Learning and Planning of Robot Motions with Diffusion Models

Learning priors on trajectory distributions can help accelerate robot mo...

Controllable Motion Diffusion Model

Generating realistic and controllable motions for virtual characters is ...

Motion Similarity Modeling – A State of the Art Report

The analysis of human motion opens up a wide range of possibilities, suc...

HumanMAC: Masked Motion Completion for Human Motion Prediction

Human motion prediction is a classical problem in computer vision and co...

Explicit Diffusion of Gaussian Mixture Model Based Image Priors

In this work we tackle the problem of estimating the density f_X of a ra...
