Multi-modal Motion Prediction with Transformer-based Neural Network for Autonomous Driving

by Zhiyu Huang, et al.

Predicting the behaviors of other agents on the road is critical for safe and efficient autonomous driving. The challenge lies in representing the social interactions between agents and producing multiple plausible trajectories in an interpretable way. In this paper, we introduce a neural prediction framework based on the Transformer structure that models the relationships among interacting agents and extracts the attention of the target agent on the map waypoints. Specifically, we organize the interacting agents into a graph and use a multi-head attention Transformer encoder to extract the relations between them. To address the multi-modality of motion prediction, we propose a multi-modal attention Transformer encoder, which adapts the multi-head attention mechanism into multi-modal attention so that each predicted trajectory is conditioned on an independent attention mode. The proposed model is validated on the Argoverse motion forecasting dataset and shows state-of-the-art prediction accuracy while maintaining a small model size and a simple training process. We also demonstrate that the multi-modal attention module can automatically identify different modes of the target agent's attention on the map, which improves the interpretability of the model.
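
To make the multi-modal attention idea more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes scaled dot-product attention in which the target agent's feature vector supplies the query and the map waypoints supply keys and values. Each mode has its own projections, and the per-mode outputs are kept separate instead of being concatenated as in standard multi-head attention. The function names, dimensions, and random projection matrices are all hypothetical placeholders for learned parameters; the abstract does not specify these details.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for a learned projection matrix."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def multi_modal_attention(target, waypoints, modes):
    """Return one (context, weights) pair per attention mode.

    target:    feature vector of the target agent (source of the query)
    waypoints: list of map-waypoint feature vectors (keys and values)
    modes:     list of (w_q, w_k, w_v) projection triples, one per mode
    """
    outputs = []
    for w_q, w_k, w_v in modes:
        q = matvec(w_q, target)
        keys = [matvec(w_k, wp) for wp in waypoints]
        values = [matvec(w_v, wp) for wp in waypoints]
        scale = math.sqrt(len(q))
        # Attention of the target agent over the waypoints for this mode.
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        # Weighted sum of the value vectors gives this mode's context.
        context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(q))]
        # Assumption: outputs stay separate per mode (no concatenation),
        # so a downstream decoder could emit one trajectory per mode.
        outputs.append((context, weights))
    return outputs


# Toy sizes, chosen only for illustration.
rng = random.Random(0)
d_model, d_head, num_modes, num_waypoints = 8, 4, 3, 5
target = [rng.uniform(-1, 1) for _ in range(d_model)]
waypoints = [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(num_waypoints)]
modes = [tuple(random_matrix(d_head, d_model, rng) for _ in range(3)) for _ in range(num_modes)]
mode_outputs = multi_modal_attention(target, waypoints, modes)
```

In this sketch, the per-mode `weights` lists are what one would inspect for interpretability: each is a distribution over waypoints showing where that mode places the target agent's attention on the map.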



ReCoAt: A Deep Learning-based Framework for Multi-Modal Motion Prediction in Autonomous Driving Application

This paper proposes a novel deep learning framework for multi-modal moti...

Multi-modal Trajectory Prediction for Autonomous Driving with Semantic Map and Dynamic Graph Attention Network

Predicting future trajectories of surrounding obstacles is a crucial tas...

Multi-modal Transformer Path Prediction for Autonomous Vehicle

Reasoning about vehicle path prediction is an essential and challenging ...

CVAE-H: Conditionalizing Variational Autoencoders via Hypernetworks and Trajectory Forecasting for Autonomous Driving

The task of predicting stochastic behaviors of road agents in diverse en...

Towards Trustworthy Multi-Modal Motion Prediction: Evaluation and Interpretability

Predicting the motion of other road agents enables autonomous vehicles t...

NEMO: Future Object Localization Using Noisy Ego Priors

Predictive models for forecasting future behavior of road agents should ...

Spatial-Channel Transformer Network for Trajectory Prediction on the Traffic Scenes

Predicting motion of surrounding agents is critical to real-world applic...
