Parkour is a grand challenge for legged locomotion that requires robots ...
High-Definition (HD) maps are essential for the safety of autonomous dri...
Millimeter wave (mmWave) based speech recognition provides more possibil...
The growing demand for accurate control in varying and unknown environme...
The Video-to-Audio (V2A) model has recently gained attention for its
pra...
Depth estimation is a cornerstone of perception in autonomous driving an...
Large language models (LLMs) with memory are computationally universal.
...
This paper tries to address a fundamental question in point cloud
self-s...
We abstract the features (i.e. learned representations) of multi-modal d...
Robotic perception requires the modeling of both 3D geometry and semanti...
Inferring past human motion from RGB images is challenging due to the
in...
High-definition (HD) semantic maps are crucial for autonomous vehicles
n...
High-resolution images enable neural networks to learn richer visual
rep...
The weakly supervised instance segmentation is a challenging task. The
e...
We study the problem of learning online packing skills for irregular 3D
...
Visual object tracking is an essential capability of intelligent robots....
Motion prediction is crucial in enabling safe motion planning for autono...
Interactive traffic simulation is crucial to autonomous driving systems ...
Predicting future behaviors of road agents is a key task in autonomous
d...
Existing autonomous driving pipelines separate the perception module fro...
Some recent studies have demonstrated the feasibility of single-stage ne...
This paper focuses on perceiving and navigating 3D environments using ec...
Considering the microphone is easily affected by noise and soundproof
ma...
In this paper, a novel data-driven approach named Augmented Imageficatio...
Autonomous driving systems require a good understanding of surrounding
e...
Multimodal knowledge distillation (KD) extends traditional knowledge
dis...
Estimating the distance of objects is a safety-critical task for autonom...
Monocular image-based 3D perception has become an active research area i...
From the patter of rain to the crunch of snow, the sounds we hear often
...
Synthesizer is a type of electronic musical instrument that is now widel...
Accurate and consistent 3D tracking from multiple cameras is a key compo...
We are interested in anticipating as early as possible the target locati...
Inspired by the success of self-supervised autoregressive representation...
Sensor fusion is an essential topic in many perception systems, such as
...
Predicting future motions of road participants is an important task for
...
In this paper, we propose S3T, a self-supervised pre-training method wit...
Object detection through either RGB images or the LiDAR point clouds has...
In LiDAR-based 3D object detection for autonomous driving, the ratio of ...
Building embodied intelligent agents that can interact with 3D indoor
en...
Pre-training has become a standard paradigm in many computer vision task...
Dubbing is a post-production process of re-recording actors' dialogues, ...
We introduce a framework for multi-camera 3D object detection. In contra...
We tackle the Online 3D Bin Packing Problem, a challenging yet practical...
Due to the stochasticity of human behaviors, predicting the future
traje...
High-definition map (HD map) construction is a crucial problem for auton...
High Definition (HD) maps are maps with precise definitions of road lane...
In autonomous driving, goal-based multi-trajectory prediction methods ar...
Transformers recently are adapted from the community of natural language...
Learning multi-modal representations is an essential step towards real-w...
The world provides us with data of multiple modalities. Intuitively, mod...