We study the task of zero-shot vision-and-language navigation (ZS-VLN), ...
Vision-and-language navigation (VLN) requires an embodied agent to navig...
Zero-shot quantization (ZSQ) is promising for compressing and accelerati...
Open World Object Detection (OWOD) is a novel computer vision task with ...
As a crucial infrastructure of intelligent mobile robots, LiDAR-Inertial...
Image-text pretrained models, e.g., CLIP, have shown impressive general ...
Open-world object detection (OWOD), as a more general and challenging go...
We address a practical yet challenging problem of training robot agents ...
Getting robots to navigate to multiple objects autonomously is essential...
We study self-supervised video representation learning that seeks to lea...
We present two versatile methods to generally enhance self-supervised mo...
Self-supervised monocular depth estimation (MDE) models universally suff...
Point cloud compression plays a crucial role in reducing the huge cost o...
We deal with the controllable person image synthesis task which aims to ...
Generating portrait images by controlling the motions of existing faces ...
Pose-guided person image synthesis aims to synthesize person images by t...
Pose-guided person image generation and animation aim to transform a sou...
Pose-guided person image generation is to transform a source person imag...
Image inpainting techniques have shown significant improvements by using...
Point cloud is a fundamental 3D representation which is widely used in r...
Video anomaly detection under weak labels is formulated as a typical mul...
Weakly supervised temporal action detection is a Herculean task in under...
Saliency detection aims to detect the most attractive objects in images ...
Fully convolutional neural networks (FCNs) have shown outstanding perfor...