As it is empirically observed that Vision Transformers (ViTs) are quite ...
Contrastive Language-Image Pre-training (CLIP) models have shown promisi...
LiDAR-based fully sparse architecture has garnered increasing attention....
Class Incremental Semantic Segmentation (CISS) extends the tradition...
Weakly-supervised temporal action localization (WTAL) is a practical yet...
Anomaly detection (AD) is a fundamental research problem in machine lear...
The framework of visually-guided sound source separation generally consi...
Comprehensive modeling of the surrounding 3D world is key to the success...
With the advent of the big model era, the demand for data has become mor...
Data association is a knotty problem for 2D Multiple Object Tracking due...
The crux of label-efficient semantic segmentation is to produce high-qua...
Computer end users have spent billions of hours completing daily tasks l...
The captivating realm of Minecraft has attracted substantial research in...
Domain adaptive semantic segmentation aims to transfer knowledge from a ...
Data and models are undoubtedly the two supporting pillars for LiDAR obje...
This paper aims for high-performance offline LiDAR-based 3D object detec...
Masked image modeling (MIM) has attracted much research attention due to...
We explore long-term temporal visual correspondence-based optimization f...
Data association is at the core of many computer vision tasks, e.g., mul...
The function of constructing the hierarchy of objects is important to th...
Few-shot transfer learning has become a major focus of research as it al...
It is widely agreed that reference-based super-resolution (RefSR) achiev...
The ability to discover abstract physical concepts and understand how th...
The transformation of features from 2D perspective space to 3D space is ...
As the perception range of LiDAR expands, LiDAR-based 3D object detectio...
Recently, unsupervised learning has made impressive progress on various ...
We present a novel bird's-eye-view (BEV) detector with perspective super...
Reference-based image super-resolution (RefSR) is a promising SR branch ...
In this paper, we propose a new approach to applying point-level annotat...
Object discovery is a core task in computer vision. While fast progresse...
Image-goal navigation is a challenging task, as it requires the agent to...
In computer vision, fine-tuning is the de-facto approach to leverage pre...
Estimating accurate 3D locations of objects from monocular images is a c...
As the perception range of LiDAR increases, LiDAR-based 3D object detect...
Most existing unsupervised person re-identification (Re-ID) methods use ...
Sound source localization in visual scenes aims to localize objects emit...
In this paper, we propose a conceptually novel, efficient, and fully con...
Capsule networks are designed to present the objects by a set of parts a...
The paradigm of training models on massive data without label through se...
Learned image compression methods have exhibited superior rate-distortio...
Representation is a core issue in artificial intelligence. Humans use di...
In LiDAR-based 3D object detection for autonomous driving, the ratio of ...
Previous online 3D Multi-Object Tracking (3DMOT) methods terminate a trac...
Benefiting from considerable pixel-level annotations collected from a spe...
Transfer learning with pre-training on large-scale datasets has played a...
Inpainting arbitrary missing regions is challenging because learning val...
A practical long-term tracker typically contains three key properties, i...
The two-stage methods for instance segmentation, e.g. Mask R-CNN, have a...
Tremendous efforts have been made on instance segmentation but the mask ...
In this paper, we are interested in the bottom-up paradigm of estimating...