This paper presents a controllable text-to-video (T2V) diffusion model, ...
In this paper, we study video synthesis with emphasis on simplifying the...
As a promising solution for reducing annotation cost, training multi-labe...
Recently, many multi-label image recognition (MLR) works have made signif...
Training the multi-label image recognition models with partial labels, i...
Multi-label image recognition is a fundamental yet practical task becaus...
Recognizing human emotion/expressions automatically is quite an expected...
Crowd counting is a fundamental yet challenging problem, which desires r...
Recognizing multiple labels of an image is a practical yet challenging t...
In this work, we investigate an Active Object Search (AOS) task that is ...
Data inconsistency and bias are inevitable among different facial expres...
Significant progress has been made in recent years in image captioning, ...
Crowd counting is an application-oriented task and its inference efficie...
Due to the widespread applications in real-world scenarios, metro riders...
Few-shot learning aims to learn novel categories from very few samples g...
Recognizing multiple labels of images is a practical and challenging tas...
Multi-Person Tracking (MPT) is often addressed within the detection-to-a...
We propose an attention-injective deformable convolutional network calle...
In this paper, we aim at tackling the problem of crowd counting in extre...
Fabric image retrieval is beneficial to many applications including clot...