Vision Transformers (ViTs) have emerged to achieve impressive performance on m...
Generalized few-shot object detection aims to achieve precise detection ...
In Video Question Answering (VideoQA), answering general questions about...
Multi-channel video-language retrieval requires models to understand info...
We study multimodal few-shot object detection (FSOD) in this paper, usin...
Few-shot object detection (FSOD), with the aim to detect novel objects u...
Few-shot object detection (FSOD) aims to detect never-seen objects using...
Few-shot object detection (FSOD) aims to detect objects using only a few e...
Recent works seek to endow recognition systems with the ability to handl...
Two-stream networks have achieved great success in video recognition. A ...