In this paper, we explore, for the first time, helpful multi-modal context...
We introduce MQ-Det, an efficient architecture and pre-training strategy...
Visual Grounding (VG) refers to locating a region described by expressio...
Although significant progress has been made in few-shot learning, most o...
Visual grounding focuses on establishing fine-grained alignment between
...
Graph convolutional network (GCN) based methods have achieved advanced...
Recently, transductive graph-based methods have achieved great succe...
Health management is getting increasing attention all over the world.
Ho...
Image restoration methods are commonly used to improve the quality of
as...
Temporal action localization has recently attracted significant interest...