Incorporating external knowledge into dialogue generation (KIDG) is cruc...
Recently, large-scale pre-trained language-image models like CLIP have s...
Visual tracking has made significant improvements in the past few decade...
Relational Language-Image Pre-training (RLIP) aims to align vision
repre...
This paper introduces ModelScopeT2V, a text-to-video synthesis model tha...
Spatial convolutions are extensively used in numerous deep video models....
The pursuit of controllability as a higher standard of visual content
cr...
Current state-of-the-art approaches for few-shot action recognition achi...
Learning from large-scale contrastive language-image pre-training like C...
We present Rhino, a system for accelerating tensor programs with automat...
This paper presents TAG, an automatic system to derive optimized DNN tra...
Recent incremental learning for action recognition usually stores
repres...
This paper proposes DisCo, an automatic deep learning compilation module...
Prompt-based fine-tuning for pre-trained models has proven effective for...
Standard approaches for video recognition usually operate on the full in...
Current methods for LIDAR semantic segmentation are not robust enough fo...
This technical report presents our first place winning solution for temp...
Temporal contexts among consecutive frames are far from being fully util...
Fine-tuning pre-trained language models for downstream tasks has become ...
Spatial convolutions are widely used in numerous deep video models. It
f...
Current approaches for video grounding propose kinds of complex architec...
The central idea of contrastive learning is to discriminate between diff...
Temporal action localization aims to localize starting and ending time w...
Most recent approaches for online action detection tend to apply Recurre...
Weakly-Supervised Temporal Action Localization (WS-TAL) task aims to
rec...
This technical report presents our solution for temporal action detectio...
This paper presents our solution to the AVA-Kinetics Crossover Challenge...
This technical report analyzes an egocentric video action detection meth...
With the recent surge in the research of vision transformers, they have
...
Self-supervised learning presents a remarkable performance to utilize
un...
Currently, one-stage frameworks have been widely applied for temporal ac...
Promptly and accurately answering questions on products is important for...
In this report, we present our solution for the task of temporal action
...
This technical report analyzes a temporal action localization method we ...
Link failures and cable miswirings are not uncommon in building data cen...
Existing traffic engineering (TE) solutions performs well for software
d...
Current state-of-the-art approaches for spatio-temporal action detection...