Zero-shot video recognition (ZSVR) is a task that aims to recognize vide...
We present a hardware-efficient architecture of convolutional neural net...
It is widely believed that the higher uncertainty in a word of the caption...
Recently, vector quantized autoregressive (VQ-AR) models have shown rema...
An ensemble of machine learning models yields improved performance as well ...
Over the past few years since the birth of ResNet, the skip connection has b...
For years, the YOLO series has been the de facto industry-level standard...
Panoptic Narrative Grounding (PNG) is an emerging task whose goal is to ...
Existing approaches to image captioning usually generate the sentence wo...
Referring video object segmentation aims to predict foreground labels fo...
Recently, lane detection has made great progress with the rapid developm...
BiSeNet has proven to be a popular two-stream network for real-time...
Food recognition plays an important role in food choice and intake, whic...
Food recognition has received increasing attention in the multimedia ...
Facial landmark localization is a crucial step in numerous face rel...