Instructional videos are an important resource to learn procedural tasks...
Multiscale video transformers have been explored in a wide variety of vi...
There is limited understanding of the information captured by deep
spati...
This paper investigates the modeling of automated machine description on...
Intuition might suggest that motion and dynamic information are key to
v...
Deep spatiotemporal models are used in a variety of computer vision task...
In this paper, we study the problem of procedure planning in instruction...
Few-shot video object segmentation (FS-VOS) aims at segmenting video fra...
Video predictive understanding encompasses a wide range of efforts that ...
Early action recognition (action prediction) from limited preliminary
ob...
Animals locomote for various reasons: to search for food, find suitable
...
It has been repeatedly observed that convolutional architectures when ap...
This document will review the most prominent proposals using multilayer
...
As the success of deep models has led to their deployment in all areas o...
This paper presents a novel hierarchical spatiotemporal orientation
repr...
Two-stream Convolutional Networks (ConvNets) have shown strong performan...
In computer vision, action recognition refers to the act of classifying ...