Transformed ROIs for Capturing Visual Transformations in Videos

06/06/2021
by Abhinav Rai, et al.

Modeling the visual changes that an action brings to a scene is critical for video understanding. Currently, CNNs process one local neighbourhood at a time, so contextual relationships over longer ranges, while still learnable, are indirect. We present TROI, a plug-and-play module for CNNs to reason between mid-level feature representations that are otherwise separated in space and time. The module relates localized visual entities such as hands and interacting objects and transforms their corresponding regions of interest directly in the feature maps of convolutional layers. With TROI, we achieve state-of-the-art action recognition results on the large-scale datasets Something-Something-V2 and Epic-Kitchens-100.
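The abstract describes the module only at a high level: pool localized regions (e.g. hands and objects) from a convolutional feature map, relate them across space and time, and write the transformed result back into the same feature map. The NumPy sketch below illustrates that data flow. It is not the authors' implementation. Average pooling over integer boxes (in place of an RoIAlign-style operator), single-head scaled dot-product attention as the relation step, residual addition as the write-back, and every function and parameter name (`roi_pool`, `relate`, `troi_sketch`, `w_q`, `w_k`, `w_v`) are assumptions made for illustration.

```python
import numpy as np

def roi_pool(feat, box):
    """Average-pool one ROI. feat: (C, T, H, W); box: (t, y0, x0, y1, x1), end-exclusive."""
    t, y0, x0, y1, x1 = box
    return feat[:, t, y0:y1, x0:x1].mean(axis=(1, 2))  # -> (C,)

def relate(tokens, w_q, w_k, w_v):
    """Assumed relation step: single-head scaled dot-product attention over ROI tokens (N, C)."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # softmax over all ROIs, across frames
    return attn @ v                              # (N, C)

def troi_sketch(feat, boxes, w_q, w_k, w_v):
    """Hypothetical TROI-like step: pool ROIs, relate them, add updates back into their regions."""
    tokens = np.stack([roi_pool(feat, b) for b in boxes])
    updates = relate(tokens, w_q, w_k, w_v)
    out = feat.copy()
    for (t, y0, x0, y1, x1), u in zip(boxes, updates):
        # Residual write-back: broadcast the (C,) update over the ROI's spatial cells.
        # Overlapping ROIs simply accumulate both updates in this sketch.
        out[:, t, y0:y1, x0:x1] += u[:, None, None]
    return out

# Usage with random data: one "hand" box at frame 0 and one "object" box at frame 2.
rng = np.random.default_rng(0)
C, T, H, W = 8, 4, 7, 7
feat = rng.standard_normal((C, T, H, W))
boxes = [(0, 1, 1, 4, 4), (2, 3, 2, 6, 5)]
w_q, w_k, w_v = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
out = troi_sketch(feat, boxes, w_q, w_k, w_v)
print(out.shape)  # (8, 4, 7, 7)
```

Because the output has the same shape as the input feature map, a block like this could in principle be inserted between existing convolutional layers, which is what "plug-and-play" suggests. Cells outside every ROI pass through unchanged.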
