What Would You Expect? Anticipating Egocentric Actions with Rolling-Unrolling LSTMs and Modality Attention

05/22/2019
by   Antonino Furnari, et al.
2

Egocentric action anticipation consists in understanding which objects the camera wearer will interact with in the near future and which actions they will perform. We tackle the problem proposing an architecture able to anticipate actions at multiple temporal scales using two LSTMs to 1) summarize the past, and 2) formulate predictions about the future. The input video is processed considering three complimentary modalities: appearance (RGB), motion (optical flow) and objects (object-based features). Modality-specific predictions are fused using a novel Modality ATTention (MATT) mechanism which learns to weigh modalities in an adaptive fashion. Extensive evaluations on two large-scale benchmark datasets show that our method outperforms prior art by up to +7 the challenging EPIC-KITCHENS dataset including more than 2500 actions, and generalizes to EGTEA Gaze+. Our approach is also shown to generalize to the tasks of early action recognition and action recognition. At the moment of submission, our method is ranked first in the leaderboard of the EPIC-KITCHENS egocentric action anticipation challenge.

READ FULL TEXT

page 1

page 8

page 10

page 13

page 14

page 15

research
05/04/2020

Rolling-Unrolling LSTMs for Action Anticipation from First-Person Video

In this paper, we tackle the problem of egocentric action anticipation, ...
research
01/05/2021

Trear: Transformer-based RGB-D Egocentric Action Recognition

In this paper, we propose a Transformer-based RGB-D egocentric action re...
research
09/02/2021

SlowFast Rolling-Unrolling LSTMs for Action Anticipation in Egocentric Videos

Action anticipation in egocentric videos is a difficult task due to the ...
research
08/06/2021

Feature-Supervised Action Modality Transfer

This paper strives for action recognition and detection in video modalit...
research
11/01/2021

With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition

In egocentric videos, actions occur in quick succession. We capitalise o...
research
05/21/2023

Prompt Learning for Action Recognition

We present a new general learning approach for action recognition, Promp...
research
05/26/2021

Anticipating human actions by correlating past with the future with Jaccard similarity measures

We propose a framework for early action recognition and anticipation by ...

Please sign up or login with your details

Forgot password? Click here to reset