An Action Is Worth Multiple Words: Handling Ambiguity in Action Recognition

10/10/2022
by   Kiyoon Kim, et al.
7

Precisely naming the action depicted in a video can be a challenging and oftentimes ambiguous task. In contrast to object instances represented as nouns (e.g. dog, cat, chair, etc.), in the case of actions, human annotators typically lack a consensus as to what constitutes a specific action (e.g. jogging versus running). In practice, a given video can contain multiple valid positive annotations for the same action. As a result, video datasets often contain significant levels of label noise and overlap between the atomic action classes. In this work, we address the challenge of training multi-label action recognition models from only single positive training labels. We propose two approaches that are based on generating pseudo training examples sampled from similar instances within the train set. Unlike other approaches that use model-derived pseudo-labels, our pseudo-labels come from human annotations and are selected based on feature similarity. To validate our approaches, we create a new evaluation benchmark by manually annotating a subset of EPIC-Kitchens-100's validation set with multiple verb labels. We present results on this new test set along with additional results on a new version of HMDB-51, called Confusing-HMDB-102, where we outperform existing methods in both cases. Data and code are available at https://github.com/kiyoon/verb_ambiguity

READ FULL TEXT

page 2

page 6

page 7

page 13

page 15

research
06/17/2021

BABEL: Bodies, Action and Behavior with English Labels

Understanding the semantics of human movement – the what, how and why of...
research
08/02/2019

An Evaluation of Action Recognition Models on EPIC-Kitchens

We benchmark contemporary action recognition models (TSN, TRN, and TSM) ...
research
11/24/2022

Video Test-Time Adaptation for Action Recognition

Although action recognition systems can achieve top performance when eva...
research
03/30/2020

Speech2Action: Cross-modal Supervision for Action Recognition

Is it possible to guess human action from dialogue alone? In this work w...
research
08/11/2021

Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization

We tackle the problem of localizing temporal intervals of actions with o...
research
09/29/2022

REST: REtrieve Self-Train for generative action recognition

This work is on training a generative action/video recognition model who...
research
05/21/2021

Sharing Pain: Using Domain Transfer Between Pain Types for Recognition of Sparse Pain Expressions in Horses

Orthopedic disorders are a common cause for euthanasia among horses, whi...

Please sign up or login with your details

Forgot password? Click here to reset