
Feature sampling and partitioning for visual vocabulary generation on large action classification datasets

by Michael Sapienza, et al.

The recent trend in action recognition is towards larger datasets, an increasing number of action classes, and larger visual vocabularies. State-of-the-art human action classification in challenging video data is currently based on a bag-of-visual-words pipeline in which space-time features are aggregated globally to form a histogram. The strategies chosen to sample features and construct a visual vocabulary are critical to performance and often dominate it. In this work we provide a critical evaluation of various approaches to building a vocabulary and show that good practices do have a significant impact. By subsampling and partitioning features strategically, we are able to achieve state-of-the-art results on five major action recognition datasets using relatively small visual vocabularies.
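
To make the pipeline in the abstract concrete, here is a minimal sketch of a generic bag-of-visual-words flow: subsample local features, cluster them into a small vocabulary, then aggregate a video's features into one histogram. It is not the authors' method. Uniform random subsampling, plain Lloyd's k-means, the toy 2-D features, and all function names are assumptions made for illustration, and the feature partitioning mentioned in the abstract is not modelled.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centres, f):
    # Index of the visual word (centre) closest to feature f.
    return min(range(len(centres)), key=lambda k: sq_dist(centres[k], f))

def subsample(features, max_count, rng):
    # Assumption: uniform random subsampling without replacement.
    if len(features) <= max_count:
        return list(features)
    return rng.sample(features, max_count)

def build_vocabulary(features, k, rng, iters=10):
    # Assumption: plain Lloyd's k-means; the k centres act as visual words.
    centres = rng.sample(features, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for f in features:
            clusters[nearest(centres, f)].append(f)
        for j, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster ends up empty
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return centres

def bow_histogram(features, centres):
    # Global aggregation: L1-normalised counts of word assignments.
    hist = [0.0] * len(centres)
    for f in features:
        hist[nearest(centres, f)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Toy stand-in for space-time descriptors: 400 two-dimensional points.
rng = random.Random(0)
video_features = [[rng.gauss(c, 0.1), rng.gauss(c, 0.1)]
                  for c in (0.0, 5.0) for _ in range(200)]
sampled = subsample(video_features, 100, rng)   # vocabulary is built on a subset
vocab = build_vocabulary(sampled, 4, rng)       # deliberately small vocabulary
hist = bow_histogram(video_features, vocab)     # one histogram per video
```

The vocabulary is learned from the subsample only, while the histogram is computed over all of a video's features. This is the sense in which sampling choices shape the vocabulary independently of the final representation.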



A Bag-of-Words Equivalent Recurrent Neural Network for Action Recognition

The traditional bag-of-words approach has found a wide range of applicat...

Feature Sampling Strategies for Action Recognition

Although dense local spatial-temporal features with bag-of-features repr...

Bag of Visual Words and Fusion Methods for Action Recognition: Comprehensive Study and Good Practice

Video based action recognition is one of the important and challenging p...

Transformed ROIs for Capturing Visual Transformations in Videos

Modeling the visual changes that an action brings to a scene is critical...

Temporal Pyramid Network for Action Recognition

Visual tempo characterizes the dynamics and the temporal scale of an act...

Anticipating human actions by correlating past with the future with Jaccard similarity measures

We propose a framework for early action recognition and anticipation by ...