Human Action Recognition with Deep Temporal Pyramids

by Ahmed Mazari, et al.

Deep convolutional neural networks (CNNs) are achieving significant gains in pattern recognition tasks, including action recognition. Current CNNs are increasingly deep and data-hungry, which makes their success dependent on the abundance of labeled training data. CNNs also rely on max/average pooling, which reduces the dimensionality of output layers and hence attenuates their sensitivity to the availability of labeled data. However, this process may dilute the information of upstream convolutional layers and thereby reduce the discriminative power of the trained representations, especially when the learned categories are fine-grained. In this paper, we introduce a novel hierarchical aggregation design, for final pooling, that controls the granularity of the learned representations w.r.t. the actual granularity of action categories. Our solution is based on a tree-structured temporal pyramid that aggregates outputs of CNNs at different levels. Top levels of this hierarchy are dedicated to coarse categories, while deeper levels are more suitable for fine-grained ones. The design of our temporal pyramid is based on solving a constrained minimization problem whose solution corresponds to the distribution of weights of different representations in the temporal pyramid. Experiments conducted on the challenging UCF101 database show the relevance of our hierarchical design w.r.t. other related methods.
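
To make the idea of a tree-structured temporal pyramid more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes a binary tree (level l splits the video into 2^l equal temporal segments), average pooling inside each segment, and one non-negative weight per level that sums to one. In the abstract the weights come from solving a constrained minimization problem; here they are simply passed in by hand. The function names `temporal_pyramid` and `aggregate` are hypothetical.

```python
import numpy as np


def temporal_pyramid(frame_features, num_levels=3):
    """Average-pool per-frame features over a binary temporal pyramid.

    frame_features: array of shape (T, D), one D-dim CNN output per frame.
    Returns a list of (level, segment_index, pooled_vector) tuples.
    Assumption: level l has 2**l equal-length segments (not stated in the source).
    """
    frame_features = np.asarray(frame_features, dtype=float)
    num_frames = frame_features.shape[0]
    if num_frames < 1:
        raise ValueError("need at least one frame")
    nodes = []
    for level in range(num_levels):
        num_segments = 2 ** level
        bounds = np.linspace(0, num_frames, num_segments + 1).astype(int)
        for k in range(num_segments):
            start = bounds[k]
            # Guard against empty segments when the video is very short.
            end = max(bounds[k + 1], start + 1)
            nodes.append((level, k, frame_features[start:end].mean(axis=0)))
    return nodes


def aggregate(nodes, level_weights):
    """Scale each node by its level weight and concatenate into one vector.

    Assumption: weights are non-negative and sum to one; in the paper they
    would be the solution of a constrained minimization problem.
    """
    weights = np.asarray(level_weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("level weights must be non-negative and sum to 1")
    return np.concatenate([weights[level] * vec for level, _, vec in nodes])


# Toy usage: 8 frames with 2-dim features, 3 pyramid levels (1 + 2 + 4 nodes).
features = np.arange(16, dtype=float).reshape(8, 2)
nodes = temporal_pyramid(features, num_levels=3)
video_descriptor = aggregate(nodes, level_weights=[0.5, 0.3, 0.2])
```

Putting more weight on the top level favors a coarse, whole-video summary, while shifting weight to deeper levels emphasizes shorter temporal segments, which mirrors the coarse versus fine-grained trade-off described above.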


