Revisiting the Effectiveness of Off-the-shelf Temporal Modeling Approaches for Large-scale Video Classification

08/12/2017
by Yunlong Bian, et al.

This paper describes our solution for the video recognition task of the ActivityNet Kinetics challenge, which ranked first. Most existing state-of-the-art video recognition approaches favor an end-to-end pipeline. One exception is the DevNet framework. The merit of DevNet is that it first uses the video data to learn a network (i.e., by fine-tuning or training from scratch). Instead of directly using the end-to-end classification scores (e.g., softmax scores), it extracts features from the learned network and then feeds them into off-the-shelf machine learning models to perform video classification. However, the effectiveness of this line of work has long been ignored and underestimated. In this submission, we use this strategy extensively. In particular, we investigate four temporal modeling approaches that operate on the learned features: Multi-group Shifting Attention Network, Temporal Xception Network, Multi-stream Sequence Model and Fast-Forward Sequence Model. Experimental results on the challenging Kinetics dataset demonstrate that our proposed temporal modeling approaches significantly improve on existing approaches in large-scale video recognition tasks. Most remarkably, our best single Multi-group Shifting Attention Network can achieve 77.7 top-1 accuracy and 93.2
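The two-stage strategy described in the abstract (extract per-frame features from a learned network, then train a separate temporal model on those features) can be illustrated with a minimal sketch. This is not one of the four models named above; it is a generic single-head attention pooling followed by a linear softmax classifier, written in plain Python. The function names, the parameter shapes, and the assumption that features arrive as one vector per frame are all hypothetical choices made for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frame_features, w_att):
    """Collapse T per-frame feature vectors into one video-level vector.

    frame_features: list of T vectors, each of length D (assumed to come
                    from a previously trained network; hypothetical input).
    w_att:          attention parameter vector of length D (hypothetical).
    """
    # One scalar relevance score per frame.
    scores = [sum(f * w for f, w in zip(frame, w_att)) for frame in frame_features]
    # Normalize scores over time so the weights sum to 1.
    weights = softmax(scores)
    dim = len(frame_features[0])
    # Weighted average of the frame vectors.
    return [
        sum(weights[t] * frame_features[t][d] for t in range(len(frame_features)))
        for d in range(dim)
    ]


def classify_video(frame_features, w_att, w_cls, b_cls):
    """Second-stage classifier on pooled features.

    w_cls: list of C weight vectors (one per class), each of length D.
    b_cls: list of C biases.
    Returns a list of C class probabilities.
    """
    video_vec = attention_pool(frame_features, w_att)
    logits = [
        sum(v * w for v, w in zip(video_vec, class_w)) + b
        for class_w, b in zip(w_cls, b_cls)
    ]
    return softmax(logits)


# Toy usage: 2 frames, 2-dimensional features, 3 classes.
frames = [[1.0, 0.0], [3.0, 2.0]]
probs = classify_video(
    frames,
    w_att=[0.5, -0.5],
    w_cls=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    b_cls=[0.0, 0.0, 0.0],
)
```

In a setup like this, only the second-stage parameters (`w_att`, `w_cls`, `b_cls`) would be trained on the extracted features, while the feature extractor stays fixed.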


research
07/14/2017

Temporal Modeling Approaches for Large-scale Youtube-8M Video Understanding

This paper describes our solution for the video recognition task of the ...
research
06/15/2023

SF-TMN: SlowFast Temporal Modeling Network for Surgical Phase Recognition

Automatic surgical phase recognition is one of the key technologies to s...
research
09/03/2018

YouTube-VOS: Sequence-to-Sequence Video Object Segmentation

Learning long-term spatial-temporal features are critical for many video...
research
11/14/2022

Supervised Fine-tuning Evaluation for Long-term Visual Place Recognition

In this paper, we present a comprehensive study on the utility of deep c...
research
09/12/2017

End-to-End United Video Dehazing and Detection

The recent development of CNN-based image dehazing has revealed the effe...
research
11/27/2017

Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification

Recently, substantial research effort has focused on how to apply CNNs o...
research
11/06/2017

End-to-End Video Classification with Knowledge Graphs

Video understanding has attracted much research attention especially sin...
