Design Light-weight 3D Convolutional Networks for Video Recognition Temporal Residual, Fully Separable Block, and Fast Algorithm

05/31/2019
by   Haonan Wang, et al.
0

Deep 3-dimensional (3D) Convolutional Network (ConvNet) has shown promising performance on video recognition tasks because of its powerful spatio-temporal information fusion ability. However, the extremely intensive requirements on memory access and computing power prohibit it from being used in resource-constrained scenarios, such as portable and edge devices. So in this paper, we first propose a two-stage Fully Separable Block (FSB) to significantly compress the model sizes of 3D ConvNets. Then a feature enhancement approach named Temporal Residual Gradient (TRG) is developed to improve the performance of compressed model on video tasks, which provides higher accuracy, faster convergency and better robustness. Moreover, in order to further decrease the computing workload, we propose a hybrid Fast Algorithm (hFA) to drastically reduce the computation complexity of convolutions. These methods are effectively combined to design a light-weight and efficient ConvNet for video recognition tasks. Experiments on the popular dataset report 2.3x compression rate, 3.6x workload reduction, and 6.3 the state-of-the-art SlowFast model, which is already a highly compact model. The proposed methods also show good adaptability on traditional 3D ConvNet, demonstrating 7.4x more compact model, 11.0x less workload, and 3.0 accuracy

READ FULL TEXT
research
04/05/2019

Fast Spatio-Temporal Residual Network for Video Super-Resolution

Recently, deep learning based video super-resolution (SR) methods have a...
research
07/30/2018

Multi-Fiber Networks for Video Recognition

In this paper, we aim to reduce the computational cost of spatio-tempora...
research
02/18/2020

V4D:4D Convolutional Neural Networks for Video-level Representation Learning

Most existing 3D CNNs for video representation learning are clip-based m...
research
12/05/2022

Algorithm and Hardware Co-Design of Energy-Efficient LSTM Networks for Video Recognition with Hierarchical Tucker Tensor Decomposition

Long short-term memory (LSTM) is a type of powerful deep neural network ...
research
01/22/2021

B-DRRN: A Block Information Constrained Deep Recursive Residual Network for Video Compression Artifacts Reduction

Although the video compression ratio nowadays becomes higher, the video ...
research
06/03/2021

ERANNs: Efficient Residual Audio Neural Networks for Audio Pattern Recognition

We present a new architecture of convolutional neural networks (CNNs) ba...
research
03/31/2022

A Temporal-oriented Broadcast ResNet for COVID-19 Detection

Detecting COVID-19 from audio signals, such as breathing and coughing, c...

Please sign up or login with your details

Forgot password? Click here to reset