Evolving Space-Time Neural Architectures for Videos

11/26/2018
by   AJ Piergiovanni, et al.
2

In this paper, we present a new method for evolving video CNN models to find architectures that more optimally captures rich spatio-temporal information in videos. Previous work, taking advantage of 3D convolutional layers, obtained promising results by manually designing CNN architectures for videos. We here develop an evolutionary algorithm that automatically explores models with different types and combinations of space-time convolutional layers to jointly capture various spatial and temporal aspects of video representations. We further propose a new key component in video model evolution, the iTGM layer, which more efficiently utilizes its parameters to allow learning of space-time interactions over longer time horizons. The experiments confirm the advantages of our video CNN architecture evolution, with results outperforming previous state-of-the-art models. Our algorithm discovers new and interesting video architecture structures.

READ FULL TEXT

page 3

page 12

page 13

page 14

page 15

research
05/30/2019

AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures

Learning to represent videos is a very challenging task both algorithmic...
research
03/15/2022

Learning Spatio-Temporal Downsampling for Effective Video Upscaling

Downsampling is one of the most basic image processing operations. Impro...
research
11/24/2018

Self-Supervised Video Representation Learning with Space-Time Cubic Puzzles

Self-supervised tasks such as colorization, inpainting and zigsaw puzzle...
research
03/18/2021

Space-Time Crop Attend: Improving Cross-modal Video Representation Learning

The quality of the image representations obtained from self-supervised l...
research
06/16/2023

Learning Space-Time Semantic Correspondences

We propose a new task of space-time semantic correspondence prediction i...
research
03/17/2023

Leaping Into Memories: Space-Time Deep Feature Synthesis

The success of deep learning models has led to their adaptation and adop...
research
04/19/2021

What can human minimal videos tell us about dynamic recognition models?

In human vision objects and their parts can be visually recognized from ...

Please sign up or login with your details

Forgot password? Click here to reset