Streaming Multiscale Deep Equilibrium Models

04/28/2022
by Can Ufuk Ertenli, et al.

We present StreamDEQ, a method that infers frame-wise representations on videos with minimal per-frame computation. In contrast to conventional methods, where compute time grows at least linearly with network depth, we aim to update the representations continuously. To this end, we build on the recently emerging implicit layer models, which infer the representation of an image by solving a fixed-point problem. Our main insight is to exploit the slowly changing nature of videos and use the previous frame's representation as the initial condition for each frame. This scheme effectively recycles recent inference computations and greatly reduces the required processing time. Through extensive experimental analysis, we show that StreamDEQ recovers near-optimal representations within a few frames and maintains an up-to-date representation throughout the video. Our experiments on video semantic segmentation and video object detection show that StreamDEQ achieves accuracy on par with the baseline (standard MDEQ) while being more than 3× faster. The project page is available at: https://ufukertenli.github.io/streamdeq/
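
The warm-start idea in the abstract can be illustrated with a toy sketch. This is not the StreamDEQ or MDEQ implementation: it assumes a small contractive layer `f(z, x) = tanh(W z + U x)` and plain fixed-point iteration as a stand-in for the solver, and every name here (`make_layer`, `iterate`, `run_demo`), the dimensions, and the synthetic "frames" are hypothetical. It only shows why reusing the previous frame's representation as the initial condition helps when consecutive inputs change slowly.

```python
import numpy as np


def make_layer(dim_z, dim_x, rng, contraction=0.5):
    """Build a toy implicit layer f(z, x) = tanh(W z + U x).

    W is rescaled to spectral norm `contraction` (< 1), so f is a
    contraction in z and has a unique fixed point for every input x.
    This is an illustrative assumption, not the MDEQ architecture.
    """
    W = rng.standard_normal((dim_z, dim_z))
    W *= contraction / np.linalg.norm(W, 2)
    U = rng.standard_normal((dim_z, dim_x))
    U /= np.linalg.norm(U, 2)

    def f(z, x):
        return np.tanh(W @ z + U @ x)

    return f


def iterate(f, x, z0, n_iters):
    """Run n_iters steps of plain fixed-point iteration from z0."""
    z = z0
    for _ in range(n_iters):
        z = f(z, x)
    return z


def run_demo(n_frames=20, iters_per_frame=2, seed=0):
    """Compare cold-start and warm-start errors on the last frame."""
    rng = np.random.default_rng(seed)
    dim_z, dim_x = 16, 8
    f = make_layer(dim_z, dim_x, rng)

    # Synthetic "video": the input drifts by a tiny step per frame.
    x0 = rng.standard_normal(dim_x)
    drift = rng.standard_normal(dim_x)
    drift *= 1e-3 / np.linalg.norm(drift)

    z_stream = np.zeros(dim_z)  # representation carried across frames
    warm_err = cold_err = None
    for t in range(n_frames):
        x_t = x0 + t * drift
        # Reference: iterate long enough to converge for this frame.
        z_ref = iterate(f, x_t, np.zeros(dim_z), 200)
        # Cold start: same small budget, but starting from zeros.
        z_cold = iterate(f, x_t, np.zeros(dim_z), iters_per_frame)
        # Warm start: same budget, starting from the previous frame.
        z_stream = iterate(f, x_t, z_stream, iters_per_frame)
        cold_err = np.linalg.norm(z_cold - z_ref)
        warm_err = np.linalg.norm(z_stream - z_ref)
    return warm_err, cold_err


if __name__ == "__main__":
    warm, cold = run_demo()
    print(f"warm-start error: {warm:.2e}, cold-start error: {cold:.2e}")
```

With the same two iterations per frame, the warm-started state tracks the moving fixed point because each frame only has to correct for the small drift, whereas the cold start must cover the full distance from zero every time.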


