Transframer: Arbitrary Frame Prediction with Generative Models

03/17/2022
by   Charlie Nash, et al.

We present a general-purpose framework for image modelling and vision tasks based on probabilistic frame prediction. Our approach unifies a broad range of tasks, from image segmentation to novel view synthesis and video interpolation. We pair this framework with an architecture we term Transframer, which uses U-Net and Transformer components to condition on annotated context frames and outputs sequences of sparse, compressed image features. Transframer achieves state-of-the-art results on a variety of video generation benchmarks, is competitive with the strongest models on few-shot view synthesis, and can generate coherent 30-second videos from a single image without any explicit geometric information. A single generalist Transframer simultaneously produces promising results on 8 tasks, including semantic segmentation, image classification, and optical flow prediction, with no task-specific architectural components, demonstrating that multi-task computer vision can be tackled using probabilistic image models. Our approach can in principle be applied to a wide range of applications that require learning the conditional structure of annotated image-formatted data.

research
05/19/2022

Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation

Video prediction is a challenging task. The quality of video frames from...
research
05/20/2022

UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes

We introduce UViM, a unified approach capable of modeling a wide range o...
research
10/08/2018

Inter-BMV: Interpolation with Block Motion Vectors for Fast Semantic Segmentation on Video

Models optimized for accuracy on single images are often prohibitively s...
research
04/08/2016

STD2P: RGBD Semantic Segmentation Using Spatio-Temporal Data-Driven Pooling

We propose a novel superpixel-based multi-view convolutional neural netw...
research
06/15/2023

Infinite Photorealistic Worlds using Procedural Generation

We introduce Infinigen, a procedural generator of photorealistic 3D scen...
research
10/22/2021

IVS3D: An Open Source Framework for Intelligent Video Sampling and Preprocessing to Facilitate 3D Reconstruction

The creation of detailed 3D models is relevant for a wide range of appli...
research
03/16/2017

Convolutional Neural Network on Three Orthogonal Planes for Dynamic Texture Classification

Dynamic Textures (DTs) are sequences of images of moving scenes that exh...
