The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion

09/08/2023
by Yujin Jeong, et al.
In recent years, video generation has become a prominent generative tool and has drawn significant attention. However, audio-to-video generation has received little consideration, even though audio carries unique qualities such as temporal semantics and magnitude. We therefore propose The Power of Sound (TPoS), a model that takes audio input whose temporal semantics and magnitude both change over time. To generate video frames, TPoS uses a latent stable diffusion model conditioned on textual semantic information, which is then guided by the sequential audio embedding from our pretrained Audio Encoder. As a result, the method produces audio-reactive video content. We demonstrate the effectiveness of TPoS across various tasks and compare its results with current state-of-the-art techniques in audio-to-video generation. More examples are available at https://ku-vai.github.io/TPoS/

