Make-A-Story: Visual Memory Conditioned Consistent Story Generation

11/23/2022
by Tanzila Rahman, et al.

There has been a recent explosion of impressive generative models that can produce high-quality images (or videos) conditioned on text descriptions. However, all such approaches rely on conditional sentences that contain unambiguous descriptions of scenes and the main actors in them. Employing such models for the more complex task of story visualization therefore remains a challenge: references and co-references arise naturally in stories, and one must reason about when to maintain consistency of actors and backgrounds across frames/scenes, and when not to, based on story progression. In this work, we address these challenges and propose a novel autoregressive diffusion-based framework with a visual memory module that implicitly captures the actor and background context across the generated frames. Sentence-conditioned soft attention over the memories enables effective reference resolution and learns to maintain scene and actor consistency when needed. To validate the effectiveness of our approach, we extend the MUGEN dataset and introduce additional characters, backgrounds and referencing in multi-sentence storylines. Our experiments for story generation on the MUGEN and FlintstonesSV datasets show that our method not only outperforms the prior state of the art in generating frames with high visual quality that are consistent with the story, but also models appropriate correspondences between the characters and the background.
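
The abstract does not spell out the form of the sentence-conditioned soft attention, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes scaled dot-product attention, with the current sentence embedding acting as the query and one key/value pair stored per previously generated frame. The names `soft_attention`, `memory_keys`, `memory_values` and `sentence_query`, and all the toy numbers, are hypothetical.

```python
import math


def soft_attention(query, keys, values):
    """Scaled dot-product soft attention of one query over memory slots.

    Assumption: this is a generic attention form chosen for illustration;
    the paper's actual memory module may differ.
    """
    d = len(query)
    # Similarity between the sentence query and each memory key.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Numerically stable softmax turns the scores into soft weights.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: weighted average of the memory values.
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context


# Toy memory with two earlier frames (hypothetical values).
memory_keys = [[1.0, 0.0], [0.0, 1.0]]
memory_values = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]]

# A sentence whose embedding aligns with the first frame's key,
# e.g. a pronoun referring back to the actor introduced there.
sentence_query = [4.0, 0.0]

weights, context = soft_attention(sentence_query, memory_keys, memory_values)
```

In this toy setup, most of the attention weight falls on the first memory slot, so the context vector is dominated by that frame's stored features. That is the sense in which soft attention over memories can carry an actor or background forward when the sentence refers back to it.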

research
12/06/2018

StoryGAN: A Sequential Conditional GAN for Story Visualization

In this work we propose a new task called Story Visualization. Given a m...
research
05/26/2023

Improved Visual Story Generation with Adaptive Context Modeling

Diffusion models developed on top of powerful text-to-image generation m...
research
08/03/2022

Word-Level Fine-Grained Story Visualization

Story visualization aims to generate a sequence of images to narrate eac...
research
09/26/2019

A Hierarchical Approach for Visual Storytelling Using Image Description

One of the primary challenges of visual storytelling is developing techn...
research
10/17/2020

Consistency and Coherency Enhanced Story Generation

Story generation is a challenging task, which demands to maintain consis...
research
11/20/2022

Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models

Conditioned diffusion models have demonstrated state-of-the-art text-to-...
research
09/18/2023

Causal-Story: Local Causal Attention Utilizing Parameter-Efficient Tuning For Visual Story Synthesis

The excellent text-to-image synthesis capability of diffusion models has...
