Learning to Model Multimodal Semantic Alignment for Story Visualization

11/14/2022
by   Bowen Li, et al.
0

Story visualization aims to generate a sequence of images to narrate each sentence in a multi-sentence story, where the images should be realistic and keep global consistency across dynamic scenes and characters. Current works face the problem of semantic misalignment because of their fixed architecture and diversity of input modalities. To address this problem, we explore the semantic alignment between text and image representations by learning to match their semantic levels in the GAN-based generative model. More specifically, we introduce dynamic interactions according to learning to dynamically explore various semantic depths and fuse the different-modal information at a matched semantic level, which thus relieves the text-image semantic misalignment problem. Extensive experiments on different datasets demonstrate the improvements of our approach, neither using segmentation masks nor auxiliary captioning networks, on image quality and story consistency, compared with state-of-the-art methods.

READ FULL TEXT

page 4

page 5

research
08/03/2022

Word-Level Fine-Grained Story Visualization

Story visualization aims to generate a sequence of images to narrate eac...
research
05/20/2021

Improving Generation and Evaluation of Visual Stories via Semantic Consistency

Story visualization is an under-explored task that falls at the intersec...
research
10/21/2021

Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization

While much research has been done in text-to-image synthesis, little wor...
research
10/16/2022

Character-Centric Story Visualization via Visual Planning and Token Alignment

Story visualization advances the traditional text-to-image generation by...
research
08/17/2023

Text-Only Training for Visual Storytelling

Visual storytelling aims to generate a narrative based on a sequence of ...
research
01/09/2023

An Impartial Transformer for Story Visualization

Story Visualization is an advanced task of computed vision that targets ...
research
09/08/2023

Style Generation: Image Synthesis based on Coarsely Matched Texts

Previous text-to-image synthesis algorithms typically use explicit textu...

Please sign up or login with your details

Forgot password? Click here to reset