Generating Multi-Sentence Lingual Descriptions of Indoor Scenes

02/28/2015
by Dahua Lin, et al.

This paper proposes a novel framework for generating lingual descriptions of indoor scenes. Whereas substantial efforts have been made to tackle this problem, previous approaches have focused primarily on generating a single sentence for each image, which is not sufficient for describing complex scenes. We attempt to go beyond this by generating coherent descriptions with multiple sentences. Our approach differs from conventional ones in several aspects: (1) a 3D visual parsing system that jointly infers objects, attributes, and relations; (2) a generative grammar learned automatically from training text; and (3) a text generation algorithm that takes into account the coherence among sentences. Experiments on the augmented NYU-v2 dataset show that our framework generates natural descriptions with substantially higher ROUGE scores than those produced by the baseline.
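The abstract reports results in ROUGE scores, which measure n-gram overlap between a generated description and a human-written reference. The abstract does not say which ROUGE variant was used. The following is only a minimal sketch of the standard ROUGE-N recall, assuming lowercase whitespace tokenization. It is not the official ROUGE toolkit, which adds options such as stemming, and it is not the authors' evaluation code. The function names `ngrams` and `rouge_n_recall` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-gram tuples in a token list (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: clipped overlapping n-grams divided by total reference n-grams.

    Assumption: lowercase whitespace tokenization, no stemming or stopword removal.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0  # no reference n-grams, so recall is defined as 0 here
    # Each reference n-gram is matched at most as often as it appears in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


reference = "a wooden table stands next to the bed"
candidate = "a table is next to the bed"
print(rouge_n_recall(candidate, reference, n=1))  # 6 of 8 reference unigrams matched
print(rouge_n_recall(candidate, reference, n=2))  # 3 of 7 reference bigrams matched
```

Clipping the match count with `min` prevents a candidate from earning credit by repeating a word more often than the reference contains it. For multi-sentence descriptions, the whole paragraph is typically tokenized as one sequence.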

research
07/26/2018

Move Forward and Tell: A Progressive Generator of Video Descriptions

We present an efficient framework that can generate a coherent paragraph...
research
05/29/2023

Text-Only Image Captioning with Multi-Context Data Generation

Text-only Image Captioning (TIC) is an approach that aims to construct a...
research
11/20/2014

Learning a Recurrent Visual Representation for Image Caption Generation

In this paper we explore the bi-directional mapping between images and t...
research
04/05/2022

DT2I: Dense Text-to-Image Generation from Region Descriptions

Despite astonishing progress, generating realistic images of complex sce...
research
10/26/2019

Diverse Video Captioning Through Latent Variable Expansion with Conditional GAN

Automatically describing video content with text description is challeng...
research
08/31/2018

Learning to Describe Differences Between Pairs of Similar Images

In this paper, we introduce the task of automatically generating text to...
research
10/26/2020

Reading Between the Lines: Exploring Infilling in Visual Narratives

Generating long form narratives such as stories and procedures from mult...
