Viewpoint Invariant Change Captioning

01/08/2019
by   Dong Huk Park, et al.

The ability to detect that something has changed in an environment is valuable, but often only if the change can be accurately conveyed to a human operator. We introduce Viewpoint Invariant Change Captioning and develop models that can both localize complex changes in an environment and describe them in natural language. Moreover, we distinguish between a change in viewpoint and an actual scene change (e.g. a change of objects' attributes). To study this new problem, we collect a Viewpoint Invariant Change Captioning Dataset (VICC), building it on the CLEVR dataset and engine. We introduce five types of scene changes, including changes in attributes, positions, etc. To tackle this problem, we propose an approach that distinguishes a viewpoint change from an important scene change, localizes the change between the "before" and "after" images, and dynamically attends to the relevant visual features when describing the change. We benchmark a number of baselines on our new dataset and systematically study the different change types. Our proposed approach outperforms these baselines in both change captioning and localization. Finally, we show that our approach is general and can be applied to real images and language on the recent Spot-the-diff dataset.

research
09/30/2020

Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning

Change Captioning is a task that aims to describe the difference between...
research
03/25/2021

Describing and Localizing Multiple Changes with Transformers

Change captioning tasks aim to detect changes in image pairs observed be...
research
08/21/2023

Explore and Tell: Embodied Visual Captioning in 3D Environments

While current visual captioning models have achieved impressive performa...
research
03/01/2017

Change Detection under Global Viewpoint Uncertainty

This paper addresses the problem of change detection from a novel perspe...
research
10/20/2021

R^3Net:Relation-embedded Representation Reconstruction Network for Change Captioning

Change captioning is to use a natural language sentence to describe the ...
research
09/15/2023

Viewpoint Integration and Registration with Vision Language Foundation Model for Image Change Understanding

Recently, the development of pre-trained vision language foundation mode...
research
09/28/2022

The Change You Want to See

We live in a dynamic world where things change all the time. Given two i...
