We study the problem of synthesizing immersive 3D indoor scenes from one...
We study the automatic generation of navigation instructions from 360-de...
People navigating in unfamiliar buildings take advantage of myriad visua...
PanGEA, the Panoramic Graph Environment Annotation toolkit, is a lightwe...
Vision-and-Language Navigation wayfinding agents can be enhanced by expl...
We present Where Are You? (WAY), a dataset of 6k dialogs in which two h...
We study the challenging problem of releasing a robot in a previously un...
We introduce Room-Across-Room (RxR), a new Vision-and-Language Navigatio...
Textual cues are essential for everyday tasks like buying groceries and ...
Following a navigation instruction such as 'Walk down the stairs and sto...
A visually-grounded navigation instruction can be interpreted as a seque...
One of the long-term challenges of robotics is to enable humans to commu...
We introduce the task of scene-aware dialog. Given a follow-up question ...
Image captioning models have achieved impressive results on datasets con...
In recent years, the natural language processing community has moved awa...
Skillful mobile operation in three-dimensional environments is a primary...
Image captioning is the process of generating a natural language descrip...
Image captioning models are becoming increasingly successful at describi...
A robot that can carry out a natural-language instruction has been a dre...
This paper presents a state-of-the-art model for visual question answeri...
Top-down visual attention mechanisms have been used extensively in image...
Existing image captioning models do not generalize well to out-of-domain...
There is considerable interest in the task of automatically generating i...