Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation

03/06/2019
by Liyiming Ke, et al.

We present FAST NAVIGATOR, a general framework for action decoding, which yields state-of-the-art results on the recent Room-to-Room (R2R) Vision-and-Language Navigation challenge of Anderson et al. (2018). Given a natural language instruction and photo-realistic image views of a previously unseen environment, the agent must navigate from a source to a target location as quickly as possible. While all current approaches make local action decisions or score entire trajectories with beam search, our framework seamlessly balances local and global signals when exploring the environment. Importantly, this allows us to act greedily, but use global signals to backtrack when necessary. Our FAST framework, applied to existing models, yielded a 17% gain on success rate weighted by path length (SPL).
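The core idea, greedy local decoding with global-score backtracking, can be illustrated as a best-first search over partial trajectories. The sketch below is not the authors' implementation; the function names (`step_score`, `expand`, `is_goal`) are hypothetical stand-ins for the model's local action scores, the environment's navigation graph, and a stopping criterion. A frontier of partial trajectories is kept, ranked by cumulative score, so a locally attractive step that turns out poorly simply sinks in the frontier and the agent backtracks to the best alternative.

```python
import heapq

def fast_decode(start, step_score, expand, is_goal, max_steps=100):
    """Minimal FAST-style decoding sketch: greedy expansion with the
    option to rewind to the globally best-scoring partial trajectory."""
    # Frontier of partial trajectories keyed by cumulative score;
    # heapq is a min-heap, so scores are negated to pop the best first.
    frontier = [(0.0, [start])]
    visited = set()
    for _ in range(max_steps):
        if not frontier:
            break
        neg_score, traj = heapq.heappop(frontier)  # jump to best-known trajectory
        node = traj[-1]
        if is_goal(traj):
            return traj
        if node in visited:
            continue
        visited.add(node)
        # Each child trajectory re-enters the frontier with its cumulative
        # score, so a greedy step into a dead end is undone on a later pop.
        for nxt in expand(node):
            if nxt not in visited:
                heapq.heappush(frontier,
                               (neg_score - step_score(node, nxt), traj + [nxt]))
    return None
```

For example, on a toy graph where the highest-scoring first step leads to a dead end, the decoder backtracks and still reaches the goal, whereas a purely greedy decoder would get stuck.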


Related research

- 09/05/2019: Robust Navigation with Language Pretraining and Stochastic Sampling
- 02/25/2020: Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training
- 09/16/2020: Generative Language-Grounded Policy in Vision-and-Language Navigation with Bayes' Rule
- 05/23/2023: Masked Path Modeling for Vision-and-Language Navigation
- 05/08/2023: Accessible Instruction-Following Agent
- 11/22/2020: Language-guided Navigation via Cross-Modal Grounding and Alternate Adversarial Learning
- 11/28/2021: Explore the Potential Performance of Vision-and-Language Navigation Model: a Snapshot Ensemble Method
