L3MVN: Leveraging Large Language Models for Visual Target Navigation

04/11/2023
by Bangguo Yu, et al.

Visual target navigation in unknown environments is a crucial problem in robotics. Despite extensive investigation of classical and learning-based approaches in the past, robots still lack common-sense knowledge about household objects and layouts. Prior state-of-the-art approaches to this task rely on learning these priors during training and typically require substantial computational resources and time. To address this, we propose a new framework for visual target navigation that leverages Large Language Models (LLMs) to impart common sense for object searching. Specifically, we introduce two paradigms, (i) zero-shot and (ii) feed-forward, that use language to select the relevant frontier from the semantic map as a long-term goal and explore the environment efficiently. Our analysis demonstrates notable zero-shot generalization and transfer capabilities arising from the use of language. Experiments on Gibson and Habitat-Matterport 3D (HM3D) show that the proposed framework significantly outperforms existing map-based methods in success rate and generalization. Ablation analysis further indicates that the common-sense knowledge from the language model leads to more efficient semantic exploration. Finally, we provide a real-robot experiment to verify the applicability of our framework in real-world scenarios. The supplementary video and code can be accessed via the following link: https://sites.google.com/view/l3mvn.
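To make the frontier-selection idea concrete, here is a minimal, hypothetical sketch (not the authors' implementation): each frontier on the semantic map is scored by how likely the target object is to be found near the objects already observed around it. In the actual framework an LLM supplies these likelihoods; the `CO_OCCURRENCE` table and all names below are illustrative stand-ins.

```python
# Hypothetical sketch of LLM-guided frontier selection; the lookup table
# stands in for a language-model query such as
#   "How likely is a <target> to be found near a <context object>?"
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Frontier:
    position: Tuple[int, int]                 # (x, y) cell on the semantic map
    nearby_objects: List[str] = field(default_factory=list)  # labels seen near it

# Stand-in co-occurrence scores (a real system would query an LLM here).
CO_OCCURRENCE = {
    ("tv", "sofa"): 0.9,
    ("tv", "bed"): 0.3,
    ("tv", "toilet"): 0.05,
}

def llm_score(target: str, context: str) -> float:
    """Placeholder for a language-model likelihood query."""
    return CO_OCCURRENCE.get((target, context), 0.1)

def select_frontier(frontiers: List[Frontier], target: str) -> Frontier:
    """Pick the frontier whose surroundings best match the target object."""
    def score(f: Frontier) -> float:
        if not f.nearby_objects:
            return 0.0
        return max(llm_score(target, obj) for obj in f.nearby_objects)
    return max(frontiers, key=score)
```

For example, when searching for a TV, a frontier near a sofa would be chosen over one near a toilet, since the language prior assigns it a higher co-occurrence score.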



Related research

04/11/2023 · Frontier Semantic Exploration for Visual Target Navigation
This work focuses on the problem of visual target navigation, which is v...

07/25/2023 · GraspGPT: Leveraging Semantic Knowledge from a Large Language Model for Task-Oriented Grasping
Task-oriented grasping (TOG) refers to the problem of predicting grasps ...

06/09/2022 · Extracting Zero-shot Common Sense from Large Language Models for Robot 3D Scene Understanding
Semantic 3D scene understanding is a problem of critical importance in r...

09/12/2022 · Leveraging Large Language Models for Robot 3D Scene Understanding
Semantic 3D scene understanding is a problem of critical importance in r...

03/06/2023 · Can an Embodied Agent Find Your "Cat-shaped Mug"? LLM-Based Zero-Shot Object Navigation
We present LGX, a novel algorithm for Object Goal Navigation in a "langu...

09/20/2023 · Discuss Before Moving: Visual Language Navigation via Multi-expert Discussions
Visual language navigation (VLN) is an embodied task demanding a wide ra...

06/24/2022 · ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings
We present a scalable approach for learning open-world object-goal navig...
