A Deep Compositional Framework for Human-like Language Acquisition in Virtual Environment

by   Haonan Yu, et al.

We tackle a task where an agent learns to navigate in a 2D maze-like environment called XWORLD. In each session, the agent perceives a sequence of raw-pixel frames, a natural language command issued by a teacher, and a set of rewards. The agent learns the teacher's language from scratch in a grounded and compositional manner, such that after training it is able to correctly execute zero-shot commands: 1) the combination of words in the command never appeared before, and/or 2) the command contains new object concepts that are learned from another task but never learned from navigation. Our deep framework for the agent is trained end to end: it learns simultaneously the visual representations of the environment, the syntax and semantics of the language, and the action module that outputs actions. The zero-shot learning capability of our framework results from its compositionality and modularity with parameter tying. We visualize the intermediate outputs of the framework, demonstrating that the agent truly understands how to solve the problem. We believe that our results provide some preliminary insights on how to train an agent with similar abilities in a 3D environment.


Interactive Grounded Language Acquisition and Generalization in a 2D World

We build a virtual agent for learning language in a 2D maze-like world. ...

Intra-agent speech permits zero-shot task acquisition

Human language learners are exposed to a trickle of informative, context...

Compositional Generalization in Grounded Language Learning via Induced Model Sparsity

We provide a study of how induced model sparsity can help achieve compos...

Learning Manner of Execution from Partial Corrections

Some actions must be executed in different ways depending on the context...

Can an Embodied Agent Find Your "Cat-shaped Mug"? LLM-Based Zero-Shot Object Navigation

We present LGX, a novel algorithm for Object Goal Navigation in a "langu...

Learning to Follow Language Instructions with Compositional Policies

We propose a framework that learns to execute natural language instructi...

Learning Parameterized Task Structure for Generalization to Unseen Entities

Real world tasks are hierarchical and compositional. Tasks can be compos...

Please sign up or login with your details

Forgot password? Click here to reset