Discover Life Skills for Planning with Bandits via Observing and Learning How the World Works

07/17/2022
by Tin Lai, et al.

We propose a novel approach that lets planning agents compose abstract skills by observing and learning from historical interactions with the world. Our framework operates in a Markov state-space model through a set of actions whose preconditions are unknown. We formulate skills as high-level abstract policies that propose action plans based on the current state. Each policy learns new plans by observing state transitions while the agent interacts with the world. This approach automatically learns new plans that achieve specific intended effects, but the success of such plans often depends on the states in which they are applied. We therefore formulate the evaluation of such plans as infinitely many multi-armed bandit problems, in which we balance allocating resources to estimating the success probability of existing arms against exploring new options. The result is a planner capable of automatically learning robust high-level skills in a noisy environment; such skills implicitly learn the action preconditions without explicit knowledge of them. We show experimentally that this planning approach is very competitive in high-dimensional state-space domains.
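The trade-off between evaluating existing arms and exploring new ones can be illustrated with a small sketch. This is not the paper's algorithm: it assumes Thompson sampling with Beta posteriors, which the abstract does not specify, and it treats each candidate plan as one arm with a Bernoulli success outcome. The names `PlanArm`, `choose_arm`, and `run`, as well as the simulated hidden success probabilities, are hypothetical and exist only for illustration. The unbounded pool of untried plans is represented by a single draw from the uniform prior; when that draw beats every existing arm's posterior sample, a new plan is proposed.

```python
import random


class PlanArm:
    """Hypothetical arm: one candidate plan with a Beta posterior
    over its (unknown) success probability."""

    def __init__(self, name):
        self.name = name
        self.successes = 0
        self.failures = 0

    def sample(self, rng):
        # Posterior is Beta(1 + successes, 1 + failures) under a uniform prior.
        return rng.betavariate(1 + self.successes, 1 + self.failures)

    def update(self, success):
        if success:
            self.successes += 1
        else:
            self.failures += 1

    def mean(self):
        # Posterior mean of the success probability.
        return (1 + self.successes) / (2 + self.successes + self.failures)


def choose_arm(arms, rng, propose_new):
    """Thompson-sampling step over an open-ended set of arms (assumed scheme).

    Each existing arm contributes one posterior sample. A draw from the
    uniform prior stands in for all plans not yet tried; if it wins,
    a new arm is created and selected.
    """
    scored = [(arm.sample(rng), arm) for arm in arms]
    fresh_score = rng.betavariate(1, 1)
    if not scored or fresh_score > max(score for score, _ in scored):
        arm = propose_new()
        arms.append(arm)
        return arm
    return max(scored, key=lambda pair: pair[0])[1]


def run(rounds=500, seed=0):
    """Simulate plan evaluation against made-up hidden success rates."""
    rng = random.Random(seed)
    hidden = {}  # simulated ground truth, unknown to the bandit
    arms = []

    def propose_new():
        name = f"plan_{len(arms)}"
        hidden[name] = rng.random()
        return PlanArm(name)

    for _ in range(rounds):
        arm = choose_arm(arms, rng, propose_new)
        arm.update(rng.random() < hidden[arm.name])
    return arms, hidden


if __name__ == "__main__":
    arms, hidden = run()
    for arm in sorted(arms, key=lambda a: a.successes + a.failures, reverse=True)[:5]:
        pulls = arm.successes + arm.failures
        print(f"{arm.name}: pulls={pulls}, est={arm.mean():.2f}, true={hidden[arm.name]:.2f}")
```

In this sketch the rate at which new plans are proposed falls as the best existing arm's posterior concentrates near a high success probability, which is one simple way to keep exploring without a fixed arm budget.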

research
10/25/2020

Robust Hierarchical Planning with Policy Delegation

We propose a novel framework and algorithm for hierarchical planning bas...
research
06/18/2019

Learning to Plan Hierarchically from Curriculum

We present a framework for learning to plan hierarchically in domains wi...
research
03/04/2022

Plan Your Target and Learn Your Skills: Transferable State-Only Imitation Learning via Decoupled Policy Optimization

Recent progress in state-only imitation learning extends the scope of ap...
research
10/16/2018

Learning abstract planning domains and mappings to real world perceptions

Most of the works on planning and learning, e.g., planning by (model bas...
research
11/17/2020

Sim-to-Real Task Planning and Execution from Perception via Reactivity and Recovery

Zero-shot execution of unseen robotic tasks is an important problem in r...
research
11/17/2022

Planning with Large Language Models via Corrective Re-prompting

Extracting the common sense knowledge present in Large Language Models (...
research
02/28/2017

Stacked Thompson Bandits

We introduce Stacked Thompson Bandits (STB) for efficiently generating p...
