Combining Q-Learning and Search with Amortized Value Estimates

12/05/2019
by Jessica B. Hamrick, et al.

We introduce "Search with Amortized Value Estimates" (SAVE), an approach for combining model-free Q-learning with model-based Monte-Carlo Tree Search (MCTS). In SAVE, a learned prior over state-action values guides MCTS, which in turn estimates an improved set of state-action values. The new Q-estimates are then combined with real experience to update the prior. This effectively amortizes the value computation performed by MCTS, resulting in a cooperative relationship between model-free learning and model-based search. SAVE can be implemented on top of any Q-learning agent with access to a model, which we demonstrate by incorporating it into agents that perform challenging physical reasoning tasks and agents that play Atari. SAVE consistently achieves higher rewards with fewer training steps and, in contrast to typical model-based search approaches, yields strong performance with very small search budgets. By combining real experience with information computed during search, SAVE demonstrates that it is possible to improve on both the performance of model-free learning and the computational cost of planning.
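The loop described in the abstract (prior guides search, search refines the values, prior is updated from both search and real experience) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the abstract: the running-average `search_estimate` is a simplified stand-in for MCTS rather than a tree search, the cross-entropy form of `amortization_loss` is one plausible way to pull the prior toward the search values, and the names `beta_q`, `beta_a`, and `combined_loss` are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def search_estimate(q_prior, simulated_returns):
    """Stand-in for MCTS (assumption): seed each action's value with the
    prior, counted as one visit, then average in simulated returns."""
    return [
        (q + sum(returns)) / (1 + len(returns))
        for q, returns in zip(q_prior, simulated_returns)
    ]


def td_loss(q_prior, action, reward, q_next, gamma=0.99):
    """Squared one-step Q-learning error on a real transition."""
    target = reward + gamma * max(q_next)
    return 0.5 * (target - q_prior[action]) ** 2


def amortization_loss(q_prior, q_search):
    """Cross-entropy between softmax(q_search) and softmax(q_prior);
    for fixed q_search it is smallest when the two distributions match."""
    p_search = softmax(q_search)
    p_prior = softmax(q_prior)
    return -sum(ps * math.log(pp) for ps, pp in zip(p_search, p_prior))


def combined_loss(q_prior, q_search, action, reward, q_next,
                  gamma=0.99, beta_q=1.0, beta_a=1.0):
    """Weighted sum of the real-experience term and the search term."""
    return (beta_q * td_loss(q_prior, action, reward, q_next, gamma)
            + beta_a * amortization_loss(q_prior, q_search))


if __name__ == "__main__":
    q_prior = [1.0, 2.0]                      # prior Q-values for two actions
    q_search = search_estimate(q_prior, [[3.0], []])
    loss = combined_loss(q_prior, q_search, action=1, reward=1.0,
                         q_next=[0.0, 0.0], gamma=0.9)
    print(q_search, loss)
```

In this sketch the search term carries information about every action that was simulated, while the Q-learning term only touches the action actually taken, which is one way to read the abstract's claim that search results and real experience complement each other.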

research
03/22/2018

Deep Reinforcement Learning with Model Learning and Monte Carlo Tree Search in Minecraft

Deep reinforcement learning has been successfully applied to several vis...
research
08/28/2020

On the model-based stochastic value gradient for continuous reinforcement learning

Model-based reinforcement learning approaches add explicit domain knowle...
research
12/28/2018

Dynamic Planning Networks

We introduce Dynamic Planning Networks (DPN), a novel architecture for d...
research
02/27/2023

Taylor TD-learning

Many reinforcement learning approaches rely on temporal-difference (TD) ...
research
11/15/2022

Model free Shapley values for high dimensional data

A model-agnostic variable importance method can be used with arbitrary p...
research
12/22/2020

Learning to Play Imperfect-Information Games by Imitating an Oracle Planner

We consider learning to play multiplayer imperfect-information games wit...
research
04/05/2019

Structured agents for physical construction

Physical construction -- the ability to compose objects, subject to phys...