Computing the Value of Computation for Planning

11/07/2018
by Can Eren Sezener, et al.

An intelligent agent performs actions in order to achieve its goals. Such actions can either be externally directed, such as opening a door, or internally directed, such as writing data to a memory location or strengthening a synaptic connection. Some internal actions, which we refer to as computations, potentially help the agent choose better actions. Considering that (external) actions and computations might draw upon the same resources, such as time and energy, deciding when to act or compute, as well as what to compute, is critical to the performance of an agent. In an environment that provides rewards depending on an agent's behavior, an action's value is typically defined as the sum of expected long-term rewards succeeding the action (itself a complex quantity that depends on what the agent goes on to do after the action in question). However, defining the value of a computation is not as straightforward, as computations are only valuable in a higher-order way: through the alteration of actions. This thesis offers a principled way of computing the value of a computation in a planning setting formalized as a Markov decision process. We present two definitions of computation values: static and dynamic. They address two extreme cases of the computation budget: affording zero or infinitely many future computation steps. We show that these values have desirable properties, such as temporal consistency and asymptotic convergence. Furthermore, we propose methods for efficiently computing and approximating the static and dynamic computation values. We describe a sense in which the policies that greedily maximize these values can be optimal. We utilize these principles to construct Monte Carlo tree search algorithms that outperform most state-of-the-art methods at finding higher-quality actions given the same simulation resources.
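
The abstract does not reproduce the thesis's formal definitions, but the flavor of a myopic value of computation can be sketched. The Python example below is an illustrative assumption, not the thesis's actual formulation: it models each action's unknown value with a Gaussian belief, treats a computation as one noisy simulation of a chosen action, and scores that computation by the expected improvement in the value of the greedy action. The setup (the Gaussian beliefs, `obs_noise`, and the `myopic_voc` helper) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setting: the agent holds a Gaussian belief (mu, sigma) over the
# value Q(a) of each of three actions. A "computation" is a noisy simulation of
# one action, after which the belief is updated by a conjugate Gaussian rule.
mu = np.array([1.0, 1.2, 0.8])       # posterior means of action values
sigma = np.array([0.5, 0.9, 0.3])    # posterior std devs of action values
obs_noise = 0.4                      # std dev of one simulation's noise

def myopic_voc(a, n_samples=100_000):
    """One-step value of simulating action `a`: the expected value of acting
    greedily after the simulation, minus the value of acting greedily now
    (both evaluated under the agent's belief)."""
    value_now = mu.max()
    # Sample hypothetical simulation outcomes from the predictive distribution.
    q_true = rng.normal(mu[a], sigma[a], n_samples)
    obs = q_true + rng.normal(0.0, obs_noise, n_samples)
    # Posterior mean for action a after observing `obs` (conjugate update).
    precision = 1 / sigma[a]**2 + 1 / obs_noise**2
    post_mean = (mu[a] / sigma[a]**2 + obs / obs_noise**2) / precision
    # Expected payoff of the greedy choice made after the belief update.
    best_other = np.max(np.delete(mu, a))
    value_after = np.maximum(post_mean, best_other).mean()
    return value_after - value_now

for a in range(len(mu)):
    print(f"action {a}: VOC ≈ {myopic_voc(a):.4f}")
```

In this sketch, simulating an action whose belief is both competitive in mean and highly uncertain yields the largest value of computation, while simulating a clearly inferior, well-estimated action yields a value near zero; this is the intuition behind valuing computations by how they alter the agent's eventual action.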


