Agent Spaces

by John C. Raisbeck, et al.

Exploration is one of the most important tasks in Reinforcement Learning, but it is not well-defined beyond finite problems in the Dynamic Programming paradigm (see Subsection 2.4). We provide a reinterpretation of exploration which can be applied to any online learning method. We come to this definition by approaching exploration from a new direction. After finding that concepts of exploration created to solve simple Markov decision processes with Dynamic Programming are no longer broadly applicable, we reexamine exploration. Instead of extending the ends of dynamic exploration procedures, we extend their means. That is, rather than repeatedly sampling every state-action pair possible in a process, we define the act of modifying an agent to itself be explorative. The resulting definition of exploration can be applied in infinite problems and to non-dynamic learning methods, which the dynamic notion of exploration cannot tolerate. To understand the way that modifications of an agent affect learning, we describe a novel structure on the set of agents: a collection of distances (see footnote 7) {d_a}_{a ∈ A}, which represent the perspectives of each agent possible in the process. Using these distances, we define a topology and show that many important structures in Reinforcement Learning are well-behaved under the topology induced by convergence in the agent space.
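To make the family of distances {d_a}_{a ∈ A} concrete, here is a hedged sketch of one plausible instantiation for finite state and action spaces: the distance between two agents b and c, seen from agent a's perspective, is taken to be the total-variation distance between their per-state action distributions, averaged under a's state-visitation distribution. This is an illustrative assumption, not the paper's exact definition; the names `rho_a`, `pi_b`, and `pi_c` are hypothetical.

```python
import numpy as np

def agent_distance(rho_a, pi_b, pi_c):
    """Sketch of a per-agent distance d_a(b, c) on a finite MDP.

    rho_a : (S,) array, state-visitation distribution of the
            reference agent a (entries sum to 1).
    pi_b, pi_c : (S, A) arrays, per-state action distributions
            of agents b and c (each row sums to 1).
    """
    # Total-variation distance between the two action distributions
    # at each state: 0.5 * L1 distance between the rows.
    tv_per_state = 0.5 * np.abs(pi_b - pi_c).sum(axis=1)
    # Weight each state's disagreement by how often agent a visits it.
    return float(rho_a @ tv_per_state)
```

Under this sketch, two agents that act identically on every state agent a visits are at distance 0 from a's perspective, even if they differ on states a never reaches; which illustrates how each agent induces its own notion of closeness.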


