The act of remembering: a study in partially observable reinforcement learning

10/05/2020
by Rodrigo Toro Icarte, et al.

Reinforcement Learning (RL) agents typically learn memoryless policies—policies that only consider the last observation when selecting actions. Learning memoryless policies is efficient and optimal in fully observable environments. However, some form of memory is necessary when RL agents are faced with partial observability. In this paper, we study a lightweight approach to tackle partial observability in RL. We provide the agent with an external memory and additional actions to control what, if anything, is written to the memory. At every step, the current memory state is part of the agent's observation, and the agent selects a tuple of actions: one action that modifies the environment and another that modifies the memory. When the external memory is sufficiently expressive, optimal memoryless policies yield globally optimal solutions. Unfortunately, previous attempts to use external memory in the form of binary memory have produced poor results in practice. Here, we investigate alternative forms of memory in support of learning effective memoryless policies. Our novel forms of memory outperform binary and LSTM-based memory in well-established partially observable domains.
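The mechanism described in the abstract (memory state included in the observation, and a tuple of actions at every step) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the paper: `CueEnv` is a hypothetical toy task, `MemoryAugmentedEnv` is a hypothetical wrapper, and the memory is the plain binary form that the abstract mentions as the earlier baseline, not the novel forms the paper proposes. The policy is hand-written rather than learned; it only shows that a memoryless policy over the augmented observation can solve a task that a memoryless policy over the raw observation cannot.

```python
from typing import Optional, Tuple

Bits = Tuple[int, ...]


class CueEnv:
    """Hypothetical toy task: the cue is visible only in the first observation.

    The episode lasts `length` steps. Reward 1.0 is given if the action taken
    on the final step equals the cue, so the agent must remember the cue.
    """

    def __init__(self, cue: int, length: int = 3):
        self.cue = cue
        self.length = length
        self.t = 0

    def reset(self) -> str:
        self.t = 0
        return f"cue{self.cue}"

    def step(self, action: int):
        self.t += 1
        done = self.t >= self.length
        reward = 1.0 if done and action == self.cue else 0.0
        return ("end" if done else "blank"), reward, done


class MemoryAugmentedEnv:
    """Hypothetical wrapper adding an external binary memory of `n_bits` bits."""

    def __init__(self, env, n_bits: int = 1):
        self.env = env
        self.n_bits = n_bits
        self.memory: Bits = (0,) * n_bits

    def reset(self):
        self.memory = (0,) * self.n_bits
        # The current memory state is part of the agent's observation.
        return self.env.reset(), self.memory

    def step(self, action: Tuple[int, Optional[Bits]]):
        # One action modifies the environment, the other modifies the memory.
        env_action, mem_action = action
        obs, reward, done = self.env.step(env_action)
        if mem_action is not None:  # None means "write nothing"
            if len(mem_action) != self.n_bits:
                raise ValueError("memory action has the wrong number of bits")
            self.memory = tuple(mem_action)
        return (obs, self.memory), reward, done


def memoryless_policy(augmented_obs):
    """Depends only on the latest (observation, memory) pair."""
    obs, memory = augmented_obs
    if obs.startswith("cue"):
        return 0, (int(obs[-1]),)  # arbitrary env action; store the cue
    return memory[0], None  # act on the stored bit; leave memory unchanged


def run_episode(env, policy) -> float:
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total


if __name__ == "__main__":
    for cue in (0, 1):
        env = MemoryAugmentedEnv(CueEnv(cue))
        print(cue, run_episode(env, memoryless_policy))
```

In this sketch the policy never looks at history, yet it recovers the cue because the wrapper folds the memory into the observation. A policy that ignores the memory, such as one that always returns `(0, None)`, can be correct for only one of the two cues.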

research
10/25/2021

Learning What to Memorize: Using Intrinsic Motivation to Form Useful Memory in Partially Observable Reinforcement Learning

Reinforcement Learning faces an important challenge in partial observabl...
research
06/16/2021

How memory architecture affects performance and learning in simple POMDPs

Reinforcement learning is made much more complex when the agent's observ...
research
06/09/2021

TempoRL: Learning When to Act

Reinforcement learning is a powerful approach to learn behaviour through...
research
11/21/2016

Memory Lens: How Much Memory Does an Agent Use?

We propose a new method to study the internal memory used by reinforceme...
research
06/23/2021

Evolving Hierarchical Memory-Prediction Machines in Multi-Task Reinforcement Learning

A fundamental aspect of behaviour is the ability to encode salient featu...
research
06/27/2012

Apprenticeship Learning for Model Parameters of Partially Observable Environments

We consider apprenticeship learning, i.e., having an agent learn a task ...
research
06/15/2023

Semantic HELM: An Interpretable Memory for Reinforcement Learning

Reinforcement learning agents deployed in the real world often have to c...
