Recurrent networks, hidden states and beliefs in partially observable environments

08/06/2022
by Gaspard Lambrechts, et al.

Reinforcement learning aims to learn optimal policies from interaction with environments whose dynamics are unknown. Many methods rely on approximating a value function to derive near-optimal policies. In partially observable environments, these functions depend on the complete sequence of past observations and actions, called the history. In this work, we show empirically that recurrent neural networks trained to approximate such value functions internally filter the posterior probability distribution of the current state given the history, called the belief. More precisely, we show that, as a recurrent neural network learns the Q-function, its hidden states become increasingly correlated with the beliefs of the state variables that are relevant to optimal control. This correlation is measured through their mutual information. In addition, we show that the expected return of an agent increases with the ability of its recurrent architecture to reach a high mutual information between its hidden states and the beliefs. Finally, we show that the mutual information between the hidden states and the beliefs of variables that are irrelevant to optimal control decreases through the learning process. In summary, this work shows that, in its hidden states, a recurrent neural network approximating the Q-function of a partially observable environment reproduces a sufficient statistic of the history that is correlated with the part of the belief relevant for taking optimal actions.
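The two ingredients of the study above can be sketched in a few lines: a recurrent network that folds an (action, observation) history into a hidden state from which Q-values are read out, and a mutual-information estimate between hidden-state features and belief values. The sketch below is illustrative only: the GRU-style parameters are random stand-ins (not a trained network), the dimensions are arbitrary, and the plug-in histogram MI estimator is a generic choice, not necessarily the estimator used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen for illustration only.
obs_dim, act_dim, hid_dim, n_actions = 3, 2, 8, 4

# Random GRU-style parameters standing in for a trained recurrent Q-network.
Wz = rng.normal(scale=0.1, size=(hid_dim, obs_dim + act_dim + hid_dim))
Wr = rng.normal(scale=0.1, size=(hid_dim, obs_dim + act_dim + hid_dim))
Wh = rng.normal(scale=0.1, size=(hid_dim, obs_dim + act_dim + hid_dim))
Wq = rng.normal(scale=0.1, size=(n_actions, hid_dim))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def q_from_history(history):
    """Fold an (action, observation) history into a hidden state h with a
    GRU-style update, then map h to one Q-value per action."""
    h = np.zeros(hid_dim)
    for a, o in history:
        x = np.concatenate([a, o, h])
        z = sigmoid(Wz @ x)                   # update gate
        r = sigmoid(Wr @ x)                   # reset gate
        h_tilde = np.tanh(Wh @ np.concatenate([a, o, r * h]))  # candidate
        h = (1 - z) * h + z * h_tilde
    return Wq @ h, h

def mutual_information(x, y, bins=10):
    """Plug-in MI estimate (in nats) between two scalar samples,
    via a 2-D histogram of their joint distribution."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)     # marginal of x
    p_y = p_xy.sum(axis=0, keepdims=True)     # marginal of y
    mask = p_xy > 0
    return float((p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])).sum())
```

In the spirit of the abstract, one would collect hidden states along trajectories, compute the exact beliefs from the known POMDP, and track `mutual_information(hidden_feature, belief_value)` over training: for a feature that tracks a control-relevant belief the estimate should rise, while for an independent (irrelevant) variable it should stay near zero.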


