Visualizing MuZero Models

02/25/2021
by Joery A. de Vries, et al.

MuZero, a model-based reinforcement learning algorithm that uses a value-equivalent dynamics model, achieved state-of-the-art performance in chess, shogi, and Go. In contrast to standard forward dynamics models, which predict a full next state, value-equivalent models are trained to predict a future value, thereby emphasizing value-relevant information in their representations. Although value-equivalent models have shown strong empirical success, no prior research has visualized or investigated what types of representations these models actually learn. In this paper, we therefore visualize the latent representations of MuZero agents. We find that, for the same sequence of actions, the trajectory of observation embeddings may diverge from the trajectory produced by the internal state transition dynamics, which could lead to instability during planning. Based on this insight, we propose two regularization techniques to stabilize MuZero's performance. Additionally, we provide an open-source implementation of MuZero along with an interactive visualizer of learned representations, which may aid further investigation of value-equivalent algorithms.
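The divergence described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names `embed`, `step_latent`, and `trajectory_divergence`, the single-layer tanh models, and the random weights are all assumptions made for illustration. The sketch embeds each observation directly, unrolls the latent dynamics from the first embedding using the same actions, and reports the per-step distance between the two trajectories.

```python
import math
import random


def matvec(vec, mat):
    """Multiply a row vector by a matrix given as a list of rows."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def embed(obs, w_h):
    """Hypothetical representation function: observation -> latent state."""
    return [math.tanh(x) for x in matvec(obs, w_h)]


def step_latent(state, action, w_s, w_a):
    """Hypothetical dynamics function: (latent state, one-hot action) -> next latent state."""
    pre = [s + a for s, a in zip(matvec(state, w_s), matvec(action, w_a))]
    return [math.tanh(x) for x in pre]


def trajectory_divergence(observations, actions, w_h, w_s, w_a):
    """Per-step Euclidean distance between embedded observations and the unrolled latent states."""
    state = embed(observations[0], w_h)
    distances = [0.0]  # both trajectories start from the same embedding
    for obs_next, action in zip(observations[1:], actions):
        state = step_latent(state, action, w_s, w_a)  # unroll inside the model
        target = embed(obs_next, w_h)                 # embed the real observation
        distances.append(math.dist(state, target))
    return distances


# Toy data: random weights and observations (assumptions, not trained MuZero parameters).
rng = random.Random(0)
obs_dim, latent_dim, n_actions, horizon = 4, 3, 2, 5


def random_matrix(rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


w_h = random_matrix(obs_dim, latent_dim)
w_s = random_matrix(latent_dim, latent_dim)
w_a = random_matrix(n_actions, latent_dim)
observations = random_matrix(horizon + 1, obs_dim)
chosen = [rng.randrange(n_actions) for _ in range(horizon)]
actions = [[1.0 if i == a else 0.0 for i in range(n_actions)] for a in chosen]

divergence = trajectory_divergence(observations, actions, w_h, w_s, w_a)
print(divergence)
```

With untrained random weights the distances are arbitrary; the same measurement applied to a trained agent's representation and dynamics functions would indicate how far planning in latent space drifts from the embeddings of the states actually visited.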


