A Geometric Perspective on Self-Supervised Policy Adaptation

11/14/2020
by Cristian Bodnar, et al.

One of the most challenging aspects of real-world reinforcement learning (RL) is the multitude of unpredictable and ever-changing distractions that could divert an agent from what it was tasked to do in its training environment. While an agent could learn from reward signals to ignore them, the complexity of the real world can make rewards hard to acquire or, at best, extremely sparse. A recent class of self-supervised methods has shown promise that reward-free adaptation under challenging distractions is possible. However, previous work focused on a short one-episode adaptation setting. In this paper, we consider a long-term adaptation setup that is more akin to the specifics of the real world and propose a geometric perspective on self-supervised adaptation. We empirically describe the processes that take place in the embedding space during this adaptation process, reveal some of its undesirable effects on performance, and show how they can be eliminated. Moreover, we theoretically study how actor-based and actor-free agents can further generalise to the target environment by manipulating the geometry of the manifolds described by the actor and critic functions.

research
06/10/2020

Self-Supervised Reinforcement Learning for Recommender Systems

In session-based or sequential recommendation, it is important to consid...
research
02/21/2023

Potential-based reward shaping for learning to play text-based adventure games

Text-based games are a popular testbed for language-based reinforcement ...
research
07/08/2020

Self-Supervised Policy Adaptation during Deployment

In most real-world scenarios, a policy trained by reinforcement learning...
research
11/05/2021

Supervised Advantage Actor-Critic for Recommender Systems

Casting session-based or sequential recommendation as reinforcement lear...
research
05/19/2022

Deconfounding Actor-Critic Network with Policy Adaptation for Dynamic Treatment Regimes

Despite intense efforts in basic and clinical research, an individualize...
research
06/14/2023

A reinforcement learning strategy for p-adaptation in high order solvers

Reinforcement learning (RL) has emerged as a promising approach to autom...