Contrastive introspection (ConSpec) to rapidly identify invariant steps for success

by Chen Sun, et al.

Reinforcement learning (RL) algorithms have achieved notable success in recent years but still struggle with fundamental issues in long-term credit assignment. It remains difficult to learn in situations where success is contingent upon multiple critical steps that are distant in time from each other and from a sparse reward, as is often the case in real life. Moreover, the way RL algorithms assign credit in these difficult situations is typically not encoded in a form that can rapidly generalize to new situations. Here, we present an approach based on offline contrastive learning, which we call contrastive introspection (ConSpec), that can be added to any existing RL algorithm and addresses both issues. In ConSpec, a contrastive loss is applied during offline replay to identify invariances among successful episodes. This exploits the fact that it is easier to retrospectively identify the small set of steps that success is contingent upon than it is to prospectively predict reward at every step taken in the environment. ConSpec stores this knowledge in a collection of prototypes summarizing the intermediate states required for success. During training, arrival at any state that matches these prototypes generates an intrinsic reward that is added to any external rewards. Furthermore, the reward shaping provided by ConSpec can be made to preserve the optimal policy of the underlying RL agent. The prototypes in ConSpec provide two key benefits for credit assignment: (1) they enable rapid identification of all the critical states; (2) they do so in a readily interpretable manner, enabling out-of-distribution generalization when sensory features are altered. In summary, ConSpec is a modular system that can be added to any existing RL algorithm to improve its long-term credit assignment.
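The core mechanism described above, matching the current state against learned success prototypes and paying an intrinsic bonus on a match, can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine-similarity matching, the threshold, and the function names (`intrinsic_reward`, `cosine_sim`) are illustrative assumptions; the paper specifies the actual contrastive objective and prototype update rule.

```python
import numpy as np

def cosine_sim(x, prototypes):
    # Cosine similarity between one state embedding and each prototype row.
    x = x / (np.linalg.norm(x) + 1e-8)
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-8)
    return p @ x

def intrinsic_reward(embedding, prototypes, threshold=0.6, scale=1.0):
    # Pay a bonus when the current state closely matches ANY success
    # prototype; otherwise no intrinsic reward. (Illustrative rule only:
    # the similarity measure and threshold are assumptions.)
    best = cosine_sim(embedding, prototypes).max()
    return scale * best if best >= threshold else 0.0

# Toy example: two prototypes in a 4-d embedding space.
prototypes = np.array([[1.0, 0.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0, 0.0]])

near_state = np.array([0.9, 0.1, 0.0, 0.0])   # close to prototype 0
far_state  = np.array([0.0, 0.0, 1.0, 0.0])   # matches no prototype

r_near = intrinsic_reward(near_state, prototypes)  # positive bonus
r_far  = intrinsic_reward(far_state, prototypes)   # 0.0
```

In a full agent, `r_near`-style bonuses would be added to the environment reward at each step; the paper additionally notes the shaping can be arranged so the underlying optimal policy is preserved.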




