Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning

by Yang Yue, et al.

Deep reinforcement learning (RL) algorithms suffer severe performance degradation when interaction data is scarce, which limits their real-world applicability. Recently, visual representation learning has proven effective and promising for boosting sample efficiency in RL. These methods usually rely on contrastive learning and data augmentation to train a transition model for state prediction, which differs from how the model is used in RL, namely for value-based planning. Consequently, the learned model may not align well with the environment and may generate inconsistent value predictions, especially when the state transition is not deterministic. To address this issue, we propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making. More specifically, VCR trains a model to predict the future state (also referred to as the "imagined state") from the current state and a sequence of actions. Instead of aligning this imagined state with the real state returned by the environment, VCR applies a Q-value head to both states and obtains two distributions over action values. A distance between the two distributions is then computed and minimized, forcing the imagined state to produce action-value predictions similar to those of the real state. We develop two implementations of this idea, for discrete and continuous action spaces respectively. Experiments on the Atari 100K and DeepMind Control Suite benchmarks validate their effectiveness for improving sample efficiency. Our methods achieve new state-of-the-art performance among search-free RL algorithms.
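The core idea of the abstract can be sketched in a few lines: roll a latent transition model forward to get an "imagined" next state, apply a shared Q-value head to both the imagined and the real next state, and minimize a distance between the two resulting action-value distributions. The sketch below is a minimal NumPy illustration, not the paper's implementation; all names (`W_trans`, `W_q`, `vcr_loss`), the linear toy networks, and the choice of KL divergence as the distance are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, not taken from the paper.
STATE_DIM, ACTION_DIM, N_ACTIONS = 8, 4, 6

# Toy linear stand-ins for the learned transition model and
# the shared Q-value head.
W_trans = rng.normal(scale=0.1, size=(STATE_DIM + ACTION_DIM, STATE_DIM))
W_q = rng.normal(scale=0.1, size=(STATE_DIM, N_ACTIONS))

def predict_next_state(state, action):
    """Roll the latent state one step forward (the 'imagined state')."""
    return np.concatenate([state, action]) @ W_trans

def q_values(state):
    """Apply the shared Q-value head to a latent state."""
    return state @ W_q

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def vcr_loss(imagined_state, real_state):
    """KL divergence between the action-value distributions induced by
    the imagined and the real state (one plausible choice of distance)."""
    p = softmax(q_values(real_state))
    q = softmax(q_values(imagined_state))
    return float(np.sum(p * np.log(p / q)))

state = rng.normal(size=STATE_DIM)
action = rng.normal(size=ACTION_DIM)
real_next = rng.normal(size=STATE_DIM)  # would come from the environment

imagined = predict_next_state(state, action)
loss = vcr_loss(imagined, real_next)    # minimized during training
```

In the paper's setting this loss would be backpropagated through the transition model (and, for continuous actions, adapted to the critic's value estimates), so the model is trained to be consistent in value space rather than in raw state space.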




