Lifelong Policy Gradient Learning of Factored Policies for Faster Training Without Forgetting

07/14/2020
by Jorge A. Mendez, et al.

Policy gradient methods have shown success in learning control policies for high-dimensional dynamical systems. Their biggest downside is the amount of exploration they require before yielding high-performing policies. In a lifelong learning setting, in which an agent is faced with multiple consecutive tasks over its lifetime, reusing information from previously seen tasks can substantially accelerate the learning of new tasks. We provide a novel method for lifelong policy gradient learning that trains lifelong function approximators directly via policy gradients, allowing the agent to benefit from accumulated knowledge throughout the entire training process. We show empirically that our algorithm learns faster and converges to better policies than single-task and lifelong learning baselines, and completely avoids catastrophic forgetting on a variety of challenging domains.
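The idea of a factored policy trained directly by policy gradients can be illustrated with a minimal sketch under stated assumptions. The sketch assumes a linear factorization θ^(t) = L s^(t), where L is a shared d × k factor reused across tasks and s^(t) holds k task-specific coefficients. It trains s^(t) with a REINFORCE policy gradient pushed through the factorization by the chain rule, ∇_s J = Lᵀ ∇_θ J. The one-step linear-Gaussian toy task, the hand-picked L, and every function name (`mat_vec`, `train_task_coefficients`, `expected_reward`, ...) are hypothetical choices for illustration. This is not the authors' algorithm. In particular, L is frozen here, so a policy L s^(t) learned for an earlier task cannot change, whereas a full lifelong learner would also update the shared factor.

```python
import random


def mat_vec(M, v):
    """Return M v for a d x k matrix M stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def mat_t_vec(M, g):
    """Return M^T g for a d x k matrix M stored as a list of rows."""
    return [sum(M[i][j] * g[i] for i in range(len(M))) for j in range(len(M[0]))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def expected_reward(theta, w, sigma):
    """Closed-form J(theta) for the toy task: x ~ U[-1, 1]^d, a ~ N(theta.x, sigma^2),
    r = -(a - w.x)^2. Since E[x_i^2] = 1/3, J = -(||theta - w||^2 / 3 + sigma^2)."""
    return -(sum((t - wi) ** 2 for t, wi in zip(theta, w)) / 3.0 + sigma ** 2)


def train_task_coefficients(L, w, sigma=0.5, lr=0.05, batch=20, iters=300, seed=0):
    """REINFORCE on task-specific coefficients s, with policy parameters theta = L s.

    The shared factor L is kept frozen (a simplification for this sketch)."""
    rng = random.Random(seed)
    d, k = len(L), len(L[0])
    s = [0.0] * k
    baseline = 0.0
    for _ in range(iters):
        theta = mat_vec(L, s)
        grad_theta = [0.0] * d
        batch_reward = 0.0
        for _ in range(batch):
            x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
            mean = dot(theta, x)
            a = rng.gauss(mean, sigma)
            r = -(a - dot(w, x)) ** 2
            batch_reward += r / batch
            # Gaussian score function: grad_theta log pi(a | x) = (a - theta.x) x / sigma^2
            coef = (r - baseline) * (a - mean) / sigma ** 2
            for i in range(d):
                grad_theta[i] += coef * x[i] / batch
        # Chain rule through the factorization: grad_s J = L^T grad_theta J
        grad_s = mat_t_vec(L, grad_theta)
        s = [sj + lr * gj for sj, gj in zip(s, grad_s)]  # gradient ascent on J
        # Baseline depends on past batches only, so the gradient estimate stays unbiased
        baseline = 0.9 * baseline + 0.1 * batch_reward
    return s


if __name__ == "__main__":
    # Hypothetical shared factor: d = 3 policy parameters, k = 2 latent components.
    L = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # Each toy task's optimal parameters lie in span(L) by construction.
    tasks = {"task1": mat_vec(L, [0.5, -1.0]), "task2": mat_vec(L, [-1.0, 0.25])}
    learned = {}
    for i, (name, w) in enumerate(tasks.items()):
        learned[name] = train_task_coefficients(L, w, seed=i)
    for name, w in tasks.items():
        before = expected_reward([0.0] * len(L), w, 0.5)
        after = expected_reward(mat_vec(L, learned[name]), w, 0.5)
        print(f"{name}: J before = {before:.3f}, J after = {after:.3f}")
```

Because each toy optimum lies in the span of L, a new task only requires fitting k = 2 coefficients instead of d = 3 raw parameters. This is a small-scale picture of how reusing a shared factor can reduce what must be learned per task. The closed-form `expected_reward` is used only to score the learned policies and is never used during training.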
