KoGuN: Accelerating Deep Reinforcement Learning via Integrating Human Suboptimal Knowledge

by   Peng Zhang, et al.

Reinforcement learning agents usually learn from scratch, which requires a large number of interactions with the environment. This is quite different from the learning process of human. When faced with a new task, human naturally have the common sense and use the prior knowledge to derive an initial policy and guide the learning process afterwards. Although the prior knowledge may be not fully applicable to the new task, the learning process is significantly sped up since the initial policy ensures a quick-start of learning and intermediate guidance allows to avoid unnecessary exploration. Taking this inspiration, we propose knowledge guided policy network (KoGuN), a novel framework that combines human prior suboptimal knowledge with reinforcement learning. Our framework consists of a fuzzy rule controller to represent human knowledge and a refine module to fine-tune suboptimal prior knowledge. The proposed framework is end-to-end and can be combined with existing policy-based reinforcement learning algorithm. We conduct experiments on both discrete and continuous control tasks. The empirical results show that our approach, which combines human suboptimal knowledge and RL, achieves significant improvement on learning efficiency of flat RL algorithms, even with very low-performance human prior knowledge.


page 1

page 2

page 3

page 4


Bayesian Transfer Reinforcement Learning with Prior Knowledge Rules

We propose a probabilistic framework to directly insert prior knowledge ...

Transferring Domain Knowledge with an Adviser in Continuous Tasks

Recent advances in Reinforcement Learning (RL) have surpassed human-leve...

A new soft computing method for integration of expert's knowledge in reinforcement learn-ing problems

This paper proposes a novel fuzzy action selection method to leverage hu...

Deep Reinforcement Learning Framework for Thoracic Diseases Classification via Prior Knowledge Guidance

The chest X-ray is often utilized for diagnosing common thoracic disease...

Accelerating Reinforcement Learning with Suboptimal Guidance

Reinforcement Learning in domains with sparse rewards is a difficult pro...

AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation

Training a generalizable 3D part segmentation network is quite challengi...

Moment Matching Training for Neural Machine Translation: A Preliminary Study

In previous works, neural sequence models have been shown to improve sig...

Please sign up or login with your details

Forgot password? Click here to reset