Qualitative Measurements of Policy Discrepancy for Return-based Deep Q-Network
In this paper, we focus on policy discrepancy in return-based deep Q-network (R-DQN) learning. We propose a general R-DQN framework with which most return-based reinforcement learning algorithms can be combined with DQN. We show that the performance of traditional DQN can be significantly improved by introducing return-based reinforcement learning. To further improve the performance of R-DQN, we present a strategy with two measurements that qualitatively capture the policy discrepancy, and we derive bounds for both measurements under the R-DQN framework. Algorithms equipped with our strategy can accurately express the trace coefficient and thus achieve a better approximation to the return. Experiments are carried out on several representative tasks from the OpenAI Gym library; the results show that algorithms with our strategy outperform state-of-the-art R-DQN methods.
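To make the role of the trace coefficient concrete, the following is a minimal sketch of how a return-based target might be computed from a trajectory. It is not the paper's algorithm: the function name, the per-step `trace_coefs` argument, and the Retrace-style recursion (TD errors weighted by a running product of trace coefficients, which cuts the trace as the behavior and target policies diverge) are all illustrative assumptions.

```python
import numpy as np

def return_based_target(rewards, q_values, next_v, trace_coefs, gamma=0.99):
    """Illustrative off-policy return-based target (Retrace-style recursion).

    Each step's TD error delta_t = r_t + gamma * V(s_{t+1}) - Q(s_t, a_t)
    is weighted by gamma^t times the running product of trace coefficients.
    Small coefficients truncate the trace, reflecting policy discrepancy.
    All argument names here are hypothetical, not the paper's notation.
    """
    deltas = rewards + gamma * next_v - q_values  # one-step TD errors
    target = q_values[0]   # start from the current Q estimate
    trace = 1.0            # running product of trace coefficients
    discount = 1.0         # gamma^t
    for t in range(len(rewards)):
        target += discount * trace * deltas[t]
        discount *= gamma
        trace *= trace_coefs[t]  # e.g. Retrace uses min(1, importance ratio)
    return target
```

With all trace coefficients equal to 1 this reduces to a full Monte-Carlo-style correction of the bootstrap estimate, while coefficients near 0 cut the return back toward one-step Q-learning, which is why expressing the coefficient accurately matters for the quality of the return approximation.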