Full Gradient Deep Reinforcement Learning for Average-Reward Criterion

04/07/2023
by Tejas Pagare, et al.

We extend the provably convergent Full Gradient DQN algorithm for discounted-reward Markov decision processes, introduced by Avrachenkov et al. (2021), to average-reward problems. In the neural function approximation setting, we experimentally compare the widely used RVI Q-Learning with the recently proposed Differential Q-Learning, using both Full Gradient DQN and DQN. We also extend this approach to learn Whittle indices for Markovian restless multi-armed bandits. We observe a better convergence rate for the proposed Full Gradient variant across different tasks.
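For orientation, the two average-reward baselines named above can be sketched in their standard tabular forms. This is a minimal illustration only, not the neural Full Gradient method of the paper: the function names, the array layout `Q[state, action]`, and the choice of `Q[0, 0]` as the RVI reference offset are all assumptions made for the example.

```python
import numpy as np


def rvi_q_update(Q, s, a, r, s_next, alpha, ref=(0, 0)):
    """One tabular RVI Q-learning step (illustrative sketch).

    The offset f(Q) is taken to be the value of a fixed reference
    state-action pair, Q[ref]; this choice is an assumption here.
    """
    td_error = r + Q[s_next].max() - Q[ref] - Q[s, a]
    Q[s, a] += alpha * td_error
    return td_error


def differential_q_update(Q, r_bar, s, a, r, s_next, alpha, eta):
    """One tabular Differential Q-learning step (illustrative sketch).

    r_bar is a running estimate of the average reward, updated from
    the same TD error with relative step size eta.
    """
    td_error = r - r_bar + Q[s_next].max() - Q[s, a]
    Q[s, a] += alpha * td_error
    r_bar += eta * alpha * td_error
    return r_bar, td_error


if __name__ == "__main__":
    # Hypothetical 2-state, 2-action problem with a single transition.
    Q = np.zeros((2, 2))
    rvi_q_update(Q, s=0, a=1, r=1.0, s_next=1, alpha=0.5)
    print(Q)
```

The two updates differ only in what is subtracted from the reward: RVI Q-learning uses a function of the current Q-table, while Differential Q-learning maintains a separate average-reward estimate.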


research
11/13/2020

Active Reinforcement Learning: Observing Rewards at a Cost

Active reinforcement learning (ARL) is a variant on reinforcement learni...
research
07/23/2020

Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation

We develop several new algorithms for learning Markov Decision Processes...
research
06/14/2021

On-Policy Deep Reinforcement Learning for the Average-Reward Criterion

We develop theory and algorithms for average-reward on-policy Reinforcem...
research
10/05/2021

NeurWIN: Neural Whittle Index Network For Restless Bandits Via Deep RL

Whittle index policy is a powerful tool to obtain asymptotically optimal...
research
10/26/2021

Average-Reward Learning and Planning with Options

We extend the options framework for temporal abstraction in reinforcemen...
research
04/20/2015

Optimal Nudging: Solving Average-Reward Semi-Markov Decision Processes as a Minimal Sequence of Cumulative Tasks

This paper describes a novel method to solve average-reward semi-Markov ...
research
04/29/2020

Whittle index based Q-learning for restless bandits with average reward

A novel reinforcement learning algorithm is introduced for multiarmed re...
