Approximating Martingale Process for Variance Reduction in Deep Reinforcement Learning with Large State Space

11/29/2022
by Charlie Ruan, et al.

The Approximating Martingale Process (AMP) has been shown to be effective for variance reduction in reinforcement learning (RL) in specific cases such as Multiclass Queueing Networks. However, in those cases the state space is relatively small and all possible state transitions can be enumerated. In this paper, we consider systems in which the state space is large and the state transitions are uncertain, thereby extending AMP into a more general variance-reduction method for RL. Specifically, we investigate the application of AMP to ride-hailing systems such as Uber, where Proximal Policy Optimization (PPO) is used to optimize the policy for matching drivers and customers.


