A Reduction from Reinforcement Learning to No-Regret Online Learning

11/14/2019
by Ching-An Cheng, et al.

We present a reduction from reinforcement learning (RL) to no-regret online learning based on the saddle-point formulation of RL, by which "any" online algorithm with sublinear regret can generate policies with provable performance guarantees. This new perspective decouples the RL problem into two parts: regret minimization and function approximation. The first part admits a standard online-learning analysis, and the second part can be quantified independently of the learning algorithm. The proposed reduction can therefore serve as a tool for systematically designing new RL algorithms. We demonstrate this idea by devising a simple RL algorithm based on mirror descent and the generative-model oracle. For any γ-discounted tabular RL problem, with probability at least 1−δ, it learns an ϵ-optimal policy using at most Õ(|S||A| log(1/δ) / ((1−γ)^4 ϵ^2)) samples. Furthermore, this algorithm extends directly to linearly parameterized function approximators for large-scale applications, with computation and sample complexities independent of |S| and |A|, though at the cost of potential approximation bias.
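
To make the saddle-point view concrete, here is a minimal sketch under stated assumptions. It uses the standard linear-programming Lagrangian of a tabular MDP with rewards in [0, 1], L(V, μ) = (1−γ) p0·V + Σ μ(s,a) (r(s,a) + γ E[V(s')] − V(s)), where the μ-player (an occupancy measure on the simplex over S×A) runs exponentiated-gradient ascent, a mirror-descent step, and the V-player runs projected gradient descent on the box [0, 1/(1−γ)]. Averaged iterates give a policy π(a|s) ∝ μ̄(s,a). This is not the paper's algorithm: it uses exact gradients instead of samples from a generative-model oracle, so none of the stated sample-complexity guarantees apply, and the names `saddle_point_md`, `duality_gap` and `policy_value` as well as the step sizes are hypothetical choices for illustration.

```python
import numpy as np


def saddle_point_md(P, r, p0, gamma, T=5000, eta_mu=0.05, eta_v=0.05):
    """Hypothetical sketch: descent-ascent on the LP Lagrangian of a tabular MDP.

    P: transitions, shape (S, A, S); r: rewards in [0, 1], shape (S, A);
    p0: initial-state distribution, shape (S,).
    Returns (policy, mu_bar, v_bar), where the bars are iterate averages.
    Uses exact gradients (an assumption), not a generative-model oracle.
    """
    S, A = r.shape
    v_max = 1.0 / (1.0 - gamma)          # box that contains V* when r is in [0, 1]
    log_w = np.zeros((S, A))             # mu-player log-weights, uniform start
    v = np.zeros(S)
    mu_sum, v_sum = np.zeros((S, A)), np.zeros(S)
    for _ in range(T):
        mu = np.exp(log_w - log_w.max())
        mu /= mu.sum()                   # mu lives on the simplex over S x A
        mu_sum += mu
        v_sum += v
        # dL/dmu(s,a) = r(s,a) + gamma * E[V(s')] - V(s), a Bellman residual
        g_mu = r + gamma * (P @ v) - v[:, None]
        # dL/dV(s') = (1-gamma) p0(s') + gamma * inflow(s') - outflow(s')
        g_v = (1 - gamma) * p0 + gamma * np.einsum("sa,sat->t", mu, P) - mu.sum(axis=1)
        log_w += eta_mu * g_mu           # exponentiated-gradient (mirror) ascent
        v = np.clip(v - eta_v * g_v, 0.0, v_max)  # projected gradient descent
    mu_bar, v_bar = mu_sum / T, v_sum / T
    policy = mu_bar / mu_bar.sum(axis=1, keepdims=True)
    return policy, mu_bar, v_bar


def duality_gap(P, r, p0, gamma, mu_bar, v_bar):
    """max_mu L(v_bar, mu) - min_V L(V, mu_bar); nonnegative by construction."""
    v_max = 1.0 / (1.0 - gamma)
    # L is linear in mu, so the max over the simplex sits at the largest residual
    best_mu = (1 - gamma) * p0 @ v_bar + np.max(r + gamma * (P @ v_bar) - v_bar[:, None])
    # L is linear in V, so the min over the box picks a corner per coordinate
    c = (1 - gamma) * p0 + gamma * np.einsum("sa,sat->t", mu_bar, P) - mu_bar.sum(axis=1)
    best_v = np.sum(mu_bar * r) + v_max * np.minimum(c, 0.0).sum()
    return best_mu - best_v


def policy_value(P, r, gamma, policy):
    """Exact value of a stochastic policy: V = (I - gamma P_pi)^(-1) r_pi."""
    S = r.shape[0]
    P_pi = np.einsum("sa,sat->st", policy, P)
    r_pi = np.sum(policy * r, axis=1)
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A, gamma = 3, 2, 0.9
    P = rng.random((S, A, S))
    P /= P.sum(axis=2, keepdims=True)
    r = rng.random((S, A))
    p0 = np.full(S, 1.0 / S)
    policy, mu_bar, v_bar = saddle_point_md(P, r, p0, gamma)
    # Optimal values via value iteration, for comparison only
    v_star = np.zeros(S)
    for _ in range(1000):
        v_star = np.max(r + gamma * (P @ v_star), axis=1)
    print("duality gap:", duality_gap(P, r, p0, gamma, mu_bar, v_bar))
    print("suboptimality:", p0 @ (v_star - policy_value(P, r, gamma, policy)))
```

The design mirrors the reduction's split: each player only needs a no-regret update against the other's current iterate, and the quality of the averaged pair is read off from the duality gap. Swapping in a sampled estimate of `g_mu` and `g_v` is where a generative-model oracle would enter.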

research
01/06/2023

Provable Reset-free Reinforcement Learning by No-Regret Reduction

Real-world reinforcement learning (RL) is often severely limited since t...
research
11/19/2020

Online Model Selection for Reinforcement Learning with Function Approximation

Deep reinforcement learning has achieved impressive successes yet often ...
research
04/17/2017

Effective Warm Start for the Online Actor-Critic Reinforcement Learning based mHealth Intervention

Online reinforcement learning (RL) is increasingly popular for the perso...
research
06/21/2019

Revised Progressive-Hedging-Algorithm Based Two-layer Solution Scheme for Bayesian Reinforcement Learning

Stochastic control with both inherent random system noise and lack of kn...
research
03/05/2012

Agnostic System Identification for Model-Based Reinforcement Learning

A fundamental problem in control is to learn a model of a system from ob...
research
06/09/2021

ChaCha for Online AutoML

We propose the ChaCha (Champion-Challengers) algorithm for making an onl...
research
09/20/2022

A Joint Imitation-Reinforcement Learning Framework for Reduced Baseline Regret

In various control task domains, existing controllers provide a baseline...