Learning Dynamic Mechanisms in Unknown Environments: A Reinforcement Learning Approach

by Boxiang Lyu, et al.

Dynamic mechanism design studies how mechanism designers should allocate resources among agents in a time-varying environment. We consider the problem where the agents interact with the mechanism designer according to an unknown Markov Decision Process (MDP), where agent rewards and the mechanism designer's state evolve according to an episodic MDP with unknown reward functions and transition kernels. We focus on the online setting with linear function approximation and attempt to recover the dynamic Vickrey-Clarke-Groves (VCG) mechanism over multiple rounds of interaction. A key contribution of our work is incorporating reward-free online Reinforcement Learning (RL) to aid exploration over a rich policy space to estimate prices in the dynamic VCG mechanism. We show that the regret of our proposed method is upper bounded by $\tilde{\mathcal{O}}(T^{2/3})$ and further devise a lower bound to show that our algorithm is efficient, incurring the same $\tilde{\mathcal{O}}(T^{2/3})$ regret as the lower bound, where T is the total number of rounds. Our work establishes the regret guarantee for online RL in solving dynamic mechanism design problems without prior knowledge of the underlying model.
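As an illustration of the pricing rule that the dynamic mechanism generalizes, the following is a minimal sketch of the classical one-shot VCG mechanism over a finite set of outcomes. It is not the algorithm proposed in the paper: there are no states, transitions, or learning, and the function name `vcg` and the example bids are hypothetical. Each agent is charged the externality it imposes on the others, that is, the best welfare the others could obtain without it minus the welfare the others receive at the chosen outcome. In the dynamic setting these welfare terms are, roughly, replaced by value functions of policies in the MDP, which must be estimated when the model is unknown.

```python
from typing import Dict, Hashable, List, Optional, Tuple


def vcg(valuations: List[Dict[Hashable, float]]) -> Tuple[Hashable, List[float]]:
    """One-shot VCG over a finite outcome set (illustrative sketch only).

    valuations[i][o] is agent i's reported value for outcome o.
    Returns the welfare-maximizing outcome and one price per agent.
    """
    outcomes = list(valuations[0].keys())

    def welfare(o: Hashable, exclude: Optional[int] = None) -> float:
        # Total reported value of outcome o, optionally leaving one agent out.
        return sum(v[o] for j, v in enumerate(valuations) if j != exclude)

    # Outcome that maximizes total reported welfare.
    best = max(outcomes, key=welfare)

    prices = []
    for i in range(len(valuations)):
        # Best welfare the other agents could reach if agent i were absent.
        best_without_i = max(welfare(o, exclude=i) for o in outcomes)
        # Agent i pays the welfare loss it causes the others.
        prices.append(best_without_i - welfare(best, exclude=i))
    return best, prices


# Hypothetical single-item auction: outcome k means "agent k wins the item".
bids = [5.0, 3.0, 1.0]
vals = [{k: (b if k == i else 0.0) for k in range(len(bids))}
        for i, b in enumerate(bids)]
outcome, prices = vcg(vals)
print(outcome, prices)
```

In the single-item case this reduces to a second-price auction: the highest bidder wins and pays the second-highest bid, while the losing agents pay nothing.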



