A Reinforcement Learning Approach in Multi-Phase Second-Price Auction Design

by Rui Ai, et al.

We study reserve price optimization in multi-phase second-price auctions, where the seller's prior actions affect the bidders' later valuations through a Markov Decision Process (MDP). Compared to the bandit setting studied in existing work, our setting poses three challenges. First, from the seller's perspective, we need to efficiently explore the environment in the presence of potentially nontruthful bidders who aim to manipulate the seller's policy. Second, we want to minimize the seller's revenue regret when the market noise distribution is unknown. Third, the seller's per-step revenue is unknown, nonlinear, and cannot even be directly observed from the environment. We propose a mechanism addressing all three challenges. To address the first challenge, we combine a new technique named "buffer periods" with ideas from Reinforcement Learning (RL) with low switching cost to limit bidders' surplus from untruthful bidding, thereby incentivizing approximately truthful bidding. The second challenge is tackled by a novel algorithm that removes the need for pure exploration when the market noise distribution is unknown. The third challenge is resolved by an extension of LSVI-UCB, where we use the auction's underlying structure to control the uncertainty of the revenue function. The three techniques culminate in the Contextual-LSVI-UCB-Buffer (CLUB) algorithm, which achieves $\tilde{\mathcal{O}}(H^{5/2}\sqrt{K})$ revenue regret when the market noise is known and $\tilde{\mathcal{O}}(H^{3}\sqrt{K})$ revenue regret when the noise is unknown, with no assumptions on bidders' truthfulness.
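
To make the third challenge concrete, the following is a minimal sketch of the seller's revenue in a single second-price auction with a reserve price, using the standard textbook rule rather than anything specific to CLUB. The function name `second_price_revenue` and the example bids are hypothetical and chosen only for illustration. The revenue is a piecewise function of the reserve, which suggests why it is nonlinear in the seller's action.

```python
from typing import Sequence


def second_price_revenue(bids: Sequence[float], reserve: float) -> float:
    """Seller revenue for one second-price auction with a reserve price.

    Standard rule (illustrative, not the paper's full model):
    the item sells only if the highest bid meets the reserve, and the
    winner pays the larger of the second-highest bid and the reserve.
    """
    if not bids:
        return 0.0
    ordered = sorted(bids, reverse=True)
    highest = ordered[0]
    # With a single bidder there is no competing bid, so treat it as 0.
    second = ordered[1] if len(ordered) > 1 else 0.0
    if highest < reserve:
        return 0.0  # reserve not met, the item goes unsold
    return max(second, reserve)


# Hypothetical bids: the same bids give different revenue per reserve.
bids = [0.9, 0.6, 0.3]
low_reserve = second_price_revenue(bids, 0.5)   # second bid sets the price
mid_reserve = second_price_revenue(bids, 0.7)   # reserve sets the price
high_reserve = second_price_revenue(bids, 1.0)  # no sale
```

Raising the reserve from 0.5 to 0.7 increases revenue here, while raising it past the highest bid drops revenue to zero. In the setting above the seller additionally does not know the bidders' valuation distribution, so this trade-off has to be learned.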



∙ 06/02/2021

Simple Economies are Almost Optimal

Consider a seller that intends to auction some item. The seller can inve...
∙ 11/14/2019

Online Second Price Auction with Semi-bandit Feedback Under the Non-Stationary Setting

In this paper, we study the non-stationary online second price auction p...
∙ 11/08/2019

Incentive-aware Contextual Pricing with Non-parametric Market Noise

We consider a dynamic pricing problem for repeated contextual second-pri...
∙ 02/25/2022

Learning Dynamic Mechanisms in Unknown Environments: A Reinforcement Learning Approach

Dynamic mechanism design studies how mechanism designers should allocate...
∙ 02/25/2020

Dynamic Incentive-aware Learning: Robust Pricing in Contextual Auctions

Motivated by pricing in ad exchange markets, we consider the problem of ...
∙ 06/07/2019

Dynamic First Price Auctions Robust to Heterogeneous Buyers

We study dynamic mechanisms for optimizing revenue in repeated auctions,...
∙ 02/16/2023

User Response in Ad Auctions: An MDP Formulation of Long-Term Revenue Optimization

We propose a new Markov Decision Process (MDP) model for ad auctions to ...
