Learning to Bid in Contextual First Price Auctions

by Ashwinkumar Badanidiyuru, et al.

In this paper, we investigate how to bid in repeated contextual first price auctions. We consider a single bidder (the learner) who bids repeatedly in first price auctions: at each time t, the learner observes a context x_t ∈ ℝ^d and chooses a bid based on historical information and x_t. We assume a structured linear model for the maximum bid of all other bidders, m_t = α_0 · x_t + z_t, where α_0 ∈ ℝ^d is unknown to the learner and z_t is sampled from a noise distribution ℱ with log-concave density function f. We consider both binary feedback (the learner observes only whether she wins) and full information feedback (the learner observes m_t) at the end of each time t. For binary feedback, when the noise distribution ℱ is known, we propose a bidding algorithm that uses maximum likelihood estimation (MLE) to achieve regret at most O(√(log(d) T)). Moreover, we generalize this algorithm to the binary feedback setting in which the noise distribution is unknown but belongs to a parametrized family of distributions. For full information feedback with unknown noise distribution, we provide an algorithm that achieves regret at most O(√(dT)). Our approach combines an estimator for log-concave density functions with the MLE method to learn the noise distribution ℱ and the linear weight α_0 simultaneously. We also provide a lower bound showing that any bidding policy in a broad class must incur regret at least Ω(√(T)), even when the learner receives full information feedback and ℱ is known.
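To make the binary feedback setting concrete, the following is a minimal sketch, not the paper's algorithm. It assumes the noise distribution ℱ is the standard logistic distribution (one example of a log-concave density), that past bids were drawn at random for exploration, and that the learner has a private value `value` for the item; none of these choices come from the abstract. The helper names `fit_alpha_mle` and `best_bid` are hypothetical. Under the model m_t = α_0 · x_t + z_t, the learner wins with probability F(b_t − α_0 · x_t), so α_0 can be estimated by maximizing the likelihood of the observed win/lose outcomes, and a bid can then be chosen to maximize the estimated expected utility (value − b) · F(b − α̂ · x).

```python
import numpy as np


def logistic_cdf(u):
    """CDF of the standard logistic distribution (assumed noise model)."""
    return 1.0 / (1.0 + np.exp(-u))


def fit_alpha_mle(X, bids, wins, steps=2000, lr=2.0):
    """Estimate alpha from binary feedback by gradient ascent on the
    average log-likelihood, where P(win_t) = F(b_t - alpha . x_t)."""
    n, d = X.shape
    alpha = np.zeros(d)
    for _ in range(steps):
        p = logistic_cdf(bids - X @ alpha)
        # Gradient of mean[w*log(p) + (1-w)*log(1-p)] with respect to alpha.
        grad = -X.T @ (wins - p) / n
        alpha += lr * grad
    return alpha


def best_bid(value, x, alpha, grid_size=2001):
    """Grid search for the bid maximizing (value - b) * F(b - alpha . x).
    `value` is the learner's private value (an assumption of this sketch)."""
    grid = np.linspace(0.0, value, grid_size)
    expected_utility = (value - grid) * logistic_cdf(grid - x @ alpha)
    return grid[np.argmax(expected_utility)]


def simulate(n=20000, seed=0):
    """Simulate past rounds with random exploratory bids (illustrative only)."""
    rng = np.random.default_rng(seed)
    alpha_true = np.array([0.5, 1.0, 0.3])      # hypothetical alpha_0
    X = rng.uniform(0.0, 1.0, size=(n, 3))      # contexts x_t
    z = rng.logistic(loc=0.0, scale=1.0, size=n)
    m = X @ alpha_true + z                      # maximum competing bid m_t
    bids = rng.uniform(0.0, 3.0, size=n)
    wins = (bids >= m).astype(float)            # binary feedback only
    return alpha_true, X, bids, wins


if __name__ == "__main__":
    alpha_true, X, bids, wins = simulate()
    alpha_hat = fit_alpha_mle(X, bids, wins)
    print("estimated alpha:", alpha_hat)
    print("bid for value 2.0:", best_bid(2.0, np.array([0.5, 0.5, 0.5]), alpha_hat))
```

The grid search is a deliberately simple choice; it avoids relying on any closed form for the optimal bid and works for any CDF that can be evaluated pointwise.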


