Online Learning in Stackelberg Games with an Omniscient Follower

by Geng Zhao, et al.
berkeley college

We study online learning in a two-player decentralized cooperative Stackelberg game. In each round, the leader acts first, and the follower then acts after observing the leader's move. The leader's goal is to minimize cumulative regret based on the history of interactions. Unlike the traditional formulation of repeated Stackelberg games, we assume the follower is omniscient, with full knowledge of the true reward, and always best-responds to the leader's actions. We analyze the sample complexity of regret minimization in this repeated Stackelberg game. We show that, depending on the reward structure, the presence of the omniscient follower may change the sample complexity drastically, from constant to exponential, even for linear cooperative Stackelberg games. This poses unique challenges for the leader's learning process and the subsequent regret analysis.
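The interaction protocol described above can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a small finite reward table shared by both players, noise-free reward feedback, and a naive explore-then-commit leader. The names `best_response` and `play` and the example reward values are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def best_response(reward: List[List[float]], a: int) -> int:
    """Omniscient follower: knows the true reward table and maximizes it given a."""
    row = reward[a]
    return max(range(len(row)), key=lambda b: row[b])


def play(reward: List[List[float]], horizon: int) -> Tuple[float, List[Tuple[int, int]]]:
    """Run the repeated game and return (cumulative regret, action history).

    Assumption: rewards are observed without noise, so the leader can try each
    of its actions once and then commit to the best one it has seen.
    """
    n_actions = len(reward)
    # Benchmark: the best shared reward when the follower best-responds.
    optimal = max(reward[a][best_response(reward, a)] for a in range(n_actions))

    observed = [float("-inf")] * n_actions  # leader's record of rewards seen so far
    regret = 0.0
    history: List[Tuple[int, int]] = []

    for t in range(horizon):
        if t < n_actions:
            a = t  # exploration: try each leader action once
        else:
            a = max(range(n_actions), key=lambda i: observed[i])  # commit
        b = best_response(reward, a)  # follower moves after seeing a
        r = reward[a][b]              # cooperative game: both receive r
        observed[a] = r
        regret += optimal - r
        history.append((a, b))

    return regret, history


# Hypothetical 3x2 reward table: rows are leader actions, columns follower actions.
example_reward = [[0.2, 0.5], [0.9, 0.1], [0.4, 0.6]]
total_regret, history = play(example_reward, horizon=10)
print(total_regret, history[-1])
```

In this toy setting the leader only pays regret during the exploration rounds, because the follower's best response reveals the best achievable reward for each leader action. The abstract's point is that with richer reward structures the picture can be very different, with sample complexity ranging from constant to exponential.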


Last Round Convergence and No-Instant Regret in Repeated Games with Asymmetric Information

This paper considers repeated games in which one player has more informa...

Refined approachability algorithms and application to regret minimization with global costs

Blackwell's approachability is a framework where two players, the Decisi...

Active Inverse Learning in Stackelberg Trajectory Games

Game-theoretic inverse learning is the problem of inferring the players'...

Be a Leader or Become a Follower: The Strategy to Commit to with Multiple Leaders (Extended Version)

We study the problem of computing correlated strategies to commit to in ...

Efficient Stackelberg Strategies for Finitely Repeated Games

We study the problem of efficiently computing optimal strategies in asym...

No-Regret Learning in Dynamic Stackelberg Games

In a Stackelberg game, a leader commits to a randomized strategy, and a ...

Approachability in unknown games: Online learning meets multi-objective optimization

In the standard setting of approachability there are two players and a t...
