Regret Minimization with Dynamic Benchmarks in Repeated Games

by Ludovico Crippa, et al.

In repeated games, strategies are often evaluated by their ability to guarantee the performance of the single best action that is selected in hindsight (a property referred to as Hannan consistency, or no-regret). However, the effectiveness of the single best action as a yardstick to evaluate strategies is limited, as any static action may perform poorly in common dynamic settings. We propose the notion of dynamic benchmark consistency, which requires a strategy to asymptotically guarantee the performance of the best dynamic sequence of actions selected in hindsight subject to a constraint on the number of action changes the corresponding dynamic benchmark admits. We show that dynamic benchmark consistent strategies exist if and only if the number of changes in the benchmark scales sublinearly with the horizon length. Further, our main result establishes that the set of empirical joint distributions of play that may emerge, when all players deploy such strategies, asymptotically coincides with the set of Hannan equilibria (also referred to as coarse correlated equilibria) of the stage game. This general characterization allows one to leverage analyses developed for frameworks that consider static benchmarks, which we demonstrate by bounding the social efficiency of the possible outcomes in our setting. Together, our results imply that dynamic benchmark consistent strategies introduce the following Pareto-type improvement over no-regret strategies: They enable stronger individual guarantees against arbitrary strategies of the other players, while maintaining the same worst-case guarantees on the social welfare, when all players adopt these strategies.
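To make the benchmark concrete, the following is a minimal sketch, not the authors' construction, of how the payoff of the best dynamic sequence of actions in hindsight could be computed for one player when the benchmark is allowed at most a fixed number of action changes. It assumes a hypothetical table `payoffs[t][a]` holding the payoff that action `a` would have earned in period `t` against the other players' realized actions; the function names and the dynamic-programming formulation are illustrative assumptions. Setting `max_changes=0` recovers the usual single best action in hindsight.

```python
from typing import List, Sequence


def best_dynamic_benchmark(payoffs: Sequence[Sequence[float]], max_changes: int) -> float:
    """Best cumulative payoff of an action sequence with at most `max_changes` switches.

    payoffs[t][a] is an assumed (hypothetical) hindsight payoff table for one player.
    """
    num_actions = len(payoffs[0])
    # value[c][a]: best cumulative payoff so far among sequences that currently
    # play action a and have used at most c action changes.
    value: List[List[float]] = [
        [payoffs[0][a] for a in range(num_actions)] for _ in range(max_changes + 1)
    ]
    for row in payoffs[1:]:
        new_value: List[List[float]] = []
        for c in range(max_changes + 1):
            # Best value reachable by spending one change to arrive from any action.
            best_after_switch = max(value[c - 1]) if c > 0 else None
            layer = []
            for a in range(num_actions):
                best_prev = value[c][a]  # keep playing a, no change spent
                if best_after_switch is not None:
                    best_prev = max(best_prev, best_after_switch)
                layer.append(best_prev + row[a])
            new_value.append(layer)
        value = new_value
    return max(value[max_changes])


def dynamic_regret(payoffs: Sequence[Sequence[float]], played: Sequence[int], max_changes: int) -> float:
    """Gap between the dynamic benchmark and the payoff actually earned."""
    earned = sum(payoffs[t][a] for t, a in enumerate(played))
    return best_dynamic_benchmark(payoffs, max_changes) - earned
```

In this reading, a strategy would be consistent with respect to a benchmark whose number of changes grows with the horizon if `dynamic_regret` divided by the horizon length vanishes asymptotically; the abstract states that this is achievable exactly when the number of changes grows sublinearly in the horizon.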



Learning in Games with Cumulative Prospect Theoretic Preferences

We consider repeated games where players behave according to cumulative ...

Regret Minimization in Repeated Games: A Set-Valued Dynamic Programming Approach

The regret-minimization paradigm has emerged as an effective technique f...

Equilibria in Repeated Games with Countably Many Players and Tail-Measurable Payoffs

We prove that every repeated game with countably many players, finite ac...

Learning in time-varying games

In this paper, we examine the long-term behavior of regret-minimizing ag...

Online Optimization: Competing with Dynamic Comparators

Recent literature on online learning has focused on developing adaptive ...

CFR-MIX: Solving Imperfect Information Extensive-Form Games with Combinatorial Action Space

In many real-world scenarios, a team of agents coordinate with each othe...

Optimistic Dynamic Regret Bounds

Online Learning (OL) algorithms have originally been developed to guaran...
