Risk-Averse No-Regret Learning in Online Convex Games

by   Zifan Wang, et al.

We consider an online stochastic game with risk-averse agents whose goal is to learn optimal decisions that minimize the risk of incurring significantly high costs. Specifically, we use the Conditional Value at Risk (CVaR) as a risk measure that the agents can estimate using bandit feedback in the form of the cost values of only their selected actions. Since the distributions of the cost functions depend on the actions of all agents that are generally unobservable, they are themselves unknown and, therefore, the CVaR values of the costs are difficult to compute. To address this challenge, we propose a new online risk-averse learning algorithm that relies on one-point zeroth-order estimation of the CVaR gradients computed using CVaR values that are estimated by appropriately sampling the cost functions. We show that this algorithm achieves sub-linear regret with high probability. We also propose two variants of this algorithm that improve performance. The first variant relies on a new sampling strategy that uses samples from the previous iteration to improve the estimation accuracy of the CVaR values. The second variant employs residual feedback that uses CVaR values from the previous iteration to reduce the variance of the CVaR gradient estimates. We theoretically analyze the convergence properties of these variants and illustrate their performance on an online market problem that we model as a Cournot game.


page 1

page 2

page 3

page 4


A Zeroth-Order Momentum Method for Risk-Averse Online Convex Games

We consider risk-averse learning in repeated unknown games where the goa...

Minimizing Regret in Bandit Online Optimization in Unconstrained and Constrained Action Spaces

We consider online convex optimization with zeroth-order feedback settin...

Risk-Averse Stochastic Convex Bandit

Motivated by applications in clinical trials and finance, we study the p...

Boosting One-Point Derivative-Free Online Optimization via Residual Feedback

Zeroth-order optimization (ZO) typically relies on two-point feedback to...

Adaptive Sampling for Stochastic Risk-Averse Learning

We consider the problem of training machine learning models in a risk-av...

CFR-MIX: Solving Imperfect Information Extensive-Form Games with Combinatorial Action Space

In many real-world scenarios, a team of agents coordinate with each othe...

Lazy Queries Can Reduce Variance in Zeroth-order Optimization

A major challenge of applying zeroth-order (ZO) methods is the high quer...

Please sign up or login with your details

Forgot password? Click here to reset