Fairness in Learning: Classic and Contextual Bandits

by   Matthew Joseph, et al.

We introduce the study of fairness in multi-armed bandit problems. Our fairness definition can be interpreted as demanding that given a pool of applicants (say, for college admission or mortgages), a worse applicant is never favored over a better one, despite a learning algorithm's uncertainty over the true payoffs. We prove results of two types. First, in the important special case of the classic stochastic bandits problem (i.e., in which there are no contexts), we provide a provably fair algorithm based on "chained" confidence intervals, and provide a cumulative regret bound with a cubic dependence on the number of arms. We further show that any fair algorithm must have such a dependence. When combined with regret bounds for standard non-fair algorithms such as UCB, this proves a strong separation between fair and unfair learning, which extends to the general contextual case. In the general contextual case, we prove a tight connection between fairness and the KWIK (Knows What It Knows) learning model: a KWIK algorithm for a class of functions can be transformed into a provably fair contextual bandit algorithm, and conversely any fair contextual bandit algorithm can be transformed into a KWIK learning algorithm. This tight connection allows us to provide a provably fair algorithm for the linear contextual bandit problem with a polynomial dependence on the dimension, and to show (for a different class of functions) a worst-case exponential gap in regret between fair and non-fair learning algorithms


page 1

page 2

page 3

page 4


Achieving Fairness in the Stochastic Multi-armed Bandit Problem

We study an interesting variant of the stochastic multi-armed bandit pro...

A Regret bound for Non-stationary Multi-Armed Bandits with Fairness Constraints

The multi-armed bandits' framework is the most common platform to study ...

Contextual bandits with concave rewards, and an application to fair ranking

We consider Contextual Bandits with Concave Rewards (CBCR), a multi-obje...

Achieving User-Side Fairness in Contextual Bandits

Personalized recommendation based on multi-arm bandit (MAB) algorithms h...

Online Fair Revenue Maximizing Cake Division with Non-Contiguous Pieces in Adversarial Bandits

The classic cake-cutting problem provides a model for addressing the fai...

Leveraging Good Representations in Linear Contextual Bandits

The linear contextual bandit literature is mostly focused on the design ...

Fighting Contextual Bandits with Stochastic Smoothing

We introduce a new stochastic smoothing perspective to study adversarial...

Please sign up or login with your details

Forgot password? Click here to reset