Differential Privacy for Multi-armed Bandits: What Is It and What Is Its Cost?
We introduce a number of privacy definitions for the multi-armed bandit problem, based on differential privacy. We relate them through a unifying graphical model representation and connect them to existing definitions. We then derive and contrast lower bounds on the regret of bandit algorithms satisfying these definitions. We show that for all of them, the learner's regret is increased by a multiplicative factor dependent on the privacy level ϵ, but that the dependency is weaker when we do not require local differential privacy for the rewards.
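The distinction between local and non-local (central) privacy for the rewards can be made concrete with the standard Laplace mechanism. The sketch below is an illustration under stated assumptions, not the construction used in the paper: rewards are assumed to lie in [0, 1], the function names are hypothetical, and only a single batch of rewards is privatized. It ignores the composition over rounds that a real bandit algorithm must account for.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale  # expovariate takes a rate; its mean is 1 / rate
    return rng.expovariate(rate) - rng.expovariate(rate)


def local_private_rewards(rewards, epsilon, rng):
    """Hypothetical local model: each reward is perturbed before the learner sees it.

    A single reward in [0, 1] has sensitivity 1, so adding Laplace(1 / epsilon)
    noise to each one makes every released reward epsilon-DP on its own.
    """
    return [r + laplace_noise(1.0 / epsilon, rng) for r in rewards]


def central_private_sum(rewards, epsilon, rng):
    """Hypothetical central model: the learner sees raw rewards, releases a noisy sum.

    Changing one reward in [0, 1] moves the sum by at most 1, so a single
    Laplace(1 / epsilon) draw makes this one release epsilon-DP.
    """
    return sum(rewards) + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    n, epsilon, true_mean = 1000, 0.5, 0.3
    rewards = [1.0 if rng.random() < true_mean else 0.0 for _ in range(n)]
    empirical = sum(rewards) / n
    local_mean = sum(local_private_rewards(rewards, epsilon, rng)) / n
    central_mean = central_private_sum(rewards, epsilon, rng) / n
    print("local error:  ", abs(local_mean - empirical))
    print("central error:", abs(central_mean - empirical))
```

Averaging n locally perturbed rewards leaves noise of variance 2/(ϵ²n) in the mean estimate, whereas dividing the single noisy sum by n leaves variance 2/(ϵ²n²). This gap is in line with the claim that the dependency on ϵ is weaker when local differential privacy is not required for the rewards, although the regret lower bounds themselves are not reproduced by this sketch.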