Linear Bandits on Uniformly Convex Sets
Linear bandit algorithms yield Õ(n√T) pseudo-regret bounds on compact convex action sets 𝒦 ⊂ ℝ^n, and two types of structural assumptions lead to better pseudo-regret bounds. When 𝒦 is the simplex or an ℓ_p ball with p ∈ ]1,2], there exist bandit algorithms with Õ(√(nT)) pseudo-regret bounds. Here, we derive bandit algorithms for some strongly convex sets beyond ℓ_p balls that enjoy pseudo-regret bounds of Õ(√(nT)), which answers an open question from [BCB12, 5.5.]. Interestingly, when the action set is uniformly convex but not necessarily strongly convex, we obtain pseudo-regret bounds with a dimension dependency smaller than O(√n). However, this comes at the expense of asymptotic rates in T varying between Õ(√T) and Õ(T).
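To give a sense of scale for the two rates quoted above, the following minimal sketch compares the orders n√T and √(nT) for a few values of n and T. It is only an illustration: constants and logarithmic factors are dropped, the function names `generic_bound` and `structured_bound` are hypothetical, and the chosen values of n and T are arbitrary rather than taken from the paper.

```python
import math


def generic_bound(n: int, T: int) -> float:
    """Order of the generic rate n * sqrt(T) (constants and log factors dropped)."""
    return n * math.sqrt(T)


def structured_bound(n: int, T: int) -> float:
    """Order of the improved rate sqrt(n * T) (constants and log factors dropped)."""
    return math.sqrt(n * T)


if __name__ == "__main__":
    # Illustrative (n, T) pairs only; not values from the paper.
    for n, T in [(10, 10_000), (100, 10_000), (100, 1_000_000)]:
        g = generic_bound(n, T)
        s = structured_bound(n, T)
        # The ratio of the two orders is sqrt(n), independent of T.
        print(f"n={n:>4} T={T:>8}  n*sqrt(T)={g:.1f}  sqrt(nT)={s:.1f}  ratio={g / s:.2f}")
```

Under these simplifications the gap between the two orders is a factor of √n regardless of the horizon T, which is why the structural assumptions matter most in high dimension.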