We consider the problem of exploration-exploitation in communicating Markov
Decision Processes. We provide an analysis of UCRL2 with Empirical Bernstein
inequalities (UCRL2B). For any MDP with S states, A actions, at most Γ ≤ S
possible next states per state-action pair, and diameter D, the regret of
UCRL2B after T time steps is bounded as Õ(√(DΓSAT)).
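
To illustrate how the bound depends on Γ, the following minimal sketch evaluates only the leading term √(DΓSAT), ignoring constants and logarithmic factors. The function name `regret_leading_term` and all numeric values are hypothetical, chosen for illustration; they are not taken from the analysis.

```python
import math


def regret_leading_term(D, Gamma, S, A, T):
    """Leading term sqrt(D * Gamma * S * A * T) of the stated bound.

    Constants and logarithmic factors are deliberately omitted, so the
    result is only meaningful for comparing how the bound scales.
    """
    if Gamma > S:
        raise ValueError("Gamma is assumed to satisfy Gamma <= S")
    return math.sqrt(D * Gamma * S * A * T)


# Hypothetical MDP sizes: same S, A, D and T, different transition sparsity.
dense = regret_leading_term(D=10, Gamma=100, S=100, A=4, T=10**6)
sparse = regret_leading_term(D=10, Gamma=4, S=100, A=4, T=10**6)

# The ratio depends only on Gamma: sqrt(100 / 4).
ratio = dense / sparse
print(ratio)
```

Under these assumed values, reducing Γ from 100 to 4 shrinks the leading term by a factor of √25 = 5, which is where sparse transitions help.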