We introduce the safe best-arm identification framework with linear feed...
We propose UCBMQ, Upper Confidence Bound Momentum Q-learning, a new algo...
Multi-armed bandits are widely applied in scenarios like recommender sys...
We investigate an active pure-exploration setting that includes best-ar...
We investigate and provide new insights on the sampling rule called Top-...