We consider a non-stationary Bandits with Knapsack problem. The outcome ...
We study the Pareto frontier of two archetypal objectives in stochastic ...
We consider a best arm identification (BAI) problem for stochastic bandi...
We consider un-discounted reinforcement learning (RL) in Markov decision...
We design and analyze CascadeBAI, an algorithm for finding the best set ...
We propose algorithms with state-of-the-art dynamic regret bounds for un...
We consider an agent who is involved in a Markov decision process and re...
We introduce general data-driven decision-making algorithms that achieve...
We study a general problem of allocating limited resources to heterogene...
We introduce algorithms that achieve state-of-the-art dynamic regret bou...
We design and analyze TS-Cascade, a Thompson sampling algorithm for the ...