Exploration remains a key challenge in deep reinforcement learning (RL)....
We study the problem of planning under model uncertainty in an online
me...
In this short note we derive a relationship between the Bregman divergen...
We consider online sequential decision problems where an agent must bala...
Posterior predictive distributions quantify uncertainties ignored by poi...
Maximising a cumulative reward function that is Markov and stationary, i...
Mixed Integer Programming (MIP) solvers rely on an array of sophisticate...
Policy gradient methods are among the most effective methods for large-s...
We study a version of the classical zero-sum matrix game with unknown pa...
Reinforcement learning (RL) combines a control problem with statistical
...
Prior work on neural network verification has focused on specifications ...
While deep learning has led to remarkable results on a number of challen...
We propose a family of optimization methods that achieve linear converge...
We consider the exploration-exploitation trade-off in reinforcement lear...
This paper proposes a new algorithmic framework, predictor-verifier
train...
This paper investigates recently proposed approaches for defending again...
We consider the exploration/exploitation problem in reinforcement learni...
Policy gradient is an efficient technique for improving a policy in a
re...