Mirror descent value iteration (MDVI), an abstraction of Kullback-Leible...
We study the effect of baselines in on-policy stochastic policy gradient...
In this work, we consider and analyze the sample complexity of model-fre...
Softmax policy gradient is a popular algorithm for policy optimization i...
We study the effect of stochasticity in on-policy policy optimization, a...
Classical global convergence results for first-order methods rely on uni...
Batch policy optimization considers leveraging existing data for policy...
Model-based reinforcement learning (MBRL) can significantly improve samp...
We make three contributions toward better understanding policy gradient...
Model-based reinforcement learning has been empirically demonstrated as ...
The scalability of submodular optimization methods is critical for their...