Modern deep-learning systems are specialized to problem settings in whic...
The policy gradient theorem gives a convenient form of the policy gradie...
We present a scalable and effective exploration strategy based on Thomps...
Continuous-time reinforcement learning tasks commonly use discrete steps...
Modern representation learning methods may fail to adapt quickly under
n...
In recent years, by leveraging more data, computation, and diverse tasks...
In classic reinforcement learning algorithms, agents make decisions at
d...
Second-order optimization uses curvature information about the objective...
Real-time learning is crucial for robotic agents adapting to ever-changi...
Artificial neural networks are promising as general function approximato...
An oft-ignored challenge of real-world reinforcement learning is that th...
The policy gradient theorem (Sutton et al., 2000) prescribes the usage o...
Policy gradient (PG) estimators for softmax policies are ineffective wit...
The Backprop algorithm for learning in neural networks utilizes two
mech...
Approximate Policy Iteration (API) algorithms alternate between (approxi...
Designing adaptable control laws that can transfer between different rob...
Policy gradient methods estimate the gradient of a policy objective sole...
Reinforcement learning algorithms rely on exploration to discover new
be...
Through many recent successes in simulation, model-free reinforcement
le...
Reinforcement learning is a promising approach to developing hard-to-eng...
The temporal-difference methods TD(λ) and Sarsa(λ) form a
core part of m...
The true online TD(λ) algorithm has recently been proposed (van
Seijen a...