Recently, Truncated Quantile Critics (TQC), using distributional represe...
We develop theory and algorithms for average-reward on-policy Reinforcem...
In reinforcement learning, an agent attempts to learn high-performing be...
In recent years, deep reinforcement learning has been shown to be adept ...
We propose a new sample-efficient methodology, called Supervised Policy ...