Perception-Prediction-Reaction Agents for Deep Reinforcement Learning

06/26/2020
by Adam Stooke, et al.

We introduce a new recurrent agent architecture and associated auxiliary losses which improve reinforcement learning in partially observable tasks requiring long-term memory. We employ a temporal hierarchy, using a slow-ticking recurrent core to allow information to flow more easily over long time spans, and three fast-ticking recurrent cores with connections designed to create an information asymmetry. The reaction core incorporates new observations with input from the slow core to produce the agent's policy; the perception core accesses only short-term observations and informs the slow core; lastly, the prediction core accesses only long-term memory. An auxiliary loss regularizes policies drawn from all three cores against each other, enacting the prior that the policy should be expressible from either recent or long-term memory. We present the resulting Perception-Prediction-Reaction (PPR) agent and demonstrate its improved performance over a strong LSTM agent baseline in DMLab-30, particularly in tasks requiring long-term memory. We further show significant improvements in Capture the Flag, an environment requiring agents to acquire a complicated mixture of skills over long time scales. In a series of ablation experiments, we probe the importance of each component of the PPR agent, establishing that the entire, novel combination is necessary for the observed gains.
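
The abstract describes the architecture only at a high level; the sketch below is one possible reading of it in PyTorch, not the authors' implementation. The class name PPRSketch, the slow_period parameter, the use of LSTMCell for every core, and the KL-based auxiliary term with a stop-gradient on the reaction policy are all illustrative assumptions; consult the paper for the exact inputs to each core, the regularization direction, and how the policy heads are shared.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PPRSketch(nn.Module):
    """Illustrative PPR-style temporal hierarchy (not the paper's exact model).

    A slow LSTM ticks once every `slow_period` steps; three fast LSTMs tick
    every step. The reaction core sees the observation plus the slow state,
    the perception core sees only the observation, and the prediction core
    sees only the slow state (long-term memory).
    """

    def __init__(self, obs_dim, hidden_dim, num_actions, slow_period=8):
        super().__init__()
        self.slow_period = slow_period
        self.slow = nn.LSTMCell(hidden_dim, hidden_dim)                # slow-ticking core
        self.reaction = nn.LSTMCell(obs_dim + hidden_dim, hidden_dim)  # obs + slow state
        self.perception = nn.LSTMCell(obs_dim, hidden_dim)             # obs only
        self.prediction = nn.LSTMCell(hidden_dim, hidden_dim)          # slow state only
        self.pi_reaction = nn.Linear(hidden_dim, num_actions)
        self.pi_perception = nn.Linear(hidden_dim, num_actions)
        self.pi_prediction = nn.Linear(hidden_dim, num_actions)

    def init_states(self, batch_size):
        def zeros():
            return (torch.zeros(batch_size, self.slow.hidden_size),
                    torch.zeros(batch_size, self.slow.hidden_size))
        return {name: zeros()
                for name in ("slow", "reaction", "perception", "prediction")}

    def forward(self, obs_seq, states):
        """obs_seq: (T, B, obs_dim); states: dict of (h, c) tuples per core."""
        logits_out, aux_losses = [], []
        for t, obs in enumerate(obs_seq):
            if t % self.slow_period == 0:
                # The slow core ticks infrequently, summarising the perception core.
                states["slow"] = self.slow(states["perception"][0], states["slow"])
            slow_h = states["slow"][0]

            states["reaction"] = self.reaction(torch.cat([obs, slow_h], dim=-1),
                                               states["reaction"])
            states["perception"] = self.perception(obs, states["perception"])
            states["prediction"] = self.prediction(slow_h, states["prediction"])

            logits_r = self.pi_reaction(states["reaction"][0])  # behaviour policy
            logits_p = self.pi_perception(states["perception"][0])
            logits_q = self.pi_prediction(states["prediction"][0])
            logits_out.append(logits_r)

            # Auxiliary loss: pull the perception and prediction policies toward
            # the (detached) reaction policy, so the behaviour policy stays
            # expressible from recent observations or long-term memory alone.
            target = F.softmax(logits_r.detach(), dim=-1)
            aux = (F.kl_div(F.log_softmax(logits_p, dim=-1), target, reduction="batchmean")
                   + F.kl_div(F.log_softmax(logits_q, dim=-1), target, reduction="batchmean"))
            aux_losses.append(aux)

        return torch.stack(logits_out), torch.stack(aux_losses).mean()
```

The information asymmetry comes from what each fast core is allowed to see: the reaction core combines both information streams, while the other two are each deprived of one stream, so the auxiliary term can only be small if the behaviour policy is recoverable from recent observations alone or from long-term memory alone.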
