What Should I Know? Using Meta-gradient Descent for Predictive Feature Discovery in a Single Stream of Experience

by Alexandra Kearney, et al.

In computational reinforcement learning, a growing body of work seeks to construct an agent's perception of the world through predictions of future sensations; predictions about environment observations are used as additional input features to enable better goal-directed decision-making. An open challenge in this line of work is determining which of the infinitely many predictions an agent could make best support decision-making. This challenge is especially apparent in continual learning problems, where a single stream of experience is available to a single agent. As a primary contribution, we introduce a meta-gradient descent process by which an agent learns 1) what predictions to make, 2) the estimates for its chosen predictions, and 3) how to use those estimates to generate policies that maximize future reward, all during a single ongoing process of continual learning. In this manuscript we consider predictions expressed as General Value Functions (GVFs): temporally extended estimates of the accumulation of a future signal. We demonstrate that, through interaction with the environment, an agent can independently select predictions that resolve partial observability, resulting in performance similar to that achieved with expertly specified GVFs. By learning these predictions rather than manually specifying them, we enable the agent to identify useful predictions in a self-supervised manner, taking a step towards truly autonomous systems.
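
To make the notion of a General Value Function concrete, the following is a minimal sketch of estimating one fixed GVF from a single stream of experience with tabular TD(0). It illustrates only what a GVF estimate is, not the meta-gradient process the abstract describes. The function name `td0_gvf_tabular`, the 4-state cycle, the signal, the discount, and the step size are all assumptions chosen for illustration.

```python
import numpy as np

def td0_gvf_tabular(states, cumulants, n_states, gamma, step_size=0.1):
    """Online tabular TD(0) estimate of a single GVF (illustrative sketch).

    states[t] is the state index at time t; cumulants[t] is the signal
    observed on the transition from states[t] to states[t + 1].
    """
    v = np.zeros(n_states)
    for t in range(len(states) - 1):
        s, s_next = states[t], states[t + 1]
        # TD error: observed signal plus discounted next estimate,
        # minus the current estimate.
        td_error = cumulants[t] + gamma * v[s_next] - v[s]
        v[s] += step_size * td_error
    return v

# Hypothetical environment: a deterministic 4-state cycle 0 -> 1 -> 2 -> 3 -> 0.
# The signal of interest is 1 on the transition into state 0 and 0 otherwise.
n_states, gamma = 4, 0.5
states = np.arange(5001) % n_states
cumulants = (states[1:] == 0).astype(float)

v = td0_gvf_tabular(states, cumulants, n_states, gamma)
```

Each entry of `v` estimates the discounted accumulation of the signal from the corresponding state onward. In this toy cycle the estimate is largest in state 3, which immediately precedes the signal, and decays by a factor of `gamma` for each additional step away from it.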


