Learning Retrospective Knowledge with Reverse Reinforcement Learning

07/09/2020
by Shangtong Zhang, et al.
University of Oxford

We present a Reverse Reinforcement Learning (Reverse RL) approach for representing retrospective knowledge. General Value Functions (GVFs) have enjoyed great success in representing predictive knowledge, i.e., answering questions about possible future outcomes such as "how much fuel will be consumed in expectation if we drive from A to B?". GVFs, however, cannot answer questions like "how much fuel do we expect a car to have given it is at B at time t?". To answer this question, we need to know when that car had a full tank and how that car came to B. Since such questions emphasize the influence of possible past events on the present, we refer to their answers as retrospective knowledge. In this paper, we show how to represent retrospective knowledge with Reverse GVFs, which are trained via Reverse RL. We demonstrate empirically the utility of Reverse GVFs in both representation learning and anomaly detection.
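
To make the fuel example concrete, here is a minimal sketch of a retrospective value estimate. The abstract does not give an update rule, so this sketch assumes a tabular, episodic setting and uses an ordinary temporal-difference update with the bootstrap direction reversed: the value of the current state is pushed toward the cumulant just received plus the value of the previous state. The function name `reverse_td`, the route, and the fuel costs are hypothetical and chosen only for illustration.

```python
def reverse_td(episodes, num_states, alpha=0.5, gamma=1.0):
    """Tabular TD with the bootstrap direction reversed (illustrative assumption).

    Each episode is a list of (prev_state, cumulant, state) transitions.
    v[state] estimates the cumulant accumulated on the way *to* that state,
    rather than the cumulant expected *after* leaving it.
    """
    v = [0.0] * num_states
    for episode in episodes:
        for prev_state, cumulant, state in episode:
            # Bootstrap from the predecessor instead of the successor.
            target = cumulant + gamma * v[prev_state]
            v[state] += alpha * (target - v[state])
    return v


# Hypothetical route 0 -> 1 -> 2 -> 3 with fuel used on each leg.
route = [(0, 2.0, 1), (1, 3.0, 2), (2, 1.5, 3)]
values = reverse_td([route] * 200, num_states=4)

# Fuel remaining at each state, assuming a (hypothetical) 10-unit full tank at state 0.
fuel_left = [10.0 - used for used in values]
```

In this deterministic toy case the estimates approach the cumulative fuel used on the way to each state, so `fuel_left` answers the "how much fuel do we expect the car to have at B?" style of question. A forward GVF trained on the same data would instead estimate the fuel still to be consumed from each state onward.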


