A Reinforcement Learning Environment for Mathematical Reasoning via Program Synthesis

07/15/2021
by Joseph Palermo, et al.

We convert the DeepMind Mathematics Dataset into a reinforcement learning environment by interpreting it as a program synthesis problem. Each action taken in the environment adds an operator or an input to a discrete compute graph. Graphs that compute correct answers yield positive reward, which makes it possible to optimize a policy that constructs compute graphs conditioned on problem statements. Baseline models are trained using Double DQN on various subsets of problem types, and they learn to construct correct graphs despite the challenges of combinatorial explosion and noisy rewards.
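To make the setup concrete, the following is a minimal sketch of such an environment, not the authors' implementation. It assumes a tiny hypothetical operator set (`add`, `mul`, `neg`), integer actions that select inputs by index, and graphs built in prefix order; the names `GraphEnv`, `OPERATORS`, and `double_dqn_target` are illustrative and do not come from the paper. The second function shows the standard Double DQN target, in which the online network selects the next action and the target network evaluates it.

```python
import operator

# Hypothetical operator vocabulary: name -> (function, arity).
OPERATORS = {
    "add": (operator.add, 2),
    "mul": (operator.mul, 2),
    "neg": (operator.neg, 1),
}


class GraphEnv:
    """Toy environment: each action appends an operator (str) or an input index (int)."""

    def __init__(self, inputs, answer, max_steps=8):
        self.inputs = inputs        # values assumed to be extracted from the problem
        self.answer = answer        # ground-truth answer used to compute the reward
        self.max_steps = max_steps
        self.reset()

    def reset(self):
        self.tokens = []            # graph so far, stored in prefix order
        self.open_slots = 1         # argument positions still waiting to be filled
        return tuple(self.tokens)

    def step(self, action):
        self.tokens.append(action)
        if action in OPERATORS:
            # An operator fills one slot and opens one slot per argument.
            self.open_slots += OPERATORS[action][1] - 1
        else:
            self.open_slots -= 1    # an input fills one slot
        complete = self.open_slots == 0
        done = complete or len(self.tokens) >= self.max_steps
        # Positive reward only when a complete graph computes the right answer.
        reward = 1.0 if complete and self._evaluate() == self.answer else 0.0
        return tuple(self.tokens), reward, done

    def _evaluate(self):
        stream = iter(self.tokens)

        def build():
            token = next(stream)
            if token in OPERATORS:
                fn, arity = OPERATORS[token]
                return fn(*[build() for _ in range(arity)])
            return self.inputs[token]

        return build()


def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target: online net picks the next action, target net scores it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best]


env = GraphEnv(inputs=[3, 4, 5], answer=17)
for action in ["add", "mul", 0, 1, 2]:      # builds add(mul(3, 4), 5)
    state, reward, done = env.step(action)
print(reward, done)  # 1.0 True
```

Building the graph in prefix order is a convenience of this sketch: a single counter of unfilled argument slots is enough to tell when the graph is complete and can be evaluated.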

research
07/06/2018

NAPS: Natural Program Synthesis Dataset

We present a program synthesis-oriented dataset consisting of human writ...
research
01/03/2019

Imminent Collision Mitigation with Reinforcement Learning and Vision

This work examines the role of reinforcement learning in reducing the se...
research
08/28/2023

Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning

Offline reinforcement learning aims to utilize datasets of previously ga...
research
07/13/2023

Reinforcement Learning for Syntax-Guided Synthesis

Program synthesis is the task of automatically generating code based on ...
research
02/22/2021

Program Synthesis Guided Reinforcement Learning

A key challenge for reinforcement learning is solving long-horizon plann...
research
11/03/2022

lilGym: Natural Language Visual Reasoning with Reinforcement Learning

We present lilGym, a new benchmark for language-conditioned reinforcemen...
