Direct Behavior Specification via Constrained Reinforcement Learning

12/22/2021
by Julien Roy, et al.

The standard formulation of Reinforcement Learning lacks a practical way of specifying which behaviors are admissible and which are forbidden. Most often, practitioners specify behavior by manually engineering the reward function, a counter-intuitive process that requires several iterations and is prone to reward hacking by the agent. In this work, we argue that constrained RL, which has been used almost exclusively for safe RL, also has the potential to significantly reduce the amount of work spent on reward specification in applied Reinforcement Learning projects. To this end, we propose to specify behavioral preferences in the CMDP framework and to use Lagrangian methods, which seek to solve a min-max problem between the agent's policy and the Lagrange multipliers, to automatically weigh each of the behavioral constraints. Specifically, we investigate how CMDPs can be adapted to solve goal-based tasks while adhering to a set of behavioral constraints, and we propose modifications to the SAC-Lagrangian algorithm to handle the challenging case of several constraints. We evaluate this framework on a set of continuous control tasks relevant to the application of Reinforcement Learning to NPC design in video games.
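To make the min-max idea concrete, here is a minimal sketch of a generic Lagrangian relaxation with several constraints. It is not the modified SAC-Lagrangian algorithm proposed in the paper. The function names, the learning rate, and the numeric values are all assumptions chosen for illustration. The policy would maximize the reward return minus the multiplier-weighted constraint violations, while each multiplier is raised when its constraint is violated and otherwise decays toward zero.

```python
# Illustrative sketch only: generic dual ascent on Lagrange multipliers.
# All names and values here are hypothetical, not taken from the paper.

def constraint_violations(cost_returns, thresholds):
    """Per-constraint violation J_Ck - d_k (positive means violated)."""
    return [c - d for c, d in zip(cost_returns, thresholds)]


def lagrangian_objective(reward_return, cost_returns, thresholds, multipliers):
    """Scalar the policy maximizes: J_R - sum_k lambda_k * (J_Ck - d_k)."""
    violations = constraint_violations(cost_returns, thresholds)
    penalty = sum(lam * v for lam, v in zip(multipliers, violations))
    return reward_return - penalty


def update_multipliers(multipliers, cost_returns, thresholds, lr=0.1):
    """One projected gradient-ascent step; multipliers are clipped at zero."""
    violations = constraint_violations(cost_returns, thresholds)
    return [max(0.0, lam + lr * v) for lam, v in zip(multipliers, violations)]


# Two behavioral constraints: the first is violated, the second is satisfied.
multipliers = update_multipliers(
    multipliers=[0.0, 0.0],
    cost_returns=[0.5, 0.1],   # estimated expected costs under the policy
    thresholds=[0.2, 0.3],     # maximum admissible expected costs
)
```

In this toy run only the first multiplier becomes positive, so only the violated constraint starts to weigh on the policy's objective. This is the sense in which the multipliers weigh the constraints automatically, with no hand-tuned reward coefficients.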


