Direct Behavior Specification via Constrained Reinforcement Learning

by Julien Roy, et al.
Montréal Institute of Learning Algorithms

The standard formulation of Reinforcement Learning lacks a practical way of specifying which behaviors are admissible and which are forbidden. Most often, practitioners specify behavior by manually engineering the reward function, a counter-intuitive process that requires several iterations and is prone to reward hacking by the agent. In this work, we argue that constrained RL, which has almost exclusively been used for safe RL, also has the potential to significantly reduce the amount of work spent on reward specification in applied Reinforcement Learning projects. To this end, we propose to specify behavioral preferences in the CMDP framework and to use Lagrangian methods, which seek to solve a min-max problem between the agent's policy and the Lagrange multipliers, to automatically weigh each of the behavioral constraints. Specifically, we investigate how CMDPs can be adapted to solve goal-based tasks while adhering to a set of behavioral constraints, and we propose modifications to the SAC-Lagrangian algorithm to handle the challenging case of several constraints. We evaluate this framework on a set of continuous control tasks relevant to the application of Reinforcement Learning for NPC design in video games.
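To make the min-max idea in the abstract concrete, here is a minimal sketch of a generic Lagrangian update for several constraints. It is not the modified SAC-Lagrangian algorithm proposed in the paper: it only illustrates textbook projected dual ascent, in which each multiplier grows while its constraint is violated and shrinks toward zero otherwise. The function names, the learning rate, and the use of per-constraint average costs and thresholds are all assumptions made for illustration.

```python
def update_multipliers(lambdas, avg_costs, thresholds, lr=0.05):
    """One projected dual-ascent step (hypothetical helper, not from the paper).

    lambdas:    current multipliers, one per behavioral constraint
    avg_costs:  estimated expected cost of the current policy per constraint
    thresholds: maximum admissible expected cost per constraint
    """
    updated = []
    for lam, cost, limit in zip(lambdas, avg_costs, thresholds):
        violation = cost - limit  # positive when the constraint is violated
        # Gradient ascent on the multiplier, projected back onto lam >= 0.
        updated.append(max(0.0, lam + lr * violation))
    return updated


def penalized_reward(reward, costs, lambdas):
    """Reward seen by the policy: task reward minus multiplier-weighted costs."""
    return reward - sum(lam * c for lam, c in zip(lambdas, costs))


if __name__ == "__main__":
    lambdas = [0.0, 1.0]
    # First constraint is violated (0.5 > 0.2), second is satisfied (0.1 < 0.3).
    lambdas = update_multipliers(lambdas, [0.5, 0.1], [0.2, 0.3], lr=1.0)
    print(lambdas)
    print(penalized_reward(1.0, [0.5, 0.1], lambdas))
```

In this sketch the policy would maximize `penalized_reward` while the multipliers are updated in the opposite direction, so the weight of each behavioral constraint is adjusted automatically rather than tuned by hand.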



Reinforcement Learning Agent Training with Goals for Real World Tasks

Reinforcement Learning (RL) is a promising approach for solving various ...

A Composable Specification Language for Reinforcement Learning Tasks

Reinforcement learning is a promising approach for learning control poli...

A Policy Search Method For Temporal Logic Specified Reinforcement Learning Tasks

Reward engineering is an important aspect of reinforcement learning. Whe...

Braxlines: Fast and Interactive Toolkit for RL-driven Behavior Engineering beyond Reward Maximization

The goal of continuous control is to synthesize desired behaviors. In re...

Responsive Safety in Reinforcement Learning by PID Lagrangian Methods

Lagrangian methods are widely used algorithms for constrained optimizati...

Policy-focused Agent-based Modeling using RL Behavioral Models

Agent-based Models (ABMs) are valuable tools for policy analysis. ABMs h...

Value constrained model-free continuous control

The naive application of Reinforcement Learning algorithms to continuous...
