Safe Reinforcement Learning with Natural Language Constraints

by Tsung-Yen Yang, et al.

In this paper, we tackle the problem of learning control policies for tasks when provided with constraints in natural language. In contrast to instruction following, language here is used not to specify goals, but rather to describe situations that an agent must avoid during its exploration of the environment. Specifying constraints in natural language also differs from the predominant paradigm in safe reinforcement learning, where safety criteria are enforced by hand-defined cost functions. While natural language allows for easy and flexible specification of safety constraints and budget limitations, its ambiguous nature presents a challenge when mapping these specifications into representations that can be used by techniques for safe reinforcement learning. To address this, we develop a model that contains two components: (1) a constraint interpreter to encode natural language constraints into vector representations capturing spatial and temporal information on forbidden states, and (2) a policy network that uses these representations to output a policy with minimal constraint violations. Our model is end-to-end differentiable and we train it using a recently proposed algorithm for constrained policy optimization. To empirically demonstrate the effectiveness of our approach, we create a new benchmark task for autonomous navigation with crowd-sourced free-form text specifying three different types of constraints. Our method outperforms several baselines by achieving 6-7 times higher returns and 76% fewer constraint violations on average. Dataset and code to reproduce our experiments are available at
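The two-component structure described in the abstract can be illustrated with a toy sketch. This is not the authors' architecture: the paper's constraint interpreter and policy network are learned neural models trained with constrained policy optimization, whereas the code below swaps in a fixed bag-of-words encoder and an untrained linear softmax policy purely to show how a constraint vector can condition action selection. The names `VOCAB`, `encode_constraint`, and `PolicyNetwork`, the vocabulary, and all dimensions are hypothetical.

```python
import math
import random

# Hypothetical toy vocabulary (an assumption for illustration only).
VOCAB = ["avoid", "lava", "water", "grass", "more", "than", "times", "before", "after"]


def encode_constraint(text):
    """Toy stand-in for a constraint interpreter: normalized bag-of-words vector."""
    tokens = text.lower().split()
    counts = [float(tokens.count(word)) for word in VOCAB]
    total = sum(counts)
    # Normalize so the entries sum to 1; return all zeros if no word matched.
    return [c / total for c in counts] if total > 0 else counts


class PolicyNetwork:
    """Untrained linear layer plus softmax, conditioned on the constraint vector."""

    def __init__(self, obs_dim, con_dim, n_actions, seed=0):
        rng = random.Random(seed)
        # One weight row per action over the concatenated [observation, constraint] input.
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(obs_dim + con_dim)]
            for _ in range(n_actions)
        ]

    def action_probs(self, obs, constraint_vec):
        x = list(obs) + list(constraint_vec)
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]
        # Numerically stable softmax over the action logits.
        peak = max(logits)
        exps = [math.exp(logit - peak) for logit in logits]
        norm = sum(exps)
        return [e / norm for e in exps]


# Usage: encode a free-form constraint, then query the policy for one observation.
constraint_vec = encode_constraint("Avoid the lava more than two times")
policy = PolicyNetwork(obs_dim=4, con_dim=len(VOCAB), n_actions=4)
probs = policy.action_probs([0.0, 1.0, 0.5, -0.5], constraint_vec)
```

In a real system both components would be trained jointly, with the policy update restricted so that expected constraint cost stays within a budget; the sketch leaves training out entirely.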


Lyapunov-based Safe Policy Optimization for Continuous Control

We study continuous action reinforcement learning problems in which it i...

Enhancing Safe Exploration Using Safety State Augmentation

Safe exploration is a challenging and important problem in model-free re...

Evaluation of Safety Constraints in Autonomous Navigation with Deep Reinforcement Learning

While reinforcement learning algorithms have had great success in the fi...

Safe Reinforcement Learning of Control-Affine Systems with Vertex Networks

This paper focuses on finding reinforcement learning policies for contro...

Co-learning Planning and Control Policies Using Differentiable Formal Task Constraints

This paper presents a hierarchical reinforcement learning algorithm cons...

Policy Gradients for Probabilistic Constrained Reinforcement Learning

This paper considers the problem of learning safe policies in the contex...

Policy Optimization with Linear Temporal Logic Constraints

We study the problem of policy optimization (PO) with linear temporal lo...
