Verifiable and Compositional Reinforcement Learning Systems

by   Cyrus Neary, et al.

We propose a novel framework for verifiable and compositional reinforcement learning (RL) in which a collection of RL sub-systems, each of which learns to accomplish a separate sub-task, are composed to achieve an overall task. The framework consists of a high-level model, represented as a parametric Markov decision process (pMDP) which is used to plan and to analyze compositions of sub-systems, and of the collection of low-level sub-systems themselves. By defining interfaces between the sub-systems, the framework enables automatic decompositons of task specifications, e.g., reach a target set of states with a probability of at least 0.95, into individual sub-task specifications, i.e. achieve the sub-system's exit conditions with at least some minimum probability, given that its entry conditions are met. This in turn allows for the independent training and testing of the sub-systems; if they each learn a policy satisfying the appropriate sub-task specification, then their composition is guaranteed to satisfy the overall task specification. Conversely, if the sub-task specifications cannot all be satisfied by the learned policies, we present a method, formulated as the problem of finding an optimal set of parameters in the pMDP, to automatically update the sub-task specifications to account for the observed shortcomings. The result is an iterative procedure for defining sub-task specifications, and for training the sub-systems to meet them. As an additional benefit, this procedure allows for particularly challenging or important components of an overall task to be determined automatically, and focused on, during training. Experimental results demonstrate the presented framework's novel capabilities.


page 1

page 2

page 3

page 4


Verifiable Reinforcement Learning Systems via Compositionality

We propose a framework for verifiable and compositional reinforcement le...

Compositional Reinforcement Learning from Logical Specifications

We study the problem of learning control policies for complex tasks give...

Compositional planning in Markov decision processes: Temporal abstraction meets generalized logic composition

In hierarchical planning for Markov decision processes (MDPs), temporal ...

Planning With Uncertain Specifications (PUnS)

Reward engineering is crucial to high performance in reinforcement learn...

Learning Task Specifications from Demonstrations via the Principle of Maximum Causal Entropy

In many settings (e.g., robotics) demonstrations provide a natural way t...

Compositional Specifications for ioco Testing

Model-based testing is a promising technology for black-box software and...

Please sign up or login with your details

Forgot password? Click here to reset