Configurable Markov Decision Processes

06/14/2018
by Alberto Maria Metelli et al.

In many real-world problems, it is possible to configure, to a limited extent, some environmental parameters to improve the performance of a learning agent. In this paper, we propose a novel framework, Configurable Markov Decision Processes (Conf-MDPs), to model this new type of interaction with the environment. Furthermore, we provide a new learning algorithm, Safe Policy-Model Iteration (SPMI), to jointly and adaptively optimize the policy and the environment configuration. After introducing our approach and deriving some theoretical results, we present an experimental evaluation on two illustrative problems that shows the benefits of environment configurability for the performance of the learned policy.
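To make the joint optimization concrete, below is a minimal Python sketch of the Conf-MDP idea on a toy two-state problem: a policy and an environment configuration are improved in alternation. The environment, the scalar `theta` knob, and the greedy alternating updates are illustrative assumptions for this sketch; they are not the paper's SPMI algorithm, which takes safe update steps backed by performance-improvement bounds.

```python
import numpy as np

# Toy two-state, two-action Conf-MDP (illustrative, not from the paper).
# Rewards depend on (state, action); the transition model depends on a
# scalar configuration knob `theta`, which the learner may also tune.
n_states, n_actions = 2, 2
gamma = 0.9
R = np.array([[0.0, 1.0],
              [1.0, 0.0]])  # R[s, a]

def transition_model(theta):
    """Transition tensor P[s, a, s'] under configuration theta in [0, 1]:
    action 0 reaches state 0 with prob. theta, action 1 reaches state 1."""
    P = np.empty((n_states, n_actions, n_states))
    P[:, 0] = [theta, 1.0 - theta]
    P[:, 1] = [1.0 - theta, theta]
    return P

def policy_evaluation(pi, P, tol=1e-8):
    """Iterative evaluation of a deterministic policy pi (one action per state)."""
    V = np.zeros(n_states)
    while True:
        V_new = np.array([R[s, pi[s]] + gamma * P[s, pi[s]] @ V
                          for s in range(n_states)])
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

def greedy_policy(V, P):
    Q = R + gamma * np.einsum("sat,t->sa", P, V)
    return Q.argmax(axis=1)

# Alternate a policy-improvement step with a configuration-improvement step.
theta_grid = np.linspace(0.0, 1.0, 101)  # candidate configurations
theta, pi = 0.5, np.zeros(n_states, dtype=int)
for _ in range(20):
    P = transition_model(theta)
    pi = greedy_policy(policy_evaluation(pi, P), P)
    # Greedily pick the configuration maximizing the current policy's value
    # under a uniform initial-state distribution. SPMI instead takes safe
    # steps justified by performance-improvement lower bounds.
    scores = [policy_evaluation(pi, transition_model(t)).mean()
              for t in theta_grid]
    theta = theta_grid[int(np.argmax(scores))]

print(f"final configuration theta={theta:.2f}, policy={pi}")
```

Even in this greedy variant, the alternation captures the core point of the framework: the configuration is a decision variable on par with the policy, and both are optimized toward the same return.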

Related research

02/06/2013 · Fast Value Iteration for Goal-Directed Markov Decision Processes
Planning problems where effects of actions are non-deterministic can be ...

09/09/2019 · Policy Space Identification in Configurable Environments
We study the problem of identifying the policy space of a learning agent...

07/25/2022 · Optimizing Empty Container Repositioning and Fleet Deployment via Configurable Semi-POMDPs
With the continuous growth of the global economy and markets, resource i...

01/28/2022 · Safe Policy Improvement Approaches on Discrete Markov Decision Processes
Safe Policy Improvement (SPI) aims at provable guarantees that a learned...

07/12/2022 · Compactly Restrictable Metric Policy Optimization Problems
We study policy optimization problems for deterministic Markov decision ...

03/13/2022 · Policy Learning for Robust Markov Decision Process with a Mismatched Generative Model
In high-stake scenarios like medical treatment and auto-piloting, it's r...

12/31/2020 · Robust Asymmetric Learning in POMDPs
Policies for partially observed Markov decision processes can be efficie...
