Scaling Pareto-Efficient Decision Making Via Offline Multi-Objective RL

04/30/2023
by Baiting Zhu, et al.

The goal of multi-objective reinforcement learning (MORL) is to learn policies that simultaneously optimize multiple competing objectives. In practice, an agent's preferences over the objectives may not be known a priori, so we require policies that can generalize to arbitrary preferences at test time. In this work, we propose a new data-driven setup for offline MORL, in which we wish to learn a preference-agnostic policy agent using only a finite dataset of offline demonstrations of other agents and their preferences. The key contributions of this work are twofold. First, we introduce D4MORL, (D)atasets for MORL that are specifically designed for offline settings. D4MORL contains 1.8 million annotated demonstrations obtained by rolling out reference policies that optimize for randomly sampled preferences on 6 MuJoCo environments with 2-3 objectives each. Second, we propose Pareto-Efficient Decision Agents (PEDA), a family of offline MORL algorithms that builds on and extends Decision Transformers via a novel preference-and-return-conditioned policy. Empirically, we show that PEDA closely approximates the behavioral policy on the D4MORL benchmark and, with appropriate conditioning, provides an excellent approximation of the Pareto front, as measured by the hypervolume and sparsity metrics.
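To make the two evaluation metrics named above concrete, the following is a minimal sketch of how hypervolume and sparsity are commonly computed for a set of achieved return vectors. It is not taken from the paper's code: the function names, the restriction to two objectives for the hypervolume, the reference point, and the particular sparsity formula (mean squared gap between neighbouring front points, summed over objectives) are all assumptions chosen for illustration.

```python
import numpy as np


def pareto_front(points):
    """Return the non-dominated rows of `points` (every objective is maximized)."""
    # Drop exact duplicates so they do not distort the sparsity count.
    pts = np.unique(np.asarray(points, dtype=float), axis=0)
    keep = []
    for i, p in enumerate(pts):
        # p is dominated if another point is >= in all objectives and > in one.
        dominated = np.any(np.all(pts >= p, axis=1) & np.any(pts > p, axis=1))
        if not dominated:
            keep.append(i)
    return pts[keep]


def hypervolume_2d(points, ref):
    """Area dominated by the front and bounded below by `ref` (2 objectives only).

    Assumes every front point is at least as large as `ref` in both objectives.
    """
    front = pareto_front(points)
    # Sweep from the largest first objective to the smallest; on a
    # non-dominated front the second objective then increases.
    front = front[np.argsort(-front[:, 0])]
    area, prev_y = 0.0, float(ref[1])
    for x, y in front:
        area += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return area


def sparsity(points):
    """Mean squared gap between neighbouring front points, summed over objectives."""
    front = pareto_front(points)
    if len(front) < 2:
        return 0.0
    total = 0.0
    for j in range(front.shape[1]):
        vals = np.sort(front[:, j])
        total += np.sum(np.diff(vals) ** 2)
    return total / (len(front) - 1)


# Hypothetical two-objective returns from four rollouts.
returns = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (1.0, 1.0)]
print(hypervolume_2d(returns, ref=(0.0, 0.0)))  # 6.0
print(sparsity(returns))                        # 2.0
```

Under these definitions a larger hypervolume indicates a front that dominates more of the objective space, and a lower sparsity indicates a more evenly covered front. Environments with three objectives would need a general hypervolume routine in place of the two-dimensional sweep used here.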


