Reinforcement Learning Beyond Expectation

by Bhaskar Ramasubramanian, et al.

The inputs and preferences of human users are important considerations in situations where these users interact with autonomous cyber or cyber-physical systems. In these scenarios, one is often interested in aligning the behaviors of the system with the preferences of one or more human users. Cumulative prospect theory (CPT) is a paradigm that has been empirically shown to model the tendency of humans to view gains and losses differently. In this paper, we consider a setting where an autonomous agent has to learn behaviors in an unknown environment. In traditional reinforcement learning, these behaviors are learned through repeated interactions with the environment by optimizing an expected utility. To endow the agent with the ability to closely mimic the behavior of human users, we instead optimize a CPT-based cost. We introduce the notion of the CPT-value of an action taken in a state, and establish the convergence of an iterative dynamic programming-based approach to estimate this quantity. We develop two algorithms that enable agents to learn policies that optimize the CPT-value, and evaluate these algorithms in environments where a target state has to be reached while avoiding obstacles. We demonstrate that the behaviors learned by the agent using these algorithms are better aligned with those of a human user who might be placed in the same environment, and that they significantly improve on a baseline that optimizes an expected utility.
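
To make the contrast between an expected utility and a CPT-based objective concrete, the following is a minimal sketch of the CPT value of a discrete random outcome. It is not the paper's algorithm: the abstract does not specify the utility or probability-weighting functions, so the power utilities, the weighting function, and the parameter defaults (alpha = 0.88, lam = 2.25, gamma = 0.61, delta = 0.69) are assumed here from the standard Tversky-Kahneman formulation. The function names `tk_weight` and `cpt_value` are hypothetical, and outcomes are assumed to be distinct and expressed as gains (positive) or losses (negative).

```python
import numpy as np

def tk_weight(p, g):
    """Assumed Tversky-Kahneman probability weighting function w(p)."""
    p = np.clip(p, 0.0, 1.0)  # guard against floating-point drift outside [0, 1]
    return p**g / (p**g + (1.0 - p)**g) ** (1.0 / g)

def cpt_value(outcomes, probs, alpha=0.88, lam=2.25, gamma=0.61, delta=0.69):
    """CPT value of a discrete random variable with distinct outcomes.

    Gains are weighted through the upper tail P(X >= x), losses through
    the lower tail P(X <= x); losses are scaled by the loss-aversion
    factor lam. All parameter defaults are assumptions, not values
    taken from the paper.
    """
    x = np.asarray(outcomes, dtype=float)
    p = np.asarray(probs, dtype=float)
    total = 0.0
    for xi, pi in zip(x, p):
        if xi >= 0:
            tail = p[x > xi].sum()  # P(X > xi)
            weight = tk_weight(tail + pi, gamma) - tk_weight(tail, gamma)
            total += weight * xi**alpha
        else:
            tail = p[x < xi].sum()  # P(X < xi)
            weight = tk_weight(tail + pi, delta) - tk_weight(tail, delta)
            total -= weight * lam * (-xi) ** alpha
    return total

# A fair gamble between +10 and -10 has expected value 0, but its CPT
# value is negative because losses are weighted more heavily than gains.
fair_gamble = cpt_value([10.0, -10.0], [0.5, 0.5])
```

Under these assumed parameters, an expected-utility agent would be indifferent to the fair gamble above, whereas a CPT-based agent would avoid it, which is the kind of human-like asymmetry the abstract refers to.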
