Reinforcement Learning With Reward Machines in Stochastic Games

05/27/2023
by Jueming Hu et al.

We investigate multi-agent reinforcement learning for stochastic games with complex tasks, where the reward functions are non-Markovian. We utilize reward machines to incorporate high-level knowledge of complex tasks. We develop an algorithm called Q-learning with reward machines for stochastic games (QRM-SG) to learn the best-response strategy at a Nash equilibrium for each agent. In QRM-SG, we define the Q-function at a Nash equilibrium over an augmented state space, which combines the state of the stochastic game with the states of the reward machines. Each agent learns the Q-functions of all agents in the system. We prove that the Q-functions learned in QRM-SG converge to the Q-functions at a Nash equilibrium if the stage game at each time step during learning has a global optimum point or a saddle point, and the agents update their Q-functions based on the best-response strategy at this point. We use the Lemke-Howson method to derive the best-response strategy given the current Q-functions. Three case studies show that QRM-SG learns the best-response strategies effectively: after around 7500 episodes in Case Study I, 1000 episodes in Case Study II, and 1500 episodes in Case Study III, while baseline methods such as Nash Q-learning and MADDPG fail to converge to the Nash equilibrium in all three case studies.
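
To make the update rule described above concrete, here is a minimal, hypothetical sketch of a QRM-SG-style tabular update for a two-player game. Everything in it is an illustrative assumption rather than the paper's implementation: the `RewardMachine` class, the `nash_values` and `qrm_sg_update` helpers, the tiny action set, and the use of the open-source `nashpy` library as the Lemke-Howson solver.

```python
"""Sketch of a QRM-SG-style update: two agents, tabular Q-tables over an
augmented state space (game state + reward-machine states).

Illustrative assumptions only; requires numpy and nashpy.
"""
from collections import defaultdict

import nashpy as nash
import numpy as np

ACTIONS = [0, 1]  # each agent's action set; joint actions are pairs


class RewardMachine:
    """Hypothetical reward machine: delta maps (u, label) -> (u', reward)."""

    def __init__(self, delta, u0=0):
        self.delta, self.u = delta, u0

    def step(self, label):
        """Advance the RM state on an observed label; return the reward."""
        self.u, reward = self.delta.get((self.u, label), (self.u, 0.0))
        return reward


def nash_values(Q1, Q2, s_aug):
    """Solve the bimatrix stage game induced by both agents' Q-values at
    s_aug with Lemke-Howson; return each agent's equilibrium payoff."""
    A = np.array([[Q1[s_aug, (a1, a2)] for a2 in ACTIONS] for a1 in ACTIONS])
    B = np.array([[Q2[s_aug, (a1, a2)] for a2 in ACTIONS] for a1 in ACTIONS])
    pi1, pi2 = nash.Game(A, B).lemke_howson(initial_dropped_label=0)
    return float(pi1 @ A @ pi2), float(pi1 @ B @ pi2)


def qrm_sg_update(Q1, Q2, s_aug, joint_a, r1, r2, s_aug_next,
                  alpha=0.1, gamma=0.9):
    """One tabular step toward r_i + gamma * NashQ_i(s'_aug) for each agent."""
    v1, v2 = nash_values(Q1, Q2, s_aug_next)
    Q1[s_aug, joint_a] += alpha * (r1 + gamma * v1 - Q1[s_aug, joint_a])
    Q2[s_aug, joint_a] += alpha * (r2 + gamma * v2 - Q2[s_aug, joint_a])


# Toy usage: Q-tables keyed by augmented states s_aug = (game_state, u1, u2).
Q1, Q2 = defaultdict(float), defaultdict(float)
rng = np.random.default_rng(0)
for a in [(a1, a2) for a1 in ACTIONS for a2 in ACTIONS]:
    Q1[("s1", 1, 0), a] = rng.normal()  # seed so the stage game is
    Q2[("s1", 1, 0), a] = rng.normal()  # nondegenerate for Lemke-Howson

rm1 = RewardMachine({(0, "goal1"): (1, 1.0)})
r1 = rm1.step("goal1")  # the reward comes from the RM, not the environment
qrm_sg_update(Q1, Q2, s_aug=("s0", 0, 0), joint_a=(0, 1),
              r1=r1, r2=0.0, s_aug_next=("s1", 1, 0))
```

The key design point the sketch illustrates is the augmented state: by pairing the game state with the current reward-machine states (e.g. `("s0", 0, 0)` above), the non-Markovian task reward becomes Markovian over the augmented space, so standard equilibrium Q-learning machinery applies.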
