Cluster-Based Control of Transition-Independent MDPs

by Carmel Fiscko, et al.

This work studies the ability of a third-party influencer to control the behavior of a multi-agent system. The controller takes actions with the goal of guiding agents to attain target joint strategies. Under mild assumptions, this can be modeled as a Markov decision problem (MDP) and solved to find a control policy. This setup is refined by introducing more degrees of freedom to the control: the agents are partitioned into disjoint clusters such that each cluster can receive a unique control. Solving for a cluster-based policy through standard techniques like value iteration or policy iteration, however, takes exponentially more computation time due to the expanded action space. A solution is presented in the Clustered Value Iteration (CVI) algorithm, which iteratively solves for an optimal control via a round-robin approach across the clusters. CVI converges exponentially faster than standard value iteration, and can find policies that closely approximate the MDP's true optimal value. For MDPs with separable reward functions, CVI will converge to the true optimum. While an optimal clustering assignment is difficult to compute, a good clustering assignment for the agents may be found with a greedy splitting algorithm, whose associated values form a monotonic, submodular lower bound to the values of optimal clusters. Finally, these control ideas are demonstrated on simulated examples.
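To make the computational contrast concrete, the sketch below compares a standard value-iteration backup, which maximizes over every joint action, with a round-robin backup that re-optimizes one cluster's control per step while holding the others fixed. It is an illustration under stated assumptions, not the authors' CVI algorithm: the randomly generated toy MDP, the function names (`make_toy_mdp`, `standard_vi`, `round_robin_vi`), and all parameter values are hypothetical, and the toy model does not encode the paper's transition-independent structure.

```python
import itertools
import random


def make_toy_mdp(n_states=4, n_clusters=3, n_actions=2, seed=0):
    """Build a random toy MDP (an assumption for illustration only).

    A joint action is a tuple holding one control per cluster.
    """
    rng = random.Random(seed)
    joint_actions = list(itertools.product(range(n_actions), repeat=n_clusters))
    P, R = {}, {}
    for s in range(n_states):
        for a in joint_actions:
            weights = [rng.random() for _ in range(n_states)]
            total = sum(weights)
            P[s, a] = [w / total for w in weights]  # next-state distribution
            R[s, a] = rng.random()                  # reward in [0, 1)
    return P, R, joint_actions


def q_value(P, R, V, s, a, gamma):
    """One-step lookahead value of joint action a in state s."""
    return R[s, a] + gamma * sum(p * v for p, v in zip(P[s, a], V))


def standard_vi(P, R, joint_actions, n_states, gamma=0.9, iters=200):
    """Standard value iteration: each backup scans all joint actions,
    i.e. n_actions ** n_clusters candidates per state."""
    V = [0.0] * n_states
    for _ in range(iters):
        V = [max(q_value(P, R, V, s, a, gamma) for a in joint_actions)
             for s in range(n_states)]
    return V


def round_robin_vi(P, R, n_states, n_clusters, n_actions, gamma=0.9, sweeps=200):
    """Round-robin backup: each step re-optimizes a single cluster's control
    with the other clusters' controls held at their current values,
    i.e. only n_actions candidates per state per step."""
    V = [0.0] * n_states
    policy = [[0] * n_clusters for _ in range(n_states)]
    for step in range(sweeps * n_clusters):
        c = step % n_clusters  # cluster whose control is updated this step
        new_V = []
        for s in range(n_states):
            best_q, best_u = None, None
            for u in range(n_actions):
                a = tuple(policy[s][:c] + [u] + policy[s][c + 1:])
                q = q_value(P, R, V, s, a, gamma)
                if best_q is None or q > best_q:
                    best_q, best_u = q, u
            policy[s][c] = best_u
            new_V.append(best_q)
        V = new_V
    return V, policy


P, R, joint_actions = make_toy_mdp()
V_std = standard_vi(P, R, joint_actions, n_states=4)
V_rr, policy_rr = round_robin_vi(P, R, n_states=4, n_clusters=3, n_actions=2)
```

In this sketch a full round-robin sweep evaluates `n_clusters * n_actions` candidate controls per state, versus `n_actions ** n_clusters` for one standard backup. Because each round-robin backup maximizes over a subset of the joint actions, its values here cannot exceed the optimal values; how close they come depends on the reward structure, which is the question the abstract addresses.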


