Learn to Match with No Regret: Reinforcement Learning in Markov Matching Markets

by Yifei Min, et al.

We study a Markov matching market involving a planner and a set of strategic agents on the two sides of the market. At each step, the agents are presented with a dynamical context, where the contexts determine the utilities. The planner controls the transition of the contexts to maximize the cumulative social welfare, while the agents aim to find a myopic stable matching at each step. Such a setting captures a range of applications including ridesharing platforms. We formalize the problem by proposing a reinforcement learning framework that integrates optimistic value iteration with maximum weight matching. The proposed algorithm addresses the coupled challenges of sequential exploration, matching stability, and function approximation. We prove that the algorithm achieves sublinear regret.
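As a rough illustration of the maximum weight matching step mentioned in the abstract, the following minimal sketch picks the one-to-one assignment that maximizes total utility at a single step. It is not the paper's algorithm: the function name `max_weight_matching`, the 3x3 `utility` matrix, and the brute-force search over permutations are all assumptions made for illustration, and the sketch omits the optimistic value iteration, stability, and function approximation components entirely.

```python
from itertools import permutations


def max_weight_matching(utility):
    """Brute-force maximum weight bipartite matching on a small square matrix.

    utility[i][j] is a hypothetical joint utility of matching agent i on one
    side of the market with agent j on the other side.
    """
    n = len(utility)
    best_value, best_assignment = float("-inf"), None
    # Enumerate every one-to-one assignment; feasible only for small n.
    for perm in permutations(range(n)):
        value = sum(utility[i][perm[i]] for i in range(n))
        if value > best_value:
            best_value, best_assignment = value, perm
    return best_value, list(best_assignment)


# Made-up utilities for one step of a three-by-three market.
utility = [
    [4.0, 1.0, 3.0],
    [2.0, 0.0, 5.0],
    [3.0, 2.0, 2.0],
]
welfare, assignment = max_weight_matching(utility)
print(welfare, assignment)
```

Here `assignment[i]` is the partner chosen for agent `i`. For markets of realistic size, a polynomial-time assignment solver would replace the enumeration.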




Regret, stability, and fairness in matching markets with bandit learners

We consider the two-sided matching market with bandit learners. In the s...

Timely Information from Prediction Markets

Prediction markets are powerful tools to elicit and aggregate beliefs fr...

The Complexity of Interactively Learning a Stable Matching by Trial and Error

In a stable matching setting, we consider a query model that allows for ...

Maximizing Efficiency in Dynamic Matching Markets

We study the problem of matching agents who arrive at a marketplace over...

Learning Equilibria in Matching Markets from Bandit Feedback

Large-scale, two-sided matching platforms must find market outcomes that...

Bandits in Matching Markets: Ideas and Proposals for Peer Lending

Motivated by recent applications of sequential decision making in matchi...

Double Matching Under Complementary Preferences

In this paper, we propose a new algorithm for addressing the problem of ...
