Non-monotonic Resource Utilization in the Bandits with Knapsacks Problem

09/24/2022
by Raunak Kumar, et al.

Bandits with knapsacks (BwK) is an influential model of sequential decision-making under uncertainty that incorporates resource consumption constraints. In each round, the decision-maker observes an outcome consisting of a reward and a vector of nonnegative resource consumptions, and the budget of each resource is decremented by its consumption. In this paper we introduce a natural generalization of the stochastic BwK problem that allows non-monotonic resource utilization. In each round, the decision-maker observes an outcome consisting of a reward and a vector of resource drifts that can be positive, negative or zero, and the budget of each resource is incremented by its drift. Our main result is a Markov decision process (MDP) policy that has constant regret against a linear programming (LP) relaxation when the decision-maker knows the true outcome distributions. We build upon this to develop a learning algorithm that has logarithmic regret against the same LP relaxation when the decision-maker does not know the true outcome distributions. We also present a reduction from BwK to our model that shows our regret bound matches existing results.
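
As an illustration of the round structure described above, the following is a minimal Python sketch of budgets that move by signed drifts. It is not the paper's MDP policy or its learning algorithm. The two arms (`replenish_arm`, `earn_arm`), the threshold rule in `simulate`, the `low_water` parameter, and the rule of stopping when a pull would drive a budget below zero are all hypothetical choices made only to show how a budget can both fall and recover over time.

```python
import random
from typing import Callable, List, Tuple

# An outcome is (reward, per-resource drift); drifts may be positive, negative or zero.
Outcome = Tuple[float, List[float]]
Arm = Callable[[random.Random], Outcome]


def step(budget: List[float], drift: List[float]) -> List[float]:
    """Increment each resource's budget by its drift."""
    return [b + d for b, d in zip(budget, drift)]


def replenish_arm(rng: random.Random) -> Outcome:
    # Hypothetical arm: no reward, restores one unit of the single resource.
    return 0.0, [1.0]


def earn_arm(rng: random.Random) -> Outcome:
    # Hypothetical arm: Bernoulli(0.5) reward, consumes one unit (negative drift).
    return (1.0 if rng.random() < 0.5 else 0.0), [-1.0]


def simulate(arms: List[Arm], budget: List[float], horizon: int,
             low_water: float, seed: int = 0) -> Tuple[float, List[List[float]]]:
    """Run a naive threshold rule (an assumption, not the paper's policy).

    Arm 0 is pulled whenever some resource is below `low_water`;
    otherwise arm 1 is pulled. The run stops early if a pull would
    push any budget below zero (also an assumption of this sketch).
    """
    rng = random.Random(seed)
    total_reward = 0.0
    history = [list(budget)]
    for _ in range(horizon):
        arm = 0 if min(budget) < low_water else 1
        reward, drift = arms[arm](rng)
        new_budget = step(budget, drift)
        if min(new_budget) < 0:
            break
        budget = new_budget
        total_reward += reward
        history.append(list(budget))
    return total_reward, history


total, trace = simulate([replenish_arm, earn_arm], budget=[2.0],
                        horizon=10, low_water=1.0)
```

In this sketch the budget trace is non-monotonic: it falls while `earn_arm` is pulled and rises again once `replenish_arm` is pulled, which is the behavior the classical BwK model, with nonnegative consumption only, cannot express.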


