Finding the Bandit in a Graph: Sequential Search-and-Stop

by Pierre Perrault, et al.

We consider the problem where an agent wants to find a hidden object that is randomly located in some vertex of a directed acyclic graph (DAG) according to a fixed but possibly unknown distribution. The agent can only examine vertices whose in-neighbors have already been examined. In scheduling theory, this problem is denoted by 1|prec|∑ w_jC_j. In this paper, however, we address a learning setting in which the agent is allowed to stop before having found the object and restart the search on a new, independent instance of the same problem. The goal is to maximize the total number of hidden objects found under a time constraint. The agent can thus skip an instance after realizing that it would spend too much time on it. Our contributions are to both search theory and multi-armed bandits. If the distribution is known, we provide a quasi-optimal greedy strategy with the help of known computationally efficient algorithms for solving 1|prec|∑ w_jC_j under some assumption on the DAG. If the distribution is unknown, we show how to sequentially learn it and, at the same time, act near-optimally in order to collect as many hidden objects as possible. We provide an algorithm, prove theoretical guarantees, and empirically show that it outperforms the naïve baseline.
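To make the setting concrete, the following minimal Python sketch evaluates a fixed search order on a small DAG. It is an illustration under stated assumptions, not the paper's algorithm: the helper names (`is_valid_order`, `expected_search_cost`, `prefix_rate`), the toy graph, and the per-vertex examination times `p` are all hypothetical, and the "objects found per unit of expected time" ratio is only one plausible way to score a search-then-stop policy.

```python
from typing import Dict, Sequence


def is_valid_order(order: Sequence[str], preds: Dict[str, Sequence[str]]) -> bool:
    """Check that every vertex appears after all of its in-neighbors."""
    seen = set()
    for v in order:
        if not set(preds.get(v, ())) <= seen:
            return False
        seen.add(v)
    return True


def expected_search_cost(order: Sequence[str], w: Dict[str, float],
                         p: Dict[str, float]) -> float:
    """Sum of w_j * C_j, where C_j is the time at which vertex j is examined."""
    elapsed, cost = 0.0, 0.0
    for v in order:
        elapsed += p[v]
        cost += w[v] * elapsed
    return cost


def prefix_rate(order: Sequence[str], w: Dict[str, float],
                p: Dict[str, float], k: int) -> float:
    """Probability of finding the object divided by expected time spent,
    when the agent examines only the first k vertices and then stops.
    (Assumed scoring rule, for illustration only.)"""
    elapsed, found, exp_time = 0.0, 0.0, 0.0
    for v in order[:k]:
        elapsed += p[v]
        found += w[v]
        exp_time += w[v] * elapsed      # object found at v: search ends here
    exp_time += (1.0 - found) * elapsed  # object not found: full prefix time
    return found / exp_time


# Hypothetical toy instance: edges a -> b and a -> c, unit examination times.
preds = {"a": [], "b": ["a"], "c": ["a"]}
w = {"a": 0.2, "b": 0.5, "c": 0.3}   # hiding probabilities (assumed known)
p = {"a": 1.0, "b": 1.0, "c": 1.0}   # examination times (assumed)

order = ["a", "b", "c"]
print(is_valid_order(order, preds))
print(expected_search_cost(order, w, p))
for k in range(1, len(order) + 1):
    print(k, prefix_rate(order, w, p, k))
```

In this toy instance the order `["b", "a", "c"]` is rejected because `b` would be examined before its in-neighbor `a`. Comparing `prefix_rate` across values of `k` shows how stopping early trades a lower chance of finding the object against less time spent on the instance.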




Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits

We study agents communicating over an underlying network by exchanging m...

Graph Searching with Predictions

Consider an agent exploring an unknown graph in search of some goal stat...

Be Greedy in Multi-Armed Bandits

The Greedy algorithm is the simplest heuristic in sequential decision pr...

Censored Semi-Bandits for Resource Allocation

We consider the problem of sequentially allocating resources in a censor...

Distributed Bandits: Probabilistic Communication on d-regular Graphs

We study the decentralized multi-agent multi-armed bandit problem for ag...

Exact and approximation algorithms for the expanding search problem

Suppose a target is hidden in one of the vertices of an edge-weighted gr...

On Reachable Assignments in Cycles and Cliques

The efficient and fair distribution of indivisible resources among agent...
