Adversarial Online Learning with Variable Plays in the Pursuit-Evasion Game: Theoretical Foundations and Application in Connected and Automated Vehicle Cybersecurity

10/26/2021
by Yiyang Wang, et al.

We extend the adversarial (non-stochastic) multi-play multi-armed bandit (MPMAB) to the case where the number of arms to play is variable. The work is motivated by the fact that the resources allocated to scan different critical locations in an interconnected transportation system change dynamically over time and with the environment. By modeling the malicious hacker as the attacker and the intrusion monitoring system as the defender, we formulate the problem for the two players as a sequential pursuit-evasion game. We derive the condition under which a Nash equilibrium of the strategic game exists. For the defender, we provide an exponentially weighted algorithm with sublinear pseudo-regret. We further extend our model to heterogeneous rewards for both players, and obtain lower and upper bounds on the average reward for the attacker. We provide numerical experiments to demonstrate the effectiveness of variable-arm play.
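To make the idea of an exponentially weighted defender with a variable number of plays concrete, here is a minimal Python sketch. It is not the algorithm from the paper and carries no regret guarantee. It assumes rewards in [0, 1], a number of plays per round that is supplied externally, uniform exploration mixing, and systematic sampling to pick exactly that many distinct arms. All names (`capped_marginals`, `systematic_sample`, `run_variable_play`, `reward_fn`, `eta`, `gamma`) are hypothetical.

```python
import math
import random


def capped_marginals(weights, k):
    """Selection probabilities in [0, 1] that sum to k.

    Probabilities are proportional to the (strictly positive) weights,
    except that any arm whose share would exceed 1 is capped at 1.
    """
    n = len(weights)
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= number of arms")
    probs = [0.0] * n
    free = set(range(n))  # arms not yet capped at probability 1
    budget = k            # plays left to spread over the uncapped arms
    while True:
        total = sum(weights[i] for i in free)
        over = [i for i in free if budget * weights[i] / total > 1.0]
        if not over:
            break
        for i in over:
            probs[i] = 1.0
            free.remove(i)
            budget -= 1
    for i in free:
        probs[i] = budget * weights[i] / total
    return probs


def systematic_sample(probs, rng):
    """Choose arms so that arm i is included with probability probs[i].

    One uniform draw u is shifted by integers along the cumulative sums;
    an arm is chosen when a shifted point falls inside its interval.
    """
    u = rng.random()
    chosen, cumulative = [], 0.0
    for i, p in enumerate(probs):
        if math.floor(cumulative + p - u) > math.floor(cumulative - u):
            chosen.append(i)
        cumulative += p
    return chosen


def run_variable_play(reward_fn, n_arms, plays_per_round,
                      eta=0.05, gamma=0.1, seed=0):
    """Exponential weights where round t plays plays_per_round[t] arms.

    reward_fn(t, arm) is an assumed callback returning a reward in [0, 1];
    only the rewards of the chosen arms are observed (bandit feedback).
    """
    rng = random.Random(seed)
    log_w = [0.0] * n_arms  # log-weights avoid overflow on long runs
    history = []
    for t, k in enumerate(plays_per_round):
        top = max(log_w)
        w = [math.exp(lw - top) for lw in log_w]
        s = sum(w)
        # Mix with the uniform distribution so every arm keeps some mass.
        mixed = [(1.0 - gamma) * wi / s + gamma / n_arms for wi in w]
        probs = capped_marginals(mixed, k)
        chosen = systematic_sample(probs, rng)
        for i in chosen:
            # Importance-weighted reward estimate for the observed arm.
            log_w[i] += eta * reward_fn(t, i) / probs[i]
        history.append(chosen)
    return history, log_w
```

Calling `run_variable_play(reward_fn, 5, [1, 2, 3, 2, 1])` would play one, two, three, two, and one arm in successive rounds. In the monitoring setting described above, an arm would correspond to a location to scan and the reward to whether the scan detected the attacker; that mapping is an illustrative reading, not something this sketch models.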

research
06/27/2017

Multi-armed Bandit Problems with Strategic Arms

We study a strategic version of the multi-armed bandit problem, where ea...
research
05/30/2023

Competing for Shareable Arms in Multi-Player Multi-Armed Bandits

Competitions for shareable and limited resources have long been studied ...
research
10/29/2020

Multitask Bandit Learning through Heterogeneous Feedback Aggregation

In many real-world applications, multiple agents seek to learn how to pe...
research
02/04/2020

Selfish Robustness and Equilibria in Multi-Player Bandits

Motivated by cognitive radios, stochastic multi-player multi-armed bandi...
research
12/09/2015

Multi-Player Bandits -- a Musical Chairs Approach

We consider a variant of the stochastic multi-armed bandit problem, wher...
research
06/04/2018

Implementing Mediators with Asynchronous Cheap Talk

A mediator can help non-cooperative agents obtain an equilibrium that ma...
research
06/19/2020

Gradient-free Online Learning in Games with Delayed Rewards

Motivated by applications to online advertising and recommender systems,...
