On the Finite-Time Performance of the Knowledge Gradient Algorithm

06/14/2022
by   Yanwen Li, et al.

The knowledge gradient (KG) algorithm is a popular and effective algorithm for the best arm identification (BAI) problem. Because the KG computation is complex, theoretical analysis of the algorithm is difficult, and existing results mostly concern its asymptotic performance, e.g., consistency and asymptotic sample allocation. In this work, we present new theoretical results on the finite-time performance of the KG algorithm. Under independent and normally distributed rewards, we derive lower and upper bounds on the algorithm's probability of error and simple regret. With these bounds, existing asymptotic results become simple corollaries. We also characterize the algorithm's performance on the multi-armed bandit (MAB) problem. These developments not only extend the existing analysis of the KG algorithm but can also be used to analyze other improvement-based algorithms. Finally, we use numerical experiments to further demonstrate the finite-time behavior of the KG algorithm.
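For readers unfamiliar with the setting, below is a minimal sketch of a KG allocation policy for best arm identification with independent normal rewards, using the standard closed-form KG factor for independent normal beliefs. The priors, noise variance, budget, and true means in the sketch are illustrative assumptions, not parameters taken from the paper.

```python
# Sketch of a knowledge gradient (KG) policy for best arm identification
# with independent, normally distributed rewards and known noise variance.
# All numeric settings below are assumptions for illustration only.
import numpy as np
from scipy.stats import norm


def kg_values(mu, sigma2, noise_var):
    """Closed-form KG value of sampling each arm once, under independent normal beliefs."""
    k = len(mu)
    # Standard deviation of the change in the posterior mean after one more sample.
    sigma_tilde = np.sqrt(sigma2 - 1.0 / (1.0 / sigma2 + 1.0 / noise_var))
    values = np.empty(k)
    for i in range(k):
        best_other = np.max(np.delete(mu, i))
        if sigma_tilde[i] == 0.0:
            values[i] = 0.0
            continue
        z = -abs(mu[i] - best_other) / sigma_tilde[i]
        # Expected improvement of the best posterior mean: sigma_tilde * (z*Phi(z) + phi(z)).
        values[i] = sigma_tilde[i] * (z * norm.cdf(z) + norm.pdf(z))
    return values


def kg_bai(true_means, noise_var=1.0, budget=200, rng=None):
    """Allocate `budget` samples by KG and return the arm with the largest posterior mean."""
    rng = rng or np.random.default_rng(0)
    k = len(true_means)
    mu = np.zeros(k)           # prior means (assumed)
    sigma2 = np.full(k, 4.0)   # prior variances (assumed)
    for _ in range(budget):
        arm = int(np.argmax(kg_values(mu, sigma2, noise_var)))
        reward = rng.normal(true_means[arm], np.sqrt(noise_var))
        # Conjugate normal update of the sampled arm's belief.
        precision = 1.0 / sigma2[arm] + 1.0 / noise_var
        mu[arm] = (mu[arm] / sigma2[arm] + reward / noise_var) / precision
        sigma2[arm] = 1.0 / precision
    return int(np.argmax(mu))


if __name__ == "__main__":
    print("selected arm:", kg_bai([0.2, 0.5, 0.45, 0.1]))
```

The sketch estimates only the final recommendation; the probability-of-error and simple-regret quantities bounded in the paper would be obtained by repeating such runs over many replications.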
