Learning to Search Better Than Your Teacher

02/08/2015
by Kai-Wei Chang, et al.

Methods for learning to search for structured prediction typically imitate a reference policy, with existing theoretical guarantees demonstrating low regret compared to that reference. This is unsatisfactory in many applications where the reference policy is suboptimal and the goal of learning is to improve upon it. Can learning to search work even when the reference is poor? We provide a new learning to search algorithm, LOLS, which does well relative to the reference policy, but additionally guarantees low regret compared to deviations from the learned policy: a local-optimality guarantee. Consequently, LOLS can improve upon the reference policy, unlike previous algorithms. This enables us to develop structured contextual bandits, a partial information structured prediction setting with many potential applications.
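The abstract does not spell out how "deviations from the learned policy" are scored, so the following is only a minimal sketch under stated assumptions: a toy sequence-labeling task with Hamming loss, a roll-in that follows the learned policy, a single deviating action at step t, and a roll-out completed by either the reference or the learned policy (chosen per roll-out with a hypothetical mixing probability `beta`). All function names, the toy policies, and the data are illustrative and are not taken from the paper.

```python
import random

def hamming(output, truth):
    """Number of positions where the predicted label differs from the truth."""
    return sum(o != y for o, y in zip(output, truth))

def rollout_cost(prefix, policy, truth, loss):
    """Complete `prefix` by following `policy`, then score the full output."""
    output = list(prefix)
    while len(output) < len(truth):
        output.append(policy(output, truth))
    return loss(output, truth)

def one_step_deviation_costs(t, actions, roll_in, roll_out, truth, loss):
    """Roll in with `roll_in` for t steps, try each action once, roll out."""
    prefix = []
    while len(prefix) < t:
        prefix.append(roll_in(prefix, truth))
    return {a: rollout_cost(prefix + [a], roll_out, truth, loss) for a in actions}

def mixture(reference, learned, beta, rng):
    """Assumed mixing rule: use the reference for a whole roll-out w.p. beta."""
    return reference if rng.random() < beta else learned

# Hypothetical toy policies (assumptions for illustration only).
def learned(prefix, truth):
    # A weak learned policy that always predicts "A".
    return "A"

def reference(prefix, truth):
    # A suboptimal reference: correct everywhere except the last position.
    i = len(prefix)
    return "A" if i == len(truth) - 1 else truth[i]

truth = ["A", "B", "B"]
actions = ["A", "B"]
rng = random.Random(0)

roll_out = mixture(reference, learned, beta=0.5, rng=rng)
costs_t1 = one_step_deviation_costs(1, actions, learned, roll_out, truth, hamming)
costs_t2 = one_step_deviation_costs(2, actions, learned, roll_out, truth, hamming)
print(costs_t1, costs_t2)
```

In this toy setup the reference would pick "A" at the last step, yet the deviation costs at t = 2 favor "B". A cost-sensitive learner trained on such costs could therefore prefer an action the reference never takes, which is the kind of improvement over the reference the abstract describes.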


