Piecewise-Stationary Off-Policy Optimization

by Joey Hong, et al.

Off-policy learning is a framework for evaluating and optimizing policies without deploying them, using data collected by another policy. Real-world environments are typically non-stationary, and policies learned offline should adapt to these changes. To address this challenge, we study the novel problem of off-policy optimization in piecewise-stationary contextual bandits. Our proposed solution has two phases. In the offline learning phase, we partition the logged data into categorical latent states and learn a near-optimal sub-policy for each state. In the online deployment phase, we adaptively switch between the learned sub-policies based on their performance. This approach is practical and analyzable, and we provide guarantees on both the quality of off-policy optimization and the regret during online deployment. To show the effectiveness of our approach, we compare it to state-of-the-art baselines on both synthetic and real-world datasets. Our approach outperforms methods that act only on the observed context.
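The two-phase structure described above can be illustrated with a toy sketch. It is not the authors' algorithm, and every choice in it is an assumption made for illustration. The logged data is assumed to arrive in stationary blocks with known boundaries. Each block is summarized by inverse propensity scoring (IPS) estimates of the mean reward of every (context, action) pair. Blocks are grouped into latent states with plain k-means, and a greedy sub-policy is learned per state. Online, the sub-policies are treated as arms of a sliding-window UCB bandit, a standard non-stationary bandit rule swapped in here as the "switch based on performance" step. All names (`ips_means`, `kmeans`, `offline_phase`, `online_phase`, `PiecewiseEnv`, `log_blocks`, `MEANS`) are hypothetical.

```python
import math
import random
from collections import deque


def ips_means(records, n_ctx, n_act):
    """IPS estimate of E[r | x, a] from (x, a, r, p) records, p = logging propensity."""
    sums = [[0.0] * n_act for _ in range(n_ctx)]
    counts = [0] * n_ctx
    for x, a, r, p in records:
        sums[x][a] += r / p  # reweight the logged reward by 1 / propensity
        counts[x] += 1
    return [[s / max(counts[x], 1) for s in sums[x]] for x in range(n_ctx)]


def kmeans(vectors, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    def dist(u, v):
        return sum((ui - vi) ** 2 for ui, vi in zip(u, v))

    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(max(vectors, key=lambda v: min(dist(v, c) for c in centers)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(v, centers[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def offline_phase(blocks, n_states, n_ctx, n_act):
    """Cluster logged blocks into latent states, then learn a greedy sub-policy per state."""
    vecs = [[m for row in ips_means(b, n_ctx, n_act) for m in row] for b in blocks]
    labels = kmeans(vecs, n_states)
    policies = []
    for s in range(n_states):
        pooled = [rec for b, lab in zip(blocks, labels) if lab == s for rec in b]
        means = ips_means(pooled, n_ctx, n_act)
        policies.append([max(range(n_act), key=lambda a: means[x][a]) for x in range(n_ctx)])
    return policies


def online_phase(policies, env, horizon, window=200, c=1.0):
    """Switch between sub-policies with sliding-window UCB; returns the total reward."""
    recent = deque(maxlen=window)  # (sub-policy index, reward) of the last `window` rounds

    def ucb(k):
        rs = [r for j, r in recent if j == k]
        if not rs:
            return math.inf  # not tried within the window: force a pull
        return sum(rs) / len(rs) + c * math.sqrt(math.log(len(recent) + 1) / len(rs))

    total = 0.0
    for t in range(horizon):
        k = max(range(len(policies)), key=ucb)
        x = env.context(t)
        r = env.reward(t, x, policies[k][x])
        recent.append((k, r))
        total += r
    return total


class PiecewiseEnv:
    """Toy environment: the latent state cycles every `period` rounds."""

    def __init__(self, means, period, seed=0):
        self.means, self.period = means, period  # means[state][x][a] = P(r = 1)
        self.rng = random.Random(seed)

    def context(self, t):
        return self.rng.randrange(len(self.means[0]))

    def reward(self, t, x, a):
        state = (t // self.period) % len(self.means)
        return 1.0 if self.rng.random() < self.means[state][x][a] else 0.0


def log_blocks(env, n_blocks, block_len, n_act):
    """Log data with a uniform policy; block boundaries are assumed to be known."""
    blocks = []
    for b in range(n_blocks):
        block = []
        for t in range(b * block_len, (b + 1) * block_len):
            x, a = env.context(t), env.rng.randrange(n_act)
            block.append((x, a, env.reward(t, x, a), 1.0 / n_act))
        blocks.append(block)
    return blocks


MEANS = [
    [[0.9, 0.1, 0.1], [0.1, 0.9, 0.1]],  # state 0: best actions are 0 (x=0) and 1 (x=1)
    [[0.1, 0.1, 0.9], [0.9, 0.1, 0.1]],  # state 1: best actions are 2 (x=0) and 0 (x=1)
]
policies = offline_phase(log_blocks(PiecewiseEnv(MEANS, 300, seed=1), 10, 300, 3), 2, 2, 3)
online_total = online_phase(policies, PiecewiseEnv(MEANS, 1000, seed=2), 4000)
print("sub-policies:", policies, "online reward:", online_total)
```

With a uniform logging policy and well-separated latent states, the offline phase in this toy setting should recover one greedy sub-policy per state. The sliding window lets the online phase forget rewards observed before a change point, so it can move to the other sub-policy after the state flips. Any sub-policy that no single fixed choice can match across states benefits from this switching.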


Combining Online Learning and Offline Learning for Contextual Bandits with Deficient Support

We address policy learning with logged data in contextual bandits. Curre...

Pessimistic Off-Policy Optimization for Learning to Rank

Off-policy learning is a framework for optimizing policies without deplo...

Off-Policy Optimization of Portfolio Allocation Policies under Constraints

The dynamic portfolio optimization problem in finance frequently require...

Policy Learning with Adaptively Collected Data

Learning optimal policies from historical data enables the gains from pe...

Online Learning for Non-Stationary A/B Tests

The rollout of new versions of a feature in modern applications is a man...

Offline Contextual Bandits for Wireless Network Optimization

The explosion in mobile data traffic together with the ever-increasing e...

PAC-Bayesian Offline Contextual Bandits With Guarantees

This paper introduces a new principled approach for offline policy optim...
