MaxHedge: Maximising a Maximum Online with Theoretical Performance Guarantees

10/28/2018
by Stephen Pasteris et al.

We introduce a new online learning framework in which, at each trial, the learner must select a subset of actions from a given, known action set. Each action is associated with an energy value, a reward and a cost. The total energy of the selected actions cannot exceed a given energy budget. The goal is to maximise the cumulative profit, where the profit obtained on a single trial is the difference between the maximum reward among the selected actions and the sum of their costs. Action energy values and the budget are known and fixed. The rewards and costs of every action change over time and are revealed at each trial only after the learner has selected its actions. Our framework encompasses several online learning problems in which the environment changes over time and the solution must trade off minimising the costs against maximising the maximum reward of the selected subset, subject to the energy budget. The algorithm we propose is an efficient, highly scalable, unifying approach to this general problem. Hence, it solves the various online learning problems that fall within the framework.
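To make the per-trial objective concrete, here is a minimal sketch of how a selected subset could be checked against the energy budget and scored. It is not the MaxHedge algorithm. It only encodes the problem as the abstract states it. The function names are hypothetical. Treating an empty selection as earning zero profit is an assumption. The brute-force "best fixed subset in hindsight" is an assumed offline comparator for tiny instances, not something the abstract specifies.

```python
from itertools import combinations
from typing import AbstractSet, List, Sequence, Tuple


def is_feasible(selected: AbstractSet[int], energy: Sequence[float], budget: float) -> bool:
    """True if the total energy of the selected actions fits within the budget."""
    return sum(energy[i] for i in selected) <= budget


def trial_profit(selected: AbstractSet[int], rewards: Sequence[float], costs: Sequence[float]) -> float:
    """Profit on one trial: the maximum reward among the selected actions minus their total cost.

    Assumption (not from the source): an empty selection earns zero profit.
    """
    if not selected:
        return 0.0
    return max(rewards[i] for i in selected) - sum(costs[i] for i in selected)


def cumulative_profit(
    choices: Sequence[AbstractSet[int]],
    reward_seq: Sequence[Sequence[float]],
    cost_seq: Sequence[Sequence[float]],
) -> float:
    """Sum of per-trial profits; the rewards and costs of trial t are revealed only after choices[t]."""
    return sum(trial_profit(s, r, c) for s, r, c in zip(choices, reward_seq, cost_seq))


def best_fixed_subset(
    energy: Sequence[float],
    budget: float,
    reward_seq: List[Sequence[float]],
    cost_seq: List[Sequence[float]],
) -> Tuple[frozenset, float]:
    """Hypothetical offline comparator: the feasible subset with the highest cumulative profit.

    Enumerates every subset, so it is exponential in the number of actions
    and is only suitable for tiny illustrative instances.
    """
    n_trials = len(reward_seq)
    best_set: frozenset = frozenset()
    best_value = cumulative_profit([best_set] * n_trials, reward_seq, cost_seq)
    for k in range(1, len(energy) + 1):
        for combo in combinations(range(len(energy)), k):
            candidate = frozenset(combo)
            if not is_feasible(candidate, energy, budget):
                continue  # violates the energy budget
            value = cumulative_profit([candidate] * n_trials, reward_seq, cost_seq)
            if value > best_value:
                best_set, best_value = candidate, value
    return best_set, best_value
```

For example, with energies `[2.0, 3.0, 4.0]` and a budget of `5.0`, the subset `{0, 1}` is feasible but `{0, 2}` is not. Because the profit counts only the single largest reward but every selected cost, adding an action pays off only when it raises the maximum reward by more than its own cost. This is the trade-off the framework describes.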
