Online Joint Assortment-Inventory Optimization under MNL Choices

by   Yong Liang, et al.

We study an online joint assortment-inventory optimization problem, in which we assume that the choice behavior of each customer follows the Multinomial Logit (MNL) choice model, and the attraction parameters are unknown a priori. The retailer makes periodic assortment and inventory decisions to dynamically learn from the realized demands about the attraction parameters while maximizing the expected total profit over time. In this paper, we propose a novel algorithm that can effectively balance the exploration and exploitation in the online decision-making of assortment and inventory. Our algorithm builds on a new estimator for the MNL attraction parameters, a novel approach to incentivize exploration by adaptively tuning certain known and unknown parameters, and an optimization oracle to static single-cycle assortment-inventory planning problems with given parameters. We establish a regret upper bound for our algorithm and a lower bound for the online joint assortment-inventory optimization problem, suggesting that our algorithm achieves nearly optimal regret rate, provided that the static optimization oracle is exact. Then we incorporate more practical approximate static optimization oracles into our algorithm, and bound from above the impact of static optimization errors on the regret of our algorithm. At last, we perform numerical studies to demonstrate the effectiveness of our proposed algorithm.


page 1

page 2

page 3

page 4


Lipschitz Bandit Optimization with Improved Efficiency

We consider the Lipschitz bandit optimization problem with an emphasis o...

Proximal Online Gradient is Optimum for Dynamic Regret

In online learning, the dynamic regret metric chooses the reference (opt...

Dynamic Assortment Optimization with Changing Contextual Information

In this paper, we study the dynamic assortment optimization problem unde...

Dynamic Assortment Selection under the Nested Logit Models

We study a stylized dynamic assortment planning problem during a selling...

Learning Stochastic Shortest Path with Linear Function Approximation

We study the stochastic shortest path (SSP) problem in reinforcement lea...

Dynamic Learning of Sequential Choice Bandit Problem under Marketing Fatigue

Motivated by the observation that overexposure to unwanted marketing act...

Learning to Crawl

Web crawling is the problem of keeping a cache of webpages fresh, i.e., ...

Please sign up or login with your details

Forgot password? Click here to reset