Online Low Rank Matrix Completion
We study the problem of online low-rank matrix completion with 𝖬 users, 𝖭 items and 𝖳 rounds. In each round, we recommend one item per user. For each recommendation, we obtain a (noisy) reward sampled from a low-rank user-item reward matrix. The goal is to design an online method with regret that is sub-linear in 𝖳. While the problem can be mapped to the standard multi-armed bandit problem where each item is an independent arm, this mapping leads to poor regret because the correlation between arms and users is not exploited. Exploiting the low-rank structure of the reward matrix, on the other hand, is challenging due to the non-convexity of the low-rank manifold. We overcome this challenge using an explore-then-commit (ETC) approach that ensures a regret of O(𝗉𝗈𝗅𝗒𝗅𝗈𝗀 (𝖬+𝖭) 𝖳^(2/3)). That is, roughly only 𝗉𝗈𝗅𝗒𝗅𝗈𝗀 (𝖬+𝖭) item recommendations are required per user to obtain a non-trivial solution. We further improve our result for the rank-1 setting. Here, we propose a novel algorithm OCTAL (Online Collaborative filTering using iterAtive user cLustering) that ensures a nearly optimal regret bound of O(𝗉𝗈𝗅𝗒𝗅𝗈𝗀 (𝖬+𝖭) 𝖳^(1/2)). Our algorithm uses a novel technique of clustering users and eliminating items jointly and iteratively, which allows us to obtain a nearly minimax optimal rate in 𝖳.
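To make the explore-then-commit idea concrete, the following is a minimal sketch under stated assumptions, not the paper's algorithm. It explores with uniformly random items for `T0` rounds, estimates the reward matrix with a truncated SVD of the rescaled observed means (a simple stand-in estimator chosen for illustration; the abstract does not specify the estimator used), and then commits to each user's estimated best item. The function name `etc_low_rank`, the Gaussian noise model, and all parameters are hypothetical.

```python
import numpy as np

def etc_low_rank(P, T, T0, rank, noise_std=0.1, seed=0):
    """Explore-then-commit sketch on a reward matrix P (M users x N items).

    Returns (pseudo_regret, committed_items). P is only used to simulate
    noisy rewards and to measure regret; the policy never reads it directly.
    """
    rng = np.random.default_rng(seed)
    M, N = P.shape
    users = np.arange(M)
    best = P.max(axis=1)              # best achievable mean reward per user
    sums = np.zeros((M, N))
    counts = np.zeros((M, N))
    regret = 0.0

    # Exploration: one uniformly random item per user in each round.
    for _ in range(T0):
        items = rng.integers(0, N, size=M)
        rewards = P[users, items] + noise_std * rng.standard_normal(M)
        sums[users, items] += rewards
        counts[users, items] += 1
        regret += float(np.sum(best - P[users, items]))

    # Estimation (assumed estimator): zero-fill unobserved entries, rescale
    # by the observed fraction, and keep the top `rank` singular components.
    observed = counts > 0
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=observed)
    frac = max(observed.mean(), 1e-12)
    U, s, Vt = np.linalg.svd(means / frac, full_matrices=False)
    P_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank]

    # Commit: play each user's estimated best item for the remaining rounds.
    committed = P_hat.argmax(axis=1)
    regret += (T - T0) * float(np.sum(best - P[users, committed]))
    return regret, committed

if __name__ == "__main__":
    # Hypothetical rank-1 instance; exploration length on the order of T^(2/3).
    P = np.outer(np.linspace(0.5, 1.0, 6), np.linspace(0.1, 1.0, 8))
    T = 1000
    T0 = int(round(T ** (2 / 3)))
    print(etc_low_rank(P, T, T0, rank=1))
```

The exploration length on the order of 𝖳^(2/3) mirrors the usual ETC trade-off: a longer exploration phase improves the estimate but accrues regret in every exploratory round, while a shorter one risks committing to a suboptimal item for the remaining rounds.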