In reinforcement learning (RL), a reward function is often assumed at th...
Monitoring a population of dependent processes under limited resources i...
Transformer-based pretrained language models (PLMs) have achieved great
...
A crucial task in decision-making problems is reward engineering. It is
...
In this paper, we study representation learning in partially observable
...
Off-policy Learning to Rank (LTR) aims to optimize a ranker from data
co...
We propose the first study of adversarial attacks on online learning to ...
Data plays a crucial role in machine learning. However, in real-world
ap...
Bandit algorithms have become a reference solution for interactive
recom...
Online influence maximization aims to maximize the influence spread of a...
We tackle the communication efficiency challenge of learning kernelized
...
Directed Evolution (DE), a landmark wet-lab method originating in the 1960s,
...
We study adversarial attacks on linear stochastic bandits, a sequential
...
We study the problem of incentivizing exploration for myopic users in li...
Online Learning to Rank (OL2R) eliminates the need for explicit relevance...
We investigate the sparse linear contextual bandit problem where the
par...
How to obtain an unbiased ranking model by learning to rank with biased ...
We study incentivized exploration for the multi-armed bandit (MAB) probl...
In this paper, we focus on unsupervised domain adaptation for Machine Re...
Online Learning to Rank (OL2R) algorithms learn from implicit user feedb...
We study the problem of online influence maximization in social networks...
Online learning to rank (OL2R) optimizes the utility of returned search
...