Huazheng Wang

research

∙ 08/03/2023

Aligning Agent Policy with Externalities: Reward Design via Bilevel RL

In reinforcement learning (RL), a reward function is often assumed at th...

Souradip Chakraborty, et al.

research

∙ 07/26/2023

Online Modeling and Monitoring of Dependent Processes under Resource Constraints

Monitoring a population of dependent processes under limited resources i...

Tanapol Kosolwattana, et al.

research

∙ 07/26/2023

How Does Diffusion Influence Pretrained Language Models on Out-of-Distribution Data?

Transformer-based pretrained language models (PLMs) have achieved great ...

Huazheng Wang, et al.

research

∙ 07/24/2023

Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems

A crucial task in decision-making problems is reward engineering. It is ...

Xiang Ji, et al.

research

∙ 06/21/2023

Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP

In this paper, we study representation learning in partially observable ...

Jiacheng Guo, et al.

research

∙ 06/13/2023

Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective

Off-policy Learning to Rank (LTR) aims to optimize a ranker from data co...

Zeyu Zhang, et al.

research

∙ 05/30/2023

Adversarial Attacks on Online Learning to Rank with Stochastic Click Models

We propose the first study of adversarial attacks on online learning to ...

Zichen Wang, et al.

research

∙ 02/08/2023

Machine Learning for Synthetic Data Generation: a Review

Data plays a crucial role in machine learning. However, in real-world ap...

Yingzhou Lu, et al.

research

∙ 08/30/2022

Dynamic Global Sensitivity for Differentially Private Contextual Bandits

Bandit algorithms have become a reference solution for interactive recom...

Huazheng Wang, et al.

research

∙ 06/29/2022

Provably Efficient Reinforcement Learning for Online Adaptive Influence Maximization

Online influence maximization aims to maximize the influence spread of a...

Kaixuan Huang, et al.

research

∙ 06/10/2022

Communication Efficient Distributed Learning for Kernelized Contextual Bandits

We tackle the communication efficiency challenge of learning kernelized ...

Chuanhao Li, et al.

research

∙ 06/05/2022

Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization

Directed Evolution (DE), a landmark wet-lab method originated in 1960s, ...

Hui Yuan, et al.

research

∙ 10/18/2021

When Are Linear Stochastic Bandits Attackable?

We study adversarial attacks on linear stochastic bandits, a sequential ...

Huazheng Wang, et al.

research

∙ 04/08/2021

Incentivizing Exploration in Linear Bandits under Information Gap

We study the problem of incentivizing exploration for myopic users in li...

Huazheng Wang, et al.

research

∙ 02/28/2021

PairRank: Online Pairwise Learning to Rank by Divide-and-Conquer

Online Learning to Rank (OL2R) eliminates the need of explicit relevance...

Yiling Jia, et al.

research

∙ 07/16/2020

A Smoothed Analysis of Online Lasso for the Sparse Linear Contextual Bandit Problem

We investigate the sparse linear contextual bandit problem where the par...

Zhiyuan Liu, et al.

research

∙ 04/28/2020

Unbiased Learning to Rank: Online or Offline?

How to obtain an unbiased ranking model by learning to rank with biased ...

Qingyao Ai, et al.

research

∙ 11/12/2019

Incentivized Exploration for Multi-Armed Bandits under Reward Drift

We study incentivized exploration for the multi-armed bandit (MAB) probl...

Zhiyuan Liu, et al.

research

∙ 08/24/2019

Adversarial Domain Adaptation for Machine Reading Comprehension

In this paper, we focus on unsupervised domain adaptation for Machine Re...

Huazheng Wang, et al.

research

∙ 06/10/2019

Variance Reduction in Gradient Exploration for Online Learning to Rank

Online Learning to Rank (OL2R) algorithms learn from implicit user feedb...

Huazheng Wang, et al.

research

∙ 06/09/2019

Factorization Bandits for Online Influence Maximization

We study the problem of online influence maximization in social networks...

Qingyun Wu, et al.

research

∙ 05/18/2018

Efficient Exploration of Gradient Space for Online Learning to Rank

Online learning to rank (OL2R) optimizes the utility of returned search ...

Huazheng Wang, et al.
