Environment Reconstruction with Hidden Confounders for Reinforcement Learning based Recommendation

by Wenjie Shang, et al.

Reinforcement learning searches for the best policy model for decision making and has proven powerful for sequential recommendation. Training a policy by reinforcement learning, however, requires interaction with an environment. In many real-world applications, training the policy in the real environment can incur an unbearable cost because of the exploration involved. Reconstructing the environment from past data is thus an appealing way to unlock the power of reinforcement learning in these applications. Reconstructing the environment essentially means extracting the causal effect model from the data. However, real-world applications are often too complex to offer fully observable environment information, so unobserved confounding variables quite possibly lie behind the data. Such a hidden confounder can obstruct effective reconstruction of the environment. In this paper, by treating the hidden confounder as a hidden policy, we propose a deconfounded multi-agent environment reconstruction (DEMER) approach that learns the environment together with the hidden confounder. DEMER adopts a multi-agent generative adversarial imitation learning framework. It introduces a confounder-embedded policy and uses a compatible discriminator to train the policies. We then apply DEMER to driver program recommendation. We first use an artificial driver program recommendation environment, abstracted from the real application, to verify and analyze the effectiveness of DEMER. We then test DEMER in the real application at Didi Chuxing. Experimental results show that DEMER can effectively reconstruct the hidden confounder and can therefore build a better environment. DEMER also derives a recommendation policy with significantly improved performance in the test phase of the real application.



Generative Adversarial User Model for Reinforcement Learning Based Recommendation System

There are great interests as well as many challenges in applying reinfor...

Virtual-Taobao: Virtualizing Real-world Online Retail Environment for Reinforcement Learning

Applying reinforcement learning in physical-world tasks is extremely cha...

Learning to Deceive in Multi-Agent Hidden Role Games

Deception is prevalent in human social settings. However, studies into t...

Neural Model-Based Reinforcement Learning for Recommendation

There are great interests as well as many challenges in applying reinfor...

Accelerating Training in Pommerman with Imitation and Reinforcement Learning

The Pommerman simulation was recently developed to mimic the classic Jap...

Sequential Bayesian experimental designs via reinforcement learning

Bayesian experimental design (BED) has been used as a method for conduct...

Exploring Computational User Models for Agent Policy Summarization

AI agents are being developed to support high stakes decision-making pro...
