Motivated by human-machine interaction such as training chatbots for...
We study offline reinforcement learning (RL) in the face of unmeasur...
Offline reinforcement learning (RL) aims to learn the optimal policy fro...
In generative adversarial imitation learning (GAIL), the agent aims to l...
In offline reinforcement learning (RL), an optimal policy is learnt solel...
We study the global convergence and global optimality of actor-critic, o...
We study discrete-time mean-field Markov games with infinite numbers of
...
It is important to collect credible training samples (x,y) for building
...