Nested-Wasserstein Self-Imitation Learning for Sequence Generation

01/20/2020
by Ruiyi Zhang, et al.

Reinforcement learning (RL) has been widely studied for improving sequence-generation models. However, the conventional rewards used for RL training typically cannot capture sufficient semantic information and therefore introduce model bias. Further, the sparse and delayed rewards make RL exploration inefficient. To alleviate these issues, we propose the concept of nested-Wasserstein distance for distributional semantic matching. To exploit it further, a novel nested-Wasserstein self-imitation learning framework is developed, encouraging the model to exploit historical high-reward sequences for enhanced exploration and better semantic matching. Our solution can be understood as approximately executing proximal policy optimization with Wasserstein trust-regions. Experiments on a variety of unconditional and conditional sequence-generation tasks demonstrate that the proposed approach consistently leads to improved performance.
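As a rough illustration of the two ingredients described above, the sketch below pairs an entropy-regularized (Sinkhorn) approximation of the Wasserstein distance, used here as a semantic-matching reward between token embeddings, with a small buffer of high-reward sequences for self-imitation. The function and class names, the cosine ground cost, and the uniform token weights are illustrative assumptions; this does not reproduce the paper's nested formulation or its exact training procedure.

import numpy as np
import heapq

def sinkhorn_wasserstein(X, Y, epsilon=0.1, n_iters=50):
    # Entropy-regularized (Sinkhorn) approximation of the Wasserstein
    # distance between two embedding sets X (n x d) and Y (m x d) with
    # uniform weights. The cosine ground cost is an illustrative choice.
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)
    Yn = Y / (np.linalg.norm(Y, axis=1, keepdims=True) + 1e-8)
    C = 1.0 - Xn @ Yn.T                      # pairwise cost matrix
    K = np.exp(-C / epsilon)                 # Gibbs kernel
    n, m = C.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    u = np.ones(n)
    for _ in range(n_iters):                 # Sinkhorn fixed-point updates
        v = b / (K.T @ u + 1e-8)
        u = a / (K @ v + 1e-8)
    P = u[:, None] * K * v[None, :]          # approximate transport plan
    return float(np.sum(P * C))              # transport cost

class SelfImitationBuffer:
    # Keeps the highest-reward generated sequences seen so far, so the
    # policy can be re-trained on its own good samples (generic sketch).
    def __init__(self, capacity=64):
        self.capacity = capacity
        self._heap = []                      # min-heap of (reward, id, seq)
        self._count = 0

    def add(self, sequence, reward):
        item = (reward, self._count, sequence)
        self._count += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
        else:
            heapq.heappushpop(self._heap, item)   # evict lowest reward

    def sample(self, rng):
        reward, _, sequence = self._heap[rng.integers(len(self._heap))]
        return sequence, reward

# Example: score a generated sentence by its (negative) Wasserstein
# distance to a reference, then store high-reward samples for imitation.
rng = np.random.default_rng(0)
gen_emb = rng.normal(size=(12, 300))         # embeddings of generated tokens
ref_emb = rng.normal(size=(15, 300))         # embeddings of reference tokens
reward = -sinkhorn_wasserstein(gen_emb, ref_emb)
buf = SelfImitationBuffer()
buf.add(["a", "generated", "sentence"], reward)
print(reward, buf.sample(rng))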

Related research

Self-Imitation Learning in Sparse Reward Settings (10/14/2020)
The application of reinforcement learning (RL) in real-world is still li...

Wasserstein Reinforcement Learning (06/11/2019)
We propose behavior-driven optimization via Wasserstein distances (WDs) ...

Sequence Generation with Guider Network (11/02/2018)
Sequence generation with reinforcement learning (RL) has received signif...

An Empirical Comparison on Imitation Learning and Reinforcement Learning for Paraphrase Generation (08/28/2019)
Generating paraphrases from given sentences involves decoding words step...

Imitation Learning via Off-Policy Distribution Matching (12/10/2019)
When performing imitation learning from expert demonstrations, distribut...

Amplifying the Imitation Effect for Reinforcement Learning of UCAV's Mission Execution (01/17/2019)
This paper proposes a new reinforcement learning (RL) algorithm that enh...

Learning Equational Theorem Proving (02/10/2021)
We develop Stratified Shortest Solution Imitation Learning (3SIL) to lea...
