Off-Policy Self-Critical Training for Transformer in Visual Paragraph Generation

06/21/2020
by Shiyang Yan, et al.

Several approaches have recently been proposed for language generation problems. The Transformer is currently the state-of-the-art sequence-to-sequence (seq-to-seq) model for language generation. Reinforcement learning (RL) is useful in seq-to-seq language learning because it addresses exposure bias and allows optimisation of non-differentiable metrics. However, the Transformer is hard to combine with RL because sampling from it requires costly computing resources. We tackle this problem by proposing an off-policy RL algorithm in which a behaviour policy represented by GRUs performs the sampling. We reduce the high variance of importance sampling (IS) by applying the truncated relative importance sampling (TRIS) technique and the Kullback-Leibler (KL)-control concept. TRIS is a simple yet effective technique, and there is a theoretical proof that KL-control helps to reduce the variance of IS. We formulate this off-policy RL algorithm on the basis of self-critical sequence training. Specifically, we use a Transformer-based captioning model as the target policy and an image-guided language auto-encoder as the behaviour policy to explore the environment. The proposed algorithm achieves state-of-the-art performance on visual paragraph generation and improved results on image captioning.
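
To make the off-policy idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a sequence-level relative importance weight of the form pi / (beta * pi + (1 - beta) * b), truncated at a constant, and multiplies it with a self-critical advantage (sampled reward minus greedy-decoding reward). The function names, the dictionary keys, and the values of `beta` and `clip` are all hypothetical, and the KL-control term is omitted.

```python
import math


def truncated_relative_weight(logp_target, logp_behaviour, beta=0.5, clip=1.5):
    """Truncated relative importance weight for one sampled sequence.

    Assumed form (not taken from the paper's text):
        w = min(clip, pi / (beta * pi + (1 - beta) * b))
    where pi and b are the sequence probabilities under the target and
    behaviour policies. The untruncated ratio is bounded above by 1 / beta.
    """
    diff = logp_behaviour - logp_target  # log(b / pi)
    if diff > 700.0:
        # exp(diff) would overflow; the ratio is effectively zero here.
        return 0.0
    # pi / (beta * pi + (1 - beta) * b) == 1 / (beta + (1 - beta) * b / pi)
    ratio = 1.0 / (beta + (1.0 - beta) * math.exp(diff))
    return min(clip, ratio)


def off_policy_self_critical_loss(samples, beta=0.5, clip=1.5):
    """Average surrogate loss over sequences sampled by the behaviour policy.

    Each sample is a dict with hypothetical keys:
      logp_target    log-probability of the sequence under the target policy
      logp_behaviour log-probability under the behaviour policy
      reward         metric score of the sampled sequence
      greedy_reward  metric score of the target policy's greedy output
    In an autodiff framework the weight and the advantage would be treated
    as constants and only logp_target would carry gradients.
    """
    total = 0.0
    for s in samples:
        weight = truncated_relative_weight(
            s["logp_target"], s["logp_behaviour"], beta, clip
        )
        advantage = s["reward"] - s["greedy_reward"]  # self-critical baseline
        total += -weight * advantage * s["logp_target"]
    return total / len(samples)
```

With identical target and behaviour log-probabilities the weight is 1 and the expression reduces to the usual on-policy self-critical surrogate. When the behaviour policy assigns a sequence much lower probability than the target policy, the weight saturates at the smaller of `clip` and 1 / beta, which is what keeps the variance bounded in this sketch.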

