No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling

04/24/2018
by Xin Wang, et al.

Though impressive results have been achieved in visual captioning, the task of generating abstract stories from photo streams remains largely unexplored. Unlike captions, stories use more expressive language styles and contain many imaginary concepts that do not appear in the images, which poses challenges to behavioral cloning algorithms. Furthermore, because automatic metrics are limited in evaluating story quality, reinforcement learning methods with hand-crafted rewards also struggle to achieve an overall performance boost. We therefore propose an Adversarial REward Learning (AREL) framework that learns an implicit reward function from human demonstrations and then optimizes policy search with the learned reward function. Although automatic evaluation indicates only a slight performance boost over state-of-the-art (SOTA) methods in cloning expert behaviors, human evaluation shows that our approach achieves significant improvement in generating more human-like stories than SOTA systems.
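The alternation described in the abstract (learn a reward from human demonstrations, then optimize the policy against that learned reward) can be illustrated with a toy sketch. This is not the paper's model: it assumes a tabular reward over a handful of discrete outputs in place of stories, a softmax policy, and an entropy-regularized policy objective. The function name `train_toy_adversarial_reward` and all parameters are hypothetical, chosen only for illustration.

```python
import math


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_toy_adversarial_reward(demos, num_outputs, steps=5000,
                                 lr_reward=0.1, lr_policy=0.5):
    """Toy alternating reward/policy training (illustrative assumption only).

    demos: list of integer output ids produced by "humans".
    Returns the learned policy probabilities and the learned reward table.
    """
    # Empirical frequency of each output in the human demonstrations.
    demo_freq = [demos.count(i) / len(demos) for i in range(num_outputs)]
    reward = [0.0] * num_outputs   # tabular reward, one value per output
    logits = [0.0] * num_outputs   # softmax policy parameters

    for _ in range(steps):
        policy = softmax(logits)

        # Reward step: raise the reward of outputs humans produce and
        # lower the reward of outputs the current policy produces.
        for i in range(num_outputs):
            reward[i] += lr_reward * (demo_freq[i] - policy[i])

        # Policy step: exact gradient ascent on expected reward plus entropy.
        # The entropy term is an assumption added here to keep the toy stable.
        advantage = [reward[i] - math.log(policy[i]) for i in range(num_outputs)]
        baseline = sum(p * a for p, a in zip(policy, advantage))
        for i in range(num_outputs):
            logits[i] += lr_policy * policy[i] * (advantage[i] - baseline)

    return softmax(logits), reward


if __name__ == "__main__":
    # Hypothetical demonstrations: output 0 is the most "human-like".
    demos = [0] * 6 + [1] * 3 + [2] * 1
    policy, reward = train_toy_adversarial_reward(demos, num_outputs=3)
    print("policy:", [round(p, 3) for p in policy])
    print("reward:", [round(r, 3) for r in reward])
```

In this toy setting the reward table is never hand-crafted. It is shaped only by the gap between what the demonstrations contain and what the policy currently produces, and the policy should end up close to the demonstration frequencies.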


