Summary-Oriented Vision Modeling for Multimodal Abstractive Summarization

12/15/2022
by Yunlong Liang, et al.

The goal of multimodal abstractive summarization (MAS) is to produce a concise summary from multimodal input (text and vision). Existing studies on MAS mainly focus on how to use the extracted visual features effectively, and have achieved impressive results on the high-resource English dataset. However, less attention has been paid to the quality of the visual features with respect to the summary, which may limit model performance, especially in low- and zero-resource scenarios. In this paper, we propose to improve summary quality through summary-oriented visual features. To this end, we devise two auxiliary tasks: a vision-to-summary task and a masked image modeling task. We optimize the MAS model jointly on the training objectives of these two tasks and the main summarization task. In this way, the MAS model learns to capture summary-oriented visual features and thereby yields more accurate summaries. Experiments on 44 languages, covering mid-high-, low-, and zero-resource scenarios, verify the effectiveness and superiority of the proposed approach, which achieves state-of-the-art performance in all scenarios.
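
The joint optimization over the three tasks can be illustrated with a minimal sketch. The abstract does not say how the objectives are combined, so the weighted sum below, the function name `joint_loss`, and the weights `alpha` and `beta` are assumptions made for illustration only, not the authors' formulation.

```python
def joint_loss(loss_summ, loss_v2s, loss_mim, alpha=1.0, beta=1.0):
    """Combine the main and auxiliary objectives into one training loss.

    loss_summ: loss of the main summarization task
    loss_v2s:  loss of the vision-to-summary auxiliary task
    loss_mim:  loss of the masked image modeling auxiliary task
    alpha, beta: hypothetical weights for the auxiliary tasks
                 (assumed here; not given in the abstract)
    """
    return loss_summ + alpha * loss_v2s + beta * loss_mim


if __name__ == "__main__":
    # Illustrative scalar losses only; real values come from a model.
    total = joint_loss(2.0, 1.0, 0.5)
    print(total)
```

In a real training loop the three inputs would be per-batch loss tensors rather than floats; setting a weight to zero disables the corresponding auxiliary task, which is one simple way to ablate it.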


