MultiSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos

06/07/2023
by Jielin Qiu, et al.

Multimodal summarization with multimodal output (MSMO) has emerged as a promising research direction. However, existing public MSMO datasets suffer from several limitations, including insufficient maintenance, data inaccessibility, limited size, and the absence of proper categorization, all of which pose significant challenges to effective research. To address these challenges and provide a comprehensive dataset for this new direction, we have meticulously curated the MultiSum dataset. Our new dataset features (1) human-validated summaries for both video and textual content, providing superior human instruction and labels for multimodal learning; (2) comprehensive and meticulously arranged categorization, spanning 17 principal categories and 170 subcategories to encapsulate a diverse array of real-world scenarios; and (3) benchmark tests performed on the proposed dataset to assess varied tasks and methods, including video temporal segmentation, video summarization, text summarization, and multimodal summarization. To promote accessibility and collaboration, we release the MultiSum dataset and the data collection tool as fully open-source resources, fostering transparency and accelerating future developments. Our project website can be found at https://multisum-dataset.github.io/.


