Generating Wikipedia Article Sections from Diverse Data Sources

12/29/2020
by Mingda Chen, et al.

Datasets for data-to-text generation typically focus either on multi-domain, single-sentence generation or on single-domain, long-form generation. In this work, we create a large-scale dataset, WikiTableT, that pairs Wikipedia sections with their corresponding tabular data and various metadata. WikiTableT contains millions of instances covering a broad range of topics, as well as a variety of generation tasks with different levels of flexibility. We benchmark several training and decoding strategies on WikiTableT. Our qualitative analysis shows that the best approaches can generate fluent, high-quality text, but they sometimes struggle with coherence.
