SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation

10/20/2021
by Hong Chen, et al.

Generating text for scientific papers requires not only capturing the content of the given input but also, frequently, drawing on external information, called context. We advance scientific text generation by proposing a new task, context-aware text generation in the scientific domain, which aims to exploit the contribution of context to the generated text. To this end, we present a challenging new large-scale Scientific Paper Dataset for ConteXt-Aware Text Generation (SciXGen), consisting of 205,304 well-annotated papers with full references to widely used objects (e.g., tables, figures, algorithms) in each paper. Using state-of-the-art models, we comprehensively benchmark the efficacy of the newly constructed SciXGen dataset on description generation and paragraph generation. Our dataset and benchmarks will be made publicly available to facilitate research on scientific text generation.

research
04/16/2021

Learning to Reason for Text Generation from Scientific Tables

In this paper, we introduce SciGen, a new challenge dataset for the task...
research
12/09/2022

Fill in the Blank: Context-aware Automated Text Input Generation for Mobile GUI Testing

Automated GUI testing is widely used to help ensure the quality of mobil...
research
09/09/2021

Graphine: A Dataset for Graph-aware Terminology Definition Generation

Precisely defining the terminology is the first step in scientific commu...
research
08/16/2021

AutoChart: A Dataset for Chart-to-Text Generation Task

The analytical description of charts is an exciting and important resear...
research
09/07/2022

SynSciPass: detecting appropriate uses of scientific text generation

Approaches to machine generated text detection tend to focus on binary c...
research
01/17/2021

Narration Generation for Cartoon Videos

Research on text generation from multimodal inputs has largely focused o...
research
10/12/2020

MedICaT: A Dataset of Medical Images, Captions, and Textual References

Understanding the relationship between figures and text is key to scient...
