LongtoNotes: OntoNotes with Longer Coreference Chains

by Kumar Shridhar, et al.
ETH Zurich

OntoNotes has served as the most important benchmark for coreference resolution. However, for ease of annotation, several long documents in OntoNotes were split into smaller parts. In this work, we build a corpus of coreference-annotated documents that are significantly longer than those currently available. We do so by providing an accurate, manually curated merging of the annotations of documents that were split into multiple parts during the original OntoNotes annotation process. The resulting corpus, which we call LongtoNotes, contains English documents from multiple genres and of varying lengths, the longest of which are up to 8x the length of documents in OntoNotes and 2x the length of those in LitBank. We evaluate state-of-the-art neural coreference systems on this new corpus, analyze how model architectures, hyperparameters, and document length affect model performance and efficiency, and identify areas for improvement in long-document coreference modeling revealed by our new corpus. Our data and code are available at: https://github.com/kumar-shridhar/LongtoNotes.
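The abstract does not describe the annotation format or the tooling used for merging, so the following is only a minimal sketch of the bookkeeping involved, under stated assumptions. It assumes a hypothetical format in which each part is a token list plus clusters of `(start, end)` token spans local to that part. In LongtoNotes the decision about which chains corefer across parts was made by manual curation; here that decision is represented by a hypothetical `links` input. The sketch shifts every span by its part's token offset, then joins linked clusters with a union-find.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]          # (start, end) token offsets, end inclusive
Cluster = List[Span]
ClusterId = Tuple[int, int]     # (part index, cluster index within that part)


def merge_parts(
    parts: List[Tuple[List[str], List[Cluster]]],
    links: List[Tuple[ClusterId, ClusterId]],
) -> Tuple[List[str], List[Cluster]]:
    """Concatenate split document parts and merge their coreference clusters.

    parts: hypothetical (tokens, clusters) pairs, spans local to each part.
    links: hypothetical pairs of cluster ids judged (e.g. by an annotator)
           to refer to the same entity across parts.
    """
    tokens: List[str] = []
    shifted: Dict[ClusterId, Cluster] = {}
    for p, (part_tokens, clusters) in enumerate(parts):
        offset = len(tokens)  # tokens from earlier parts precede this one
        for c, cluster in enumerate(clusters):
            shifted[(p, c)] = [(s + offset, e + offset) for s, e in cluster]
        tokens.extend(part_tokens)

    # Union-find over cluster ids so chains of links merge transitively.
    parent: Dict[ClusterId, ClusterId] = {key: key for key in shifted}

    def find(x: ClusterId) -> ClusterId:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Collect the spans of every cluster under its root id.
    groups: Dict[ClusterId, Cluster] = {}
    for key, spans in shifted.items():
        groups.setdefault(find(key), []).extend(spans)

    merged = [sorted(spans) for spans in groups.values()]
    merged.sort(key=lambda cluster: cluster[0])  # order by first mention
    return tokens, merged


# Usage: "He" in part 1 is linked to "Bob" in part 0.
part0 = (["Alice", "met", "Bob", ".", "She", "smiled", "."],
         [[(0, 0), (4, 4)], [(2, 2)]])
part1 = (["He", "left", "."], [[(0, 0)]])
tokens, clusters = merge_parts([part0, part1], links=[((0, 1), (1, 0))])
print(clusters)
```

Without the link, "He" would remain a separate singleton chain in the merged document. This is the kind of cross-part chain that splitting the original documents broke, and that manual merging restores.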
