WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia

by Daniel Hewlett, et al.

We present WikiReading, a large-scale natural language understanding task and publicly available dataset with 18 million instances. The task is to predict textual values from the structured knowledge base Wikidata by reading the text of the corresponding Wikipedia articles. The task contains a rich variety of challenging classification and extraction sub-tasks, making it well-suited for end-to-end models such as deep neural networks (DNNs). We compare various state-of-the-art DNN-based architectures for document classification, information extraction, and question answering. We find that models supporting a rich answer space, such as word or character sequences, perform best. Our best-performing model, a word-level sequence-to-sequence model with a mechanism to copy out-of-vocabulary words, obtains an accuracy of 71.8%.
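The copy mechanism mentioned in the abstract lets the decoder emit source-document words that fall outside its fixed output vocabulary. A minimal sketch of the core idea, in the style of pointer/copy models, is below; the function name, the uniform blending via `p_gen`, and the extended-vocabulary indexing are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def copy_augmented_distribution(vocab_logits, attention, source_ids, p_gen, vocab_size):
    """Blend the decoder's softmax over a fixed vocabulary with a copy
    distribution over source-document positions (pointer/copy style).

    Source tokens outside the fixed vocabulary are assumed to carry
    extended ids >= vocab_size, so they remain addressable by copying.
    """
    # Size of the extended vocabulary: fixed vocab plus any OOV source ids.
    extended = vocab_size + max(0, max(source_ids) - vocab_size + 1)

    # Softmax over the fixed vocabulary (numerically stabilized).
    vocab_probs = np.exp(vocab_logits - vocab_logits.max())
    vocab_probs /= vocab_probs.sum()

    # Generation path: probability mass p_gen spread over the fixed vocab.
    dist = np.zeros(extended)
    dist[:vocab_size] = p_gen * vocab_probs

    # Copy path: remaining mass (1 - p_gen) distributed over source tokens
    # according to the decoder's attention weights.
    for pos, tok in enumerate(source_ids):
        dist[tok] += (1.0 - p_gen) * attention[pos]
    return dist

# Example: vocab of 5 words; the source contains token id 7, an OOV word
# that can only be produced by copying.
d = copy_augmented_distribution(
    vocab_logits=np.zeros(5),
    attention=np.array([0.3, 0.7]),
    source_ids=[1, 7],
    p_gen=0.8,
    vocab_size=5,
)
```

Because copy probability is added on top of the generation probability for in-vocabulary source tokens (id 1 above receives mass from both paths), the result is still a valid distribution summing to one.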


