Wronging a Right: Generating Better Errors to Improve Grammatical Error Detection

09/26/2018
by Sudhanshu Kasewa, et al.

Grammatical error correction, like other machine learning tasks, greatly benefits from large quantities of high-quality training data, which is typically expensive to produce. While writing a program to automatically generate realistic grammatical errors would be difficult, one could learn the distribution of naturally occurring errors and attempt to introduce them into other datasets. Initial work on inducing errors in this way using statistical machine translation has shown promise; we investigate cheaply constructing synthetic samples, given a small corpus of human-annotated data, using an off-the-rack attentive sequence-to-sequence model and a straightforward post-processing procedure. Our approach yields error-filled artificial data that helps a vanilla bi-directional LSTM to outperform the previous state of the art at grammatical error detection, and a previously introduced model to gain further improvements of over 5% F_0.5 score. When attempting to determine if a given sentence is synthetic, a human annotator at best achieves 39.39 F_1 score, indicating that our model generates mostly human-like instances.
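
The abstract does not spell out how synthetic sentences become training data for error detection. As a rough illustration only, the sketch below assumes that each generated (error-filled) sentence is aligned against its correct source and that every token touched by the alignment is labelled incorrect. The function name `label_corrupted_tokens`, the `c`/`i` label scheme, and the use of Python's `difflib` for alignment are all assumptions made for this example, not the authors' procedure.

```python
from difflib import SequenceMatcher


def label_corrupted_tokens(correct, corrupted):
    """Return one label per corrupted token: 'c' (correct) or 'i' (incorrect).

    Hypothetical helper: the alignment method and label names are
    assumptions for illustration, not taken from the paper.
    """
    labels = ["c"] * len(corrupted)
    matcher = SequenceMatcher(a=correct, b=corrupted, autojunk=False)
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            # Tokens that were substituted or newly inserted are errors.
            for j in range(j1, j2):
                labels[j] = "i"
        elif tag == "delete" and labels:
            # A source token was dropped: flag the token after the gap,
            # or the final token when the gap is at the end.
            labels[min(j1, len(corrupted) - 1)] = "i"
    return labels


if __name__ == "__main__":
    source = "I have seen the movie".split()
    synthetic = "I have saw movie".split()
    for token, label in zip(synthetic, label_corrupted_tokens(source, synthetic)):
        print(token, label)
```

Token-level labels of this kind are the sort of input a sequence labeller such as a bi-directional LSTM would train on; how a missing word should be marked is a design choice, and flagging the following token is only one option.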


research
04/20/2021

Grammatical Error Generation Based on Translated Fragments

We perform neural machine translation of sentence fragments in order to ...
research
07/21/2019

The Unbearable Weight of Generating Artificial Errors for Grammatical Error Correction

In recent years, sequence-to-sequence models have been very effective fo...
research
07/17/2017

Artificial Error Generation with Machine Translation and Syntactic Patterns

Shortage of available training data is holding back progress in the area...
research
05/27/2021

Synthetic Data Generation for Grammatical Error Correction with Tagged Corruption Models

Synthetic data generation is widely known to boost the accuracy of neura...
research
06/17/2022

Automatic Correction of Human Translations

We introduce translation error correction (TEC), the task of automatical...
research
05/29/2021

Grammatical Error Correction as GAN-like Sequence Labeling

In Grammatical Error Correction (GEC), sequence labeling models enjoy fa...
research
11/04/2017

Merging error analysis of name disambiguation based on author similarity

Falsely identifying different authors as one is called merging error in ...
