Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction

03/24/2022
by   Maksym Tarnavskyi, et al.

In this paper, we investigate improvements to the GEC sequence tagging architecture with a focus on ensembling of recent cutting-edge Transformer-based encoders in Large configurations. We encourage ensembling models by majority votes on span-level edits because this approach is tolerant to the model architecture and vocabulary size. Our best ensemble achieves a new SOTA result with an F_0.5 score of 76.05 on BEA-2019 (test), even without pre-training on synthetic datasets. In addition, we perform knowledge distillation with a trained ensemble to generate new synthetic training datasets, "Troy-Blogs" and "Troy-1BW". Our best single sequence tagging model, pretrained on the generated Troy datasets in combination with the publicly available synthetic PIE dataset, achieves a near-SOTA result with an F_0.5 score of 73.21 on BEA-2019 (test); to the best of our knowledge, it gives way only to the much heavier T5 model. The code, datasets, and trained models are publicly available.
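The abstract describes ensembling by majority vote on span-level edits. The following is a minimal sketch of that idea, not the authors' implementation: it assumes each model's output has already been converted into a set of `(start, end, replacement)` edits over the source tokens, and that the kept edits do not overlap. The names `majority_vote`, `apply_edits`, and the `min_votes` threshold are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical edit representation (an assumption, not taken from the paper):
# replace source tokens [start:end] with the whitespace-separated `replacement`.
Edit = Tuple[int, int, str]


def majority_vote(edit_sets: List[Set[Edit]], min_votes: int) -> List[Edit]:
    """Keep every edit proposed by at least `min_votes` models."""
    # Each model contributes at most one vote per distinct edit.
    counts = Counter(edit for edits in edit_sets for edit in edits)
    return sorted(edit for edit, votes in counts.items() if votes >= min_votes)


def apply_edits(tokens: List[str], edits: List[Edit]) -> List[str]:
    """Apply non-overlapping edits to the source tokens."""
    out = list(tokens)
    # Work right to left so that earlier token indices stay valid.
    for start, end, replacement in sorted(edits, reverse=True):
        out[start:end] = replacement.split()
    return out


source = "She go to school yesterday".split()
model_a = {(1, 2, "went")}
model_b = {(1, 2, "went"), (3, 3, "the")}
model_c = {(1, 2, "goes")}

kept = majority_vote([model_a, model_b, model_c], min_votes=2)
corrected = apply_edits(source, kept)
```

Because the vote is taken over edits on the source text rather than over tag distributions, the models being combined need not share an encoder architecture or a tag vocabulary, which is consistent with the tolerance the abstract mentions.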

research
05/26/2020

GECToR – Grammatical Error Correction: Tag, Not Rewrite

In this paper, we present a simple and efficient GEC sequence tagger usi...
research
05/20/2023

Accurate Knowledge Distillation with n-best Reranking

We propose extending the Sequence-level Knowledge Distillation (Kim and ...
research
02/12/2023

An Extended Sequence Tagging Vocabulary for Grammatical Error Correction

We extend a current sequence-tagging approach to Grammatical Error Corre...
research
05/28/2021

Hierarchical Transformer Encoders for Vietnamese Spelling Correction

In this paper, we propose a Hierarchical Transformer model for Vietnames...
research
03/08/2021

Text Simplification by Tagging

Edit-based approaches have recently shown promising results on multiple ...
research
05/27/2023

FoPro-KD: Fourier Prompted Effective Knowledge Distillation for Long-Tailed Medical Image Recognition

Transfer learning is a promising technique for medical image classificat...
research
09/05/2023

Language Models for Novelty Detection in System Call Traces

Due to the complexity of modern computer systems, novel and unexpected b...