Automatic Compilation of Resources for Academic Writing and Evaluating with Informal Word Identification and Paraphrasing System

by Seid Muhie Yimam, et al.

We present the first approach to automatically building resources for academic writing. The aim is to build a writing aid system that automatically edits a text so that it better adheres to the academic style of writing. In addition to existing academic resources, such as the Corpus of Contemporary American English (COCA) Academic Word List, the New Academic Word List, and the Academic Collocation List, we explore how to dynamically build resources that can be used to automatically identify informal or non-academic words and phrases. The resources are compiled using generic approaches that can be extended to other domains and languages. We evaluate the resources through a system implementation consisting of three components: informal word identification (IWI), academic candidate paraphrase generation, and paraphrase ranking. To generate candidates and rank them in context, we use the PPDB and WordNet paraphrase resources. We use the Concepts in Context (CoInCO) "All-Words" lexical substitution dataset for both the informal word identification and the paraphrase generation experiments. Our informal word identification component achieves an F1 score of 82%, outperforming a stratified classifier baseline. The main contribution of this work is a domain-independent methodology to build targeted resources for writing aids.
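The three-stage pipeline (informal word identification, candidate paraphrase generation, paraphrase ranking) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The word lists and paraphrase table (`ACADEMIC_WORDS`, `FUNCTION_WORDS`, `PARAPHRASES`) are hypothetical toy stand-ins for the COCA/New Academic Word List/Academic Collocation List resources and for PPDB/WordNet. The lookup-based IWI and the "academic candidates first, then by score" ranking rule are simplifying assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy academic vocabulary (stand-in for the academic word lists).
ACADEMIC_WORDS = {"obtain", "demonstrate", "examine", "substantial", "data", "results"}

# Hypothetical function words that are never flagged as informal.
FUNCTION_WORDS = {"we", "the", "a", "that", "our", "this", "of", "to", "and"}

# Hypothetical paraphrase table (stand-in for PPDB / WordNet lookups);
# each candidate carries an illustrative similarity score.
PARAPHRASES: Dict[str, List[Tuple[str, float]]] = {
    "get": [("receive", 0.6), ("obtain", 0.5), ("grab", 0.4)],
    "show": [("display", 0.7), ("demonstrate", 0.6)],
    "big": [("large", 0.9), ("substantial", 0.6)],
}


def identify_informal(tokens: List[str]) -> List[int]:
    """Toy IWI: flag tokens that are neither academic nor function words."""
    return [
        i
        for i, tok in enumerate(tokens)
        if tok.lower() not in ACADEMIC_WORDS and tok.lower() not in FUNCTION_WORDS
    ]


def generate_candidates(word: str) -> List[Tuple[str, float]]:
    """Look up paraphrase candidates; unknown words get an empty list."""
    return PARAPHRASES.get(word.lower(), [])


def rank_candidates(cands: List[Tuple[str, float]]) -> List[str]:
    """Rank academic candidates first, then by descending paraphrase score."""
    ranked = sorted(cands, key=lambda c: (c[0] in ACADEMIC_WORDS, c[1]), reverse=True)
    return [word for word, _ in ranked]


def rewrite(sentence: str) -> str:
    """Replace each flagged token with its top-ranked candidate if it is academic."""
    tokens = sentence.split()
    for i in identify_informal(tokens):
        ranked = rank_candidates(generate_candidates(tokens[i]))
        if ranked and ranked[0] in ACADEMIC_WORDS:
            tokens[i] = ranked[0]
    return " ".join(tokens)


print(rewrite("we get big results"))  # → we obtain substantial results
```

Replacing the toy dictionaries with real resources, or the lookup-based IWI with a trained classifier, would leave the three-function structure unchanged.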




