Automated Scoring for Reading Comprehension via In-context BERT Tuning

by Nigel Fernandez, et al.

Automated scoring of open-ended student responses has the potential to significantly reduce human grader effort. Recent advances in automated scoring often use textual representations from pre-trained language models such as BERT and GPT as input to scoring models. Most existing approaches train a separate model for each item (question), which suits scenarios such as essay scoring where items can differ substantially from one another. However, these approaches have two limitations: 1) they fail to leverage item linkage in scenarios such as reading comprehension, where multiple items may share a reading passage; 2) they do not scale, since storing one model per item becomes impractical when each model has a large number of parameters. In this paper, we report our (grand prize-winning) solution to the National Assessment of Educational Progress (NAEP) automated scoring challenge for reading comprehension. Our approach, in-context BERT fine-tuning, produces a single shared scoring model for all items, with a carefully designed input structure that provides contextual information on each item. We demonstrate the effectiveness of our approach through local evaluations on the training dataset provided by the challenge. We also discuss the biases, common error types, and limitations of our approach.
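To make the shared-model idea concrete, the Python sketch below shows one way to prepend item context to each student response so that a single scoring model can tell which item it is grading, and to pool responses from all items (including items that share a passage) into one training set. The `Item` fields, the `[SEP]`-delimited template, and the helpers `build_input` and `build_training_set` are hypothetical illustrations, not the authors' exact input format; tokenization and the BERT model itself are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical per-item metadata; field names are illustrative only."""
    item_id: str
    passage_id: str
    question: str
    max_score: int


def build_input(item: Item, response: str, sep: str = " [SEP] ") -> str:
    """Prepend item context to a response (assumed template, not the paper's).

    Because the context identifies the item, passage, and question, one shared
    model can score responses to every item instead of one model per item.
    """
    context = sep.join([
        f"item: {item.item_id}",
        f"passage: {item.passage_id}",
        f"question: {item.question}",
    ])
    return f"[CLS] {context}{sep}response: {response.strip()} [SEP]"


def build_training_set(
    examples: list[tuple[str, str, int]], items: dict[str, Item]
) -> list[tuple[str, int]]:
    """Pool (item_id, response, score) triples from all items into one dataset."""
    dataset = []
    for item_id, response, score in examples:
        item = items[item_id]
        # Reject labels outside the item's assumed score range 0..max_score.
        if not 0 <= score <= item.max_score:
            raise ValueError(f"score {score} out of range for {item_id}")
        dataset.append((build_input(item, response), score))
    return dataset
```

For example, two items that share passage `P1` end up in the same training set, each prefixed with its own context, so a single model is trained on both rather than one model per item.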




Tracing Origins: Coref-aware Machine Reading Comprehension

Machine reading comprehension is a heavily-studied research and test fie...

NLP-IIS@UT at SemEval-2021 Task 4: Machine Reading Comprehension using the Long Document Transformer

This paper presents a technical report of our submission to the 4th task...

Giving BERT a Calculator: Finding Operations and Arguments with Reading Comprehension

Reading comprehension models have been successfully applied to extractiv...

Automated Reading Passage Generation with OpenAI's Large Language Model

The widespread usage of computer-based assessments and individualized le...

ReCAM@IITK at SemEval-2021 Task 4: BERT and ALBERT based Ensemble for Abstract Word Prediction

This paper describes our system for Task 4 of SemEval-2021: Reading Comp...

MultiQA: An Empirical Investigation of Generalization and Transfer in Reading Comprehension

A large number of reading comprehension (RC) datasets has been created r...

Matching Exemplar as Next Sentence Prediction (MeNSP): Zero-shot Prompt Learning for Automatic Scoring in Science Education

Developing models to automatically score students' written responses to ...
