Distilling ChatGPT for Explainable Automated Student Answer Assessment

by Jiazheng Li et al.

Assessing student answers and providing valuable feedback is crucial for effective learning, but it can be a time-consuming task. Traditional methods of automating student answer assessment through text classification often suffer from issues such as lack of trustworthiness, transparency, and the ability to provide a rationale for the automated assessment process. These limitations hinder their usefulness in practice. In this paper, we explore using ChatGPT, a cutting-edge large language model, for the concurrent tasks of student answer scoring and rationale generation under both the zero-shot and few-shot settings. We introduce a critic module which automatically filters incorrect outputs from ChatGPT and utilizes the remaining ChatGPT outputs as noisy labelled data to fine-tune a smaller language model, enabling it to perform student answer scoring and rationale generation. Moreover, by drawing multiple samples from ChatGPT outputs, we are able to compute predictive confidence scores, which in turn can be used to identify corrupted data and human label errors in the training set. Our experimental results demonstrate that despite being a few orders of magnitude smaller than ChatGPT, the fine-tuned language model achieves better performance in student answer scoring. Furthermore, it generates more detailed and comprehensible assessments than traditional text classification methods. Our approach provides a viable solution to achieve explainable automated assessment in education.
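The abstract does not spell out how the predictive confidence scores are computed from multiple ChatGPT samples. A minimal sketch, assuming a simple majority-vote agreement over sampled labels (the function names and the agreement threshold are illustrative, not from the paper):

```python
from collections import Counter

def predictive_confidence(sampled_labels):
    """Estimate confidence as the fraction of sampled LLM outputs
    that agree on the majority label for one student answer."""
    counts = Counter(sampled_labels)
    label, freq = counts.most_common(1)[0]
    return label, freq / len(sampled_labels)

def flag_suspect_examples(dataset, threshold=0.8):
    """Flag training examples whose sample agreement falls below the
    threshold as potentially corrupted or mislabelled (hypothetical
    filtering rule, not the paper's exact critic module)."""
    suspects = []
    for example_id, samples in dataset.items():
        label, conf = predictive_confidence(samples)
        if conf < threshold:
            suspects.append((example_id, label, conf))
    return suspects

# Example: three answers, each scored five times by the LLM
samples = {
    "ans_1": ["correct", "correct", "correct", "correct", "correct"],
    "ans_2": ["partial", "correct", "partial", "incorrect", "partial"],
    "ans_3": ["incorrect", "incorrect", "incorrect", "incorrect", "incorrect"],
}
print(flag_suspect_examples(samples))  # only ans_2 has low agreement
```

Low-agreement examples could then be excluded from the noisy-label fine-tuning set or routed for human review.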




Matching Exemplar as Next Sentence Prediction (MeNSP): Zero-shot Prompt Learning for Automatic Scoring in Science Education

Developing models to automatically score students' written responses to ...

Real-Time Automated Answer Scoring

In recent years, the role of big data analytics has exponentially grown ...

ChatGPT: Beginning of an End of Manual Linguistic Data Annotation? Use Case of Automatic Genre Identification

ChatGPT has shown strong capabilities in natural language generation tas...

Short Answer Grading Using One-shot Prompting and Text Similarity Scoring Model

In this study, we developed an automated short answer grading (ASAG) mod...

Zero-shot NLG Evaluation through Pairwise Comparisons with LLMs

Evaluating Natural Language Generation (NLG) outputs is crucial but labo...

DisCo: Distilled Student Models Co-training for Semi-supervised Text Mining

Many text mining models are constructed by fine-tuning a large deep pre-...

ProtSi: Prototypical Siamese Network with Data Augmentation for Few-Shot Subjective Answer Evaluation

Subjective answer evaluation is a time-consuming and tedious task, and t...
