Exploring the Responses of Large Language Models to Beginner Programmers' Help Requests

by Arto Hellas et al.

Background and Context: Over the past year, large language models (LLMs) have taken the world by storm. In computing education, as in other walks of life, this has created both opportunities and threats. Objectives: In this article, we explore such opportunities and threats in a specific area: responding to student programmers' help requests. More specifically, we assess how good LLMs are at identifying issues in problematic code that students request help on. Method: We collected a sample of help requests and code from an online programming course. We then prompted two different LLMs (OpenAI Codex and GPT-3.5) to identify and explain the issues in the students' code, and assessed the LLM-generated answers both quantitatively and qualitatively. Findings: GPT-3.5 outperforms Codex in most respects. Both LLMs frequently find at least one actual issue in each student program (GPT-3.5 in 90% of the cases). Neither LLM excels at finding all the issues (GPT-3.5 finds them 57% of the time). False positives are common (40% chance for GPT-3.5). The advice that the LLMs provide on the issues is often sensible. The LLMs perform better on issues involving program logic than on issues involving output formatting. Model solutions are frequently provided even when the LLM is prompted not to provide them. LLM responses to prompts in a non-English language are only slightly worse than responses to English prompts. Implications: Our results continue to highlight the utility of LLMs in programming education. At the same time, the results highlight the unreliability of LLMs: LLMs make some of the same mistakes that students do, perhaps especially when formatting output as required by automated assessment systems. Our study informs teachers interested in using LLMs as well as future efforts to customize LLMs for the needs of programming education.
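The quantitative findings above are stated as shares of LLM responses that find at least one actual issue, find all issues, or contain a false positive. As an illustration only, the sketch below shows one way such shares could be tallied from per-response annotations. It assumes a hypothetical `Annotation` record and `summarize` helper that are not taken from the study, and the sample values are made up rather than drawn from the paper's data.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    # Hypothetical per-response annotation (not the study's actual schema):
    # how many actual issues the student program has, how many of those the
    # LLM response identified, and whether it reported a non-existent issue.
    actual_issues: int
    issues_found: int
    has_false_positive: bool


def summarize(annotations: list[Annotation]) -> dict[str, float]:
    """Return the share (0..1) of responses meeting each criterion."""
    n = len(annotations)
    if n == 0:
        raise ValueError("no annotations to summarize")
    # Count responses that identified at least one actual issue.
    at_least_one = sum(a.issues_found >= 1 for a in annotations)
    # Count responses that identified every actual issue.
    all_found = sum(a.issues_found == a.actual_issues for a in annotations)
    # Count responses that reported at least one non-existent issue.
    false_pos = sum(a.has_false_positive for a in annotations)
    return {
        "at_least_one": at_least_one / n,
        "all": all_found / n,
        "false_positive": false_pos / n,
    }


# Made-up example data, for illustration only.
sample = [
    Annotation(actual_issues=2, issues_found=2, has_false_positive=False),
    Annotation(actual_issues=1, issues_found=0, has_false_positive=True),
    Annotation(actual_issues=3, issues_found=1, has_false_positive=False),
    Annotation(actual_issues=1, issues_found=1, has_false_positive=True),
]
print(summarize(sample))
# → {'at_least_one': 0.75, 'all': 0.5, 'false_positive': 0.5}
```

Keeping the three criteria as separate counts mirrors the way the findings report them independently: a single response can find every actual issue and still contain a false positive.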


Exploring the Potential of Large Language Models to Generate Formative Programming Feedback

Ever since the emergence of large language models (LLMs) and related app...

Automatic Generation of Programming Exercises and Code Explanations using Large Language Models

This article explores the natural language generation capabilities of la...

Students Struggle to Explain Their Own Program Code

We asked students to explain the structure and execution of their small ...

Large Language Models in Introductory Programming Education: ChatGPT's Performance and Implications for Assessments

This paper investigates the performance of the Large Language Models (LL...

Promptly: Using Prompt Problems to Teach Learners How to Effectively Utilize AI Code Generators

With their remarkable ability to generate code, large language models (L...

ChatGPT is Good but Bing Chat is Better for Vietnamese Students

This study examines the efficacy of two SOTA large language models (LLMs...
