Code Soliloquies for Accurate Calculations in Large Language Models

by Shashank Sonkar, et al.

High-quality conversational datasets are integral to the successful development of Intelligent Tutoring Systems (ITS) that employ a Large Language Model (LLM) backend. These datasets, when used to fine-tune the LLM backend, significantly enhance the quality of interactions between students and the ITS. A common strategy for developing these datasets involves generating synthetic student-teacher dialogues using advanced GPT-4 models. However, challenges arise when these dialogues demand complex calculations, which are common in subjects like physics. Despite its advanced capabilities, GPT-4 falls short of reliably handling even simple multiplication tasks, a significant limitation on its utility in these subjects. To address these challenges, this paper introduces an innovative stateful prompt design. Our approach generates a mock conversation between a student and a tutorbot, with both roles simulated by GPT-4. Each student response triggers a soliloquy (an inner monologue) in the GPT-tutorbot, which assesses whether its response would necessitate calculations. If so, it scripts the required code in Python and then uses the resulting output to construct its response to the student. Our approach notably enhances the quality of synthetic conversation datasets, especially for calculation-intensive subjects. Our findings show that our Higgs model – a LLaMA model fine-tuned on datasets generated through our novel stateful prompt design – proficiently utilizes Python for computations. Consequently, fine-tuning with our datasets enriched with code soliloquies enhances not just the accuracy but also the computational reliability of Higgs' responses.
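The soliloquy loop described in the abstract – assess whether a calculation is needed, script Python code, then fold the output into the reply – can be sketched as a minimal program. All function names and the decision heuristic below are illustrative assumptions, not the paper's actual prompts; in the paper, both the decision and the code are produced by GPT-4 rather than by a regex.

```python
import contextlib
import io
import re

# Pattern standing in for the tutorbot's judgment that the student's
# message involves arithmetic (an assumption for this sketch).
EXPR = re.compile(r"\d+(?:\.\d+)?\s*[*+\-/]\s*\d+(?:\.\d+)?")

def needs_calculation(student_message: str) -> bool:
    # Soliloquy step 1: decide whether a response requires computation.
    return bool(EXPR.search(student_message))

def run_python(code: str) -> str:
    # Soliloquy step 2: execute the scripted code and capture its output.
    # (A real system would sandbox this step.)
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue().strip()

def tutorbot_turn(student_message: str) -> str:
    # Soliloquy step 3: build the visible reply from the code's output,
    # so the arithmetic comes from Python rather than the LLM.
    if needs_calculation(student_message):
        expr = EXPR.search(student_message).group()
        code = f"print({expr})"  # stand-in for LLM-scripted code
        result = run_python(code)
        return f"Let's compute that: {expr} = {result}."
    return "Good question – let's reason through it together."

print(tutorbot_turn("What is 137 * 24?"))
```

The point of the design is visible in the last step: the number in the reply is produced by the Python interpreter, not sampled by the model, which is what makes the resulting synthetic dialogues computationally reliable.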

