CodeCoT and Beyond: Learning to Program and Test like a Developer

by Dong Huang, et al.

In natural language processing, transformer-based large language models (LLMs) such as OpenAI's GPT-x models have revolutionized the landscape. Despite their impressive capabilities, these models often struggle with tasks that differ from their training data, resulting in degraded performance. To address this, few-shot learning has emerged as a valuable technique, allowing LLMs to adapt with minimal task-specific data. One innovative strategy, Chain-of-Thought prompting (CoT), guides LLMs to reveal their cognitive processes during multi-step reasoning. In this paper, we propose Code Chain-of-Thought (CodeCoT), which consists of two components: Vanilla CodeCoT and Self-exam CodeCoT. The latter incorporates self-examination, empowering the model to iteratively generate code, formulate test cases, and refine its outputs. Specifically, the model generates test examples for the code it is asked to implement; if the code fails these tests, the model regenerates it based on the erroneous code and the associated error types. Through comprehensive experiments, we observed that both techniques significantly enhance code generation accuracy across various LLM variants. Our evaluation results show that CodeCoT improves code generation effectiveness, including an unprecedented pass@1 accuracy of 79.27% with the Self-exam CodeCoT approach on the gpt-3.5-turbo-0613 model on the HumanEval dataset.
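The self-examination loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `generate_code` and `generate_tests` callables stand in for LLM prompting calls and are hypothetical names, and the error message returned by the test runner plays the role of the "error type" fed back to the model.

```python
import traceback
from typing import Callable, Optional

def run_tests(code: str, tests: str) -> Optional[str]:
    """Execute candidate code together with its self-generated tests.

    Returns None on success, or the error trace (the 'error type'
    fed back to the model) on failure.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)   # define the candidate function(s)
        exec(tests, namespace)  # run the model-generated assertions
        return None
    except Exception:
        return traceback.format_exc(limit=1)

def self_exam_codecot(generate_code: Callable, generate_tests: Callable,
                      task: str, max_rounds: int = 3) -> str:
    """Iteratively generate code, test it, and refine on failure."""
    code = generate_code(task, feedback=None)
    tests = generate_tests(task)
    for _ in range(max_rounds):
        error = run_tests(code, tests)
        if error is None:
            return code  # all self-generated tests pass
        # Regenerate, conditioning on the faulty code and the error trace.
        code = generate_code(task, feedback=(code, error))
    return code
```

The key design point is that the model supplies its own test cases, so refinement needs no human-written oracle; the loop simply alternates generation and execution until the tests pass or a round budget is exhausted.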




