In-IDE Code Generation from Natural Language: Promise and Challenges

01/27/2021
by Frank F. Xu, et al.

A great part of software development involves conceptualizing or communicating the underlying procedures and logic that need to be expressed in programs. One major difficulty of programming is turning concept into code, especially when dealing with the APIs of unfamiliar libraries. Recently, there has been a proliferation of machine learning methods for code generation and retrieval from natural language queries, but these have been evaluated primarily on retrieval accuracy or on the overlap of generated code with developer-written code, and the actual effect of these methods on the developer workflow is surprisingly unattested. We perform the first comprehensive investigation of the promise and challenges of using such technology inside the IDE, asking: "At the current state of technology, does it improve developer productivity or accuracy, how does it affect the developer experience, and what are the remaining gaps and challenges?" We first develop a plugin for the IDE that implements a hybrid of code generation and code retrieval functionality, and orchestrate virtual environments to enable the collection of many user events. We ask developers with various backgrounds to complete 14 Python programming tasks, ranging from basic file manipulation to machine learning or data visualization, with or without the help of the plugin. While qualitative surveys of developer experience are largely positive, quantitative results with regard to increased productivity, code quality, or program correctness are inconclusive. Our analysis identifies several pain points whose resolution could improve the effectiveness of future machine-learning-based code generation/retrieval developer assistants, and shows when developers prefer code generation over code retrieval and vice versa. We release all data and software to pave the road for future empirical studies and the development of better models.
