DocCoder: Generating Code by Retrieving and Reading Docs

07/13/2022
by Shuyan Zhou, et al.

Natural-language-to-code models learn to generate a code snippet given a natural language (NL) intent. However, the rapid growth of both publicly available and proprietary libraries and functions makes it impossible to cover all APIs using training examples, as new libraries and functions are introduced daily. Thus, existing models inherently cannot generalize to using unseen functions and libraries merely through incorporating them into the training data. In contrast, when human programmers write programs, they frequently refer to textual resources such as code manuals, documentation, and tutorials, to explore and understand available library functionality. Inspired by this observation, we introduce DocCoder: an approach that explicitly leverages code manuals and documentation by (1) retrieving the relevant documentation given the NL intent, and (2) generating the code based on the NL intent and the retrieved documentation. Our approach is general, can be applied to any programming language, and is agnostic to the underlying neural model. We demonstrate that DocCoder consistently improves NL-to-code models: DocCoder achieves 11x higher exact match accuracy than strong baselines on a new Bash dataset tldr; on the popular Python CoNaLa benchmark, DocCoder improves over strong baselines by 1.65 BLEU.
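The two-step pipeline described in the abstract — retrieve documentation relevant to the NL intent, then condition generation on both — can be sketched in miniature. The token-overlap retriever below is a deliberately simple stand-in (the paper's retriever is learned), and the function names, the toy documentation pool, and the prompt format are all illustrative assumptions, not the paper's implementation:

```python
def tokenize(text):
    """Crude whitespace tokenizer; a real system would use a proper analyzer."""
    return set(text.lower().split())

def retrieve_docs(intent, doc_pool, k=2):
    """Step 1: rank documentation entries by token overlap with the NL intent.
    Stand-in for DocCoder's learned retriever."""
    scored = sorted(
        doc_pool,
        key=lambda d: len(tokenize(intent) & tokenize(d)),
        reverse=True,
    )
    return scored[:k]

def build_generator_input(intent, retrieved):
    """Step 2: concatenate the intent with the retrieved docs; this combined
    text is what the NL-to-code generator would be conditioned on."""
    return intent + "\n" + "\n".join(retrieved)

# Toy documentation pool (illustrative entries, not real manual text).
doc_pool = [
    "tar -x extract files from an archive",
    "tar -c create a new archive",
    "grep -r search recursively in directories",
]
intent = "extract files from a tar archive"
prompt = build_generator_input(intent, retrieve_docs(intent, doc_pool))
```

Because the generator only ever sees the retrieved documentation text, the pipeline can cover libraries and functions that never appeared in its training examples — the property the abstract highlights.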



Related research

02/16/2022 · Code Generation for Unknown Libraries via Reading API Documentations
Open-domain code generation is a challenging problem because the set of ...

12/19/2022 · Asking Clarification Questions for Code Generation in General-Purpose Programming Language
Code generation from text requires understanding the user's intent from ...

08/11/2021 · Natural Language-Guided Programming
In today's software world with its cornucopia of reusable software libra...

09/16/2022 · Code as Policies: Language Model Programs for Embodied Control
Large language models (LLMs) trained on code completion have been shown ...

08/11/2022 · Interactive Code Generation via Test-Driven User-Intent Formalization
Pre-trained large language models (LLMs) such as OpenAI Codex have shown...

12/06/2020 · NaturalCC: A Toolkit to Naturalize the Source Code Corpus
We present NaturalCC, an efficient and extensible toolkit to bridge the ...

04/12/2021 · Generating Code with the Help of Retrieved Template Functions and Stack Overflow Answers
We approach the important challenge of code autocompletion as an open-do...
