RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation

03/22/2023
by   Fengji Zhang, et al.
0

The task of repository-level code completion is to continue writing the unfinished code based on a broader context of the repository. While for automated code completion tools, it is difficult to utilize the useful information scattered in different files. We propose RepoCoder, a simple, generic, and effective framework to address the challenge. It streamlines the repository-level code completion process by incorporating a similarity-based retriever and a pre-trained code language model, which allows for the effective utilization of repository-level information for code completion and grants the ability to generate code at various levels of granularity. Furthermore, RepoCoder utilizes a novel iterative retrieval-generation paradigm that bridges the gap between retrieval context and the intended completion target. We also propose a new benchmark RepoEval, which consists of the latest and high-quality real-world repositories covering line, API invocation, and function body completion scenarios. We test the performance of RepoCoder by using various combinations of code retrievers and generators. Experimental results indicate that RepoCoder significantly improves the zero-shot code completion baseline by over 10 retrieval-augmented code completion approach. Furthermore, we validate the effectiveness of RepoCoder through comprehensive analysis, providing valuable insights for future research.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/05/2023

RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems

Large Language Models (LLMs) have greatly advanced code auto-completion ...
research
03/15/2022

ReACC: A Retrieval-Augmented Code Completion Framework

Code completion, which aims to predict the following code token(s) accor...
research
06/19/2023

RepoFusion: Training Code Models to Understand Your Repository

Despite the huge success of Large Language Models (LLMs) in coding assis...
research
12/29/2020

Multi-task Learning based Pre-trained Language Model for Code Completion

Code completion is one of the most useful features in the Integrated Dev...
research
06/26/2022

Repository-Level Prompt Generation for Large Language Models of Code

With the success of large language models (LLMs) of code and their use a...
research
07/28/2023

Private-Library-Oriented Code Generation with Large Language Models

Large language models (LLMs), such as Codex and GPT-4, have recently sho...
research
06/01/2023

Better Context Makes Better Code Language Models: A Case Study on Function Call Argument Completion

Pretrained code language models have enabled great progress towards prog...

Please sign up or login with your details

Forgot password? Click here to reset