CoaCor: Code Annotation for Code Retrieval with Reinforcement Learning

by   Ziyu Yao, et al.

To accelerate software development, much research has been performed to help people understand and reuse the huge amount of available code resources. Two important tasks have been widely studied: code retrieval, which aims to retrieve code snippets relevant to a given natural language query from a code base, and code annotation, where the goal is to annotate a code snippet with a natural language description. Despite their advancement in recent years, the two tasks are mostly explored separately. In this work, we investigate a novel perspective of Code annotation for Code retrieval (hence called `CoaCor'), where a code annotation model is trained to generate a natural language annotation that can represent the semantic meaning of a given code snippet and can be leveraged by a code retrieval model to better distinguish relevant code snippets from others. To this end, we propose an effective framework based on reinforcement learning, which explicitly encourages the code annotation model to generate annotations that can be used for the retrieval task. Through extensive experiments, we show that code annotations generated by our framework are much more detailed and more useful for code retrieval, and they can further improve the performance of existing code retrieval models significantly.


page 1

page 2

page 3

page 4


Leveraging Code Generation to Improve Code Retrieval and Summarization via Dual Learning

Code summarization generates brief natural language description given a ...

CRaDLe: Deep Code Retrieval Based on Semantic Dependency Learning

Code retrieval is a common practice for programmers to reuse existing co...

Adversarial Training for Code Retrieval with Question-Description Relevance Regularization

Code retrieval is a key task aiming to match natural and programming lan...

Cascaded Fast and Slow Models for Efficient Semantic Code Search

The goal of natural language semantic code search is to retrieve a seman...

Deeply Integrating C11 Code Support into Isabelle/PIDE

We present a framework for C code in C11 syntax deeply integrated into t...

OCoR: An Overlapping-Aware Code Retriever

Code retrieval helps developers reuse the code snippet in the open-sourc...

The KIT Motion-Language Dataset

Linking human motion and natural language is of great interest for the g...

Please sign up or login with your details

Forgot password? Click here to reset