Universal Representation for Code

03/04/2021
by Linfeng Liu, et al.

Learning from source code usually requires a large amount of labeled data. Labeled data can be scarce, and even when it is available, the trained model tends to be highly task-specific and to transfer poorly to other tasks. In this work, we present effective pre-training strategies on top of a novel graph-based code representation, producing universal representations for code. Specifically, our graph-based representation captures important semantic relations between code elements (e.g., control flow and data flow). We pre-train graph neural networks on this representation to extract universal code properties. The pre-trained model can then be fine-tuned to support various downstream applications. We evaluate our model on two real-world datasets spanning over 30M Java methods and 770K Python methods. Through visualization, we reveal discriminative properties in our universal code representation. In comparisons against multiple benchmarks, we demonstrate that the proposed framework achieves state-of-the-art results on method name prediction and code graph link prediction.
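To make the idea concrete, here is a minimal sketch (not the paper's implementation; node names, features, and the aggregation rule are illustrative assumptions): a toy method is represented as a graph whose nodes are code elements and whose typed edges encode control flow and data flow, and one round of mean-aggregation message passing shows how a GNN would propagate information over such a graph.

```python
# Toy method:  int f(int x) { int y = x + 1; return y; }
# Nodes are code elements; typed edges capture control flow and data flow.
nodes = ["param:x", "assign:y=x+1", "return:y"]
edges = [
    ("assign:y=x+1", "return:y",     "control_flow"),  # statement order
    ("param:x",      "assign:y=x+1", "data_flow"),     # x defined -> used
    ("assign:y=x+1", "return:y",     "data_flow"),     # y defined -> used
]

# Hypothetical initial node features (2-dim embeddings for illustration).
h = {
    "param:x":      [1.0, 0.0],
    "assign:y=x+1": [0.0, 1.0],
    "return:y":     [1.0, 1.0],
}

def gnn_step(h, edges):
    """One mean-aggregation message-passing step over incoming edges."""
    out = {}
    for v in h:
        # Collect messages from in-neighbors, plus the node's own state.
        msgs = [h[u] for (u, w, _) in edges if w == v]
        msgs.append(h[v])
        out[v] = [sum(dim) / len(msgs) for dim in zip(*msgs)]
    return out

h1 = gnn_step(h, edges)
print(h1["return:y"])  # mixes information from its two in-edges and itself
```

A real pipeline would stack several such steps with learned weights and type-specific edge handling; the point here is only how control-flow and data-flow edges let information about one code element reach another.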


Related research

- GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training (06/17/2020)
  Graph representation learning has emerged as a powerful technique for re...

- Pre-Training Graph Neural Networks for Generic Structural Feature Extraction (05/31/2019)
  Graph neural networks (GNNs) are shown to be successful in modeling appl...

- Deep Prompt Tuning for Graph Transformers (09/18/2023)
  Graph transformers have gained popularity in various graph-based tasks b...

- A Survey of Graph Prompting Methods: Techniques, Applications, and Challenges (03/13/2023)
  While deep learning has achieved great success on various tasks, the tas...

- Multi network InfoMax: A pre-training method involving graph convolutional networks (11/01/2021)
  Discovering distinct features and their relations from data can help us ...

- Learning Universal Representations from Word to Sentence (09/10/2020)
  Despite the well-developed cut-edge representation learning for language...

- Omni-Training for Data-Efficient Deep Learning (10/14/2021)
  Learning a generalizable deep model from a few examples in a short time ...
