Adding Context to Source Code Representations for Deep Learning

07/30/2022

by Fuwei Tian, et al.

Deep learning models have been successfully applied to a variety of software engineering tasks, such as code classification, summarisation, and bug and vulnerability detection. To apply deep learning to these tasks, source code must first be represented in a format suitable for input into the model. Most approaches to representing source code, such as tokens, abstract syntax trees (ASTs), data flow graphs (DFGs), and control flow graphs (CFGs), focus only on the code itself and do not take into account additional context that could be useful for deep learning models. In this paper, we argue that it is beneficial for deep learning models to have access to additional contextual information about the code being analysed. We present preliminary evidence that encoding context from the call hierarchy, alongside information from the code itself, can improve the performance of a state-of-the-art deep learning model on two software engineering tasks. We outline our research agenda for adding further contextual information to source code representations for deep learning.
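To make the representations named in the abstract concrete, here is a minimal sketch in Python of two of them: a tree-derived "token" sequence from the AST, and a crude call-hierarchy context (which functions each function calls). The snippet analysed, the helper names, and the flat dictionary format are all illustrative assumptions, not the paper's actual encoding.

```python
import ast

# A toy program to analyse (illustrative only).
SOURCE = """
def area(r):
    return 3.14159 * r * r

def report(r):
    print("area:", area(r))
"""

tree = ast.parse(SOURCE)

# AST-based view: the sequence of node type names, a common
# tree-derived input for deep learning models over code.
node_types = [type(node).__name__ for node in ast.walk(tree)]

# Call-context view: for each top-level function, record the names
# of the functions it calls. This approximates one hop of the call
# hierarchy that the paper proposes encoding as extra context.
call_context = {}
for fn in tree.body:
    if isinstance(fn, ast.FunctionDef):
        calls = [n.func.id for n in ast.walk(fn)
                 if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)]
        call_context[fn.name] = calls

print(call_context)  # → {'area': [], 'report': ['print', 'area']}
```

A real pipeline would encode these structures (e.g. as paths, graphs, or embeddings) before feeding them to a model; the point here is only that call-hierarchy context is cheap to extract alongside the per-function AST.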


