A Closer Look into Transformer-Based Code Intelligence Through Code Transformation: Challenges and Opportunities

07/09/2022
by Yaoxian Li, et al.

Transformer-based models have demonstrated state-of-the-art performance on many code intelligence tasks, such as code comment generation and code completion. Previous studies show that deep learning models are sensitive to input variations, but few have systematically examined the robustness of Transformers under perturbed input code. In this work, we empirically study the effect of semantic-preserving code transformations on the performance of Transformers. Specifically, we implement 24 and 27 code transformation strategies for two popular programming languages, Java and Python, respectively. To facilitate analysis, the strategies are grouped into five categories: block transformation, insertion/deletion transformation, grammatical statement transformation, grammatical token transformation, and identifier transformation. Experiments on three popular code intelligence tasks, including code completion, code summarization, and code search, demonstrate that insertion/deletion transformations and identifier transformations have the greatest impact on the performance of Transformers. Our results also suggest that a Transformer based on abstract syntax trees (ASTs) is more robust under most code transformations than a model based only on the code token sequence. In addition, the design of positional encoding can affect the robustness of Transformers under code transformation. Based on these findings, we distill insights into the challenges and opportunities for Transformer-based code intelligence.
