CARBON: A Counterfactual Reasoning based Framework for Neural Code Comprehension Debiasing

by Shuzheng Gao, et al.

Previous studies have demonstrated that code intelligence models are sensitive to program transformations, among which identifier renaming is particularly easy to apply and effective. Simply renaming one identifier in the source code can cause a model to output completely different results. Prior research generally mitigates the problem by generating more training samples. This approach is less than ideal, since its effectiveness depends on the quantity and quality of the generated samples. In contrast to these studies, we adjust the models to explicitly distinguish the influence of identifier names on the results, which we call naming bias in this paper, thereby making the models robust to identifier renaming. Specifically, we formulate the naming bias with a structural causal model (SCM) and propose a counterfactual reasoning based framework named CARBON for eliminating the naming bias in neural code comprehension. CARBON explicitly captures the naming bias through multi-task learning in the training stage and reduces the bias through counterfactual inference in the inference stage. We evaluate CARBON on three neural code comprehension tasks: function naming, defect detection, and code classification. Experimental results show that CARBON achieves slightly better performance (e.g., +0.5 in score) than the baseline models on the original benchmark datasets, and a significant improvement (e.g., +37.9 in score) on the datasets with identifiers renamed. The proposed framework provides a causal view for improving the robustness of code intelligence models.
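The inference-stage idea described in the abstract can be illustrated with a minimal sketch. It assumes a common form of counterfactual debiasing in which a name-only branch's prediction is subtracted from the full model's prediction; the abstract does not give CARBON's actual formulation, so the function names, the scaling parameter `alpha`, and the example logits below are all hypothetical.

```python
# Illustrative sketch only: a generic "subtract the bias branch" form of
# counterfactual inference. This is an assumption, not CARBON's exact method.

def counterfactual_debias(full_logits, name_only_logits, alpha=1.0):
    """Subtract the scaled name-only logits from the full-model logits.

    full_logits:      scores from a model that sees the whole program
    name_only_logits: scores from a (hypothetical) branch that sees only
                      identifier names, i.e. the captured naming bias
    alpha:            hypothetical strength of the bias removal
    """
    if len(full_logits) != len(name_only_logits):
        raise ValueError("logit vectors must have the same length")
    return [f - alpha * n for f, n in zip(full_logits, name_only_logits)]


def predict(logits):
    """Return the index of the highest-scoring class."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Made-up scores for three classes. Class 0 wins mostly because of the names.
full_logits = [2.0, 1.5, 0.1]
name_only_logits = [1.5, 0.2, 0.1]

debiased = counterfactual_debias(full_logits, name_only_logits)
biased_choice = predict(full_logits)
debiased_choice = predict(debiased)
```

In this toy case the biased prediction picks class 0, while the debiased prediction picks class 1 once the name-only contribution is removed. Setting `alpha` to 0 leaves the full-model scores unchanged.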




Code Comprehension Confounders: A Study of Intelligence and Personal

Literature and intuition suggest that a developer's intelligence and per...

Efficient Classification with Counterfactual Reasoning and Active Learning

Data augmentation is one of the most successful techniques to improve th...

Debiasing Stance Detection Models with Counterfactual Reasoning and Adversarial Bias Learning

Stance detection models may tend to rely on dataset bias in the text par...

Leveraging Artificial Intelligence on Binary Code Comprehension

Understanding binary code is an essential but complex software engineeri...

Counterfactual Multihop QA: A Cause-Effect Approach for Reducing Disconnected Reasoning

Multi-hop QA requires reasoning over multiple supporting facts to answer...

Counterfactual Adversarial Learning with Representation Interpolation

Deep learning models exhibit a preference for statistical fitting over l...

Causal Inference for Chatting Handoff

Aiming to ensure chatbot quality by predicting chatbot failure and enabl...
