Semantic Similarity Metrics for Evaluating Source Code Summarization

04/04/2022
by Sakib Haque, et al.

Source code summarization involves creating brief natural language descriptions of source code. These descriptions are a key component of software documentation such as JavaDocs. Automatic code summarization is a prized target of software engineering research, due to the high value summaries have to programmers and the simultaneously high cost of writing and maintaining documentation by hand. Nearly all current work is based on machine learning models trained on large datasets. Large collections of code paired with summaries of that code are used to train a model, e.g., an encoder-decoder neural network. The model is then given code it has not seen before, and its predicted summaries are compared against a set of reference summaries. This comparison is essentially word overlap, calculated via a metric such as BLEU or ROUGE. The problem with word overlap is that not all words in a sentence have the same importance, and many words have synonyms. As a result, the calculated similarity may not match the similarity perceived by human readers. In this paper, we conduct an experiment to measure the degree to which various word overlap metrics correlate with human-rated similarity of predicted and reference summaries. We evaluate alternatives based on current work in semantic similarity metrics and propose recommendations for evaluation of source code summarization.
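To make the synonym problem concrete, here is a minimal sketch of a word overlap score. It is a simplified unigram F1 in the spirit of ROUGE-1 (lowercased whitespace tokens, no stemming), not the official ROUGE or BLEU implementation and not necessarily any metric the paper evaluates. The function name `rouge1_f1` and the example summaries are hypothetical, chosen only for illustration.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1-style unigram F1 (hypothetical helper, not the official metric)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: a token matches at most as many times as it occurs in both texts.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical summaries for a method that returns a list's length.
reference = "returns the size of the list"
paraphrase = "gets the number of elements"  # same meaning, mostly different words
wrong = "returns the head of the list"      # different meaning, many shared words

print(round(rouge1_f1(paraphrase, reference), 3))  # 0.364
print(round(rouge1_f1(wrong, reference), 3))       # 0.833
```

The semantically wrong prediction outscores the correct paraphrase because it shares more surface words with the reference, which is the kind of mismatch between computed and human-perceived similarity described above.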

research
01/07/2021

Action Word Prediction for Neural Source Code Summarization

Source code summarization is the task of creating short, natural languag...
research
08/14/2023

Semantic Similarity Loss for Neural Source Code Summarization

This paper presents an improved loss function for neural source code sum...
research
06/15/2021

Code to Comment Translation: A Comparative Study on Model Effectiveness Errors

Automated source code summarization is a popular software engineering re...
research
03/28/2023

Label Smoothing Improves Neural Source Code Summarization

Label smoothing is a regularization technique for neural networks. Norma...
research
04/10/2020

Improved Automatic Summarization of Subroutines via Attention to File Context

Software documentation largely consists of short, natural language summa...
research
07/15/2021

Neural Code Summarization: How Far Are We?

Source code summaries are important for the comprehension and maintenanc...
research
02/18/2020

Learning by Semantic Similarity Makes Abstractive Summarization Better

One of the obstacles of abstractive summarization is the presence of var...
