Towards Understanding the Impacts of Textual Dissimilarity on Duplicate Bug Report Detection

12/20/2022
by Sigma Jahan, et al.

About 40% of software bug reports are duplicates of one another, which poses a major overhead during software maintenance. Traditional techniques often focus on detecting duplicate bug reports that are textually similar. However, many duplicate bug reports in bug tracking systems are not textually similar, and traditional techniques might fall short for them. In this paper, we conduct a large-scale empirical study to better understand the impacts of textual dissimilarity on the detection of duplicate bug reports. First, we collect a total of 92,854 bug reports from three open-source systems and construct two datasets containing textually similar and textually dissimilar duplicate bug reports. We then evaluate three existing techniques for detecting duplicate bug reports and show that they perform significantly worse on textually dissimilar duplicate reports. Second, we analyze the two groups of bug reports using a combination of descriptive analysis, word embedding visualization, and manual analysis. We find that textually dissimilar duplicate bug reports often miss important components (e.g., expected behaviors and steps to reproduce), which could explain both their textual differences and the poor performance of the existing techniques. Finally, we apply domain-specific embedding to the duplicate bug report detection problem, which shows mixed results. Together, these findings warrant further investigation and more effective solutions for detecting textually dissimilar duplicate bug reports.
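To make the dataset construction step concrete, the sketch below shows one way known duplicate pairs could be separated into textually similar and textually dissimilar groups. It is an illustration only: the abstract does not state which similarity measure or cutoff the authors used, so the bag-of-words cosine similarity, the `threshold` value of 0.5, and the function names `tokenize`, `cosine_similarity`, and `split_duplicates` are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    vec_a, vec_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    # Only tokens present in both reports contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty report shares nothing with any other report
    return dot / (norm_a * norm_b)


def split_duplicates(pairs, threshold=0.5):
    """Split (original, duplicate) text pairs by similarity.

    The threshold is a hypothetical cutoff chosen for illustration,
    not a value taken from the paper.
    """
    similar, dissimilar = [], []
    for original, duplicate in pairs:
        score = cosine_similarity(original, duplicate)
        bucket = similar if score >= threshold else dissimilar
        bucket.append((original, duplicate, score))
    return similar, dissimilar


if __name__ == "__main__":
    # Made-up example reports, not drawn from the studied systems.
    pairs = [
        ("app crashes on save", "app crashes on save file"),
        ("app crashes on save", "data lost after exit"),
    ]
    similar, dissimilar = split_duplicates(pairs)
    print(len(similar), "similar pair(s),", len(dissimilar), "dissimilar pair(s)")
```

The second example pair describes a plausibly related failure with no shared vocabulary, which is the kind of duplicate that a purely lexical measure like this one would score as unrelated.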

