On the Time-Based Conclusion Stability of Software Defect Prediction Models

11/14/2019
by Abdul Ali Bangash, et al.

Researchers in empirical software engineering often make claims based on observable data such as defect reports. Unfortunately, in many cases, these claims are generalized beyond the data sets that have been evaluated. Will the researcher's conclusions hold a year from now for the same software projects? Perhaps not. Recent studies show that, in the area of software analytics, conclusions over different data sets are usually inconsistent. In this article, we empirically investigate whether conclusions in the area of defect prediction remain stable over time. Our investigation applies a time-aware evaluation approach in which models are trained only on the past and evaluated only on the future. Through this time-aware evaluation, we show that the performance of defect predictors, measured by F-score, area under the curve (AUC), and Matthews correlation coefficient (MCC), varies depending on the time period in which they are evaluated, and that their results are not consistent across periods. A new release of a product that differs significantly from its prior release may drastically change defect prediction performance. Therefore, without establishing conclusion stability, empirical software engineering researchers should limit their performance claims to the contexts in which they were evaluated, because broad claims about defect prediction performance may be contradicted by the next release of the product under analysis.
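To make the evaluation protocol concrete, the sketch below illustrates a time-aware split in the spirit the abstract describes: a defect predictor is trained only on modules from past releases and evaluated only on modules from a later release, reporting F-score, AUC, and MCC. This is not the authors' code; the classifier, the cutoff year, and the synthetic data are illustrative assumptions.

```python
# Minimal sketch of a time-aware evaluation (assumptions: synthetic data,
# a RandomForest classifier, and a single cutoff year; none of these are
# taken from the paper itself).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score, matthews_corrcoef

rng = np.random.default_rng(0)

# Hypothetical per-module metrics (e.g. size, churn) plus the year of the
# release each module belongs to.
n = 600
X = rng.normal(size=(n, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 0.8).astype(int)
release_year = rng.integers(2015, 2020, size=n)

cutoff = 2018  # train strictly on the past, test strictly on the future
train = release_year < cutoff
test = release_year >= cutoff

model = RandomForestClassifier(random_state=0).fit(X[train], y[train])
prob = model.predict_proba(X[test])[:, 1]
pred = (prob >= 0.5).astype(int)

print("F-score:", f1_score(y[test], pred))
print("AUC:    ", roc_auc_score(y[test], prob))
print("MCC:    ", matthews_corrcoef(y[test], pred))
```

Repeating this evaluation with different cutoff points (i.e., different "futures") is what exposes the conclusion instability the paper reports: the same predictor can score quite differently depending on which time period it is tested on.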

