A Systematic Review of Unsupervised Learning Techniques for Software Defect Prediction

07/28/2019
by Ning Li, et al.

Background: Unsupervised machine learners have been increasingly applied to software defect prediction. This approach may be valuable for software practitioners because it reduces the need for labeled training data. Objective: Investigate the use and performance of unsupervised learning techniques in software defect prediction. Method: We conducted a systematic literature review that identified 48 studies, published between January 2000 and March 2018, that satisfied our inclusion criteria and together contained 2348 individual experimental results. To compare prediction performance across these studies in a consistent way, we (re-)computed the confusion matrices and employed the Matthews correlation coefficient (MCC) as our main performance measure. Results: Our meta-analysis shows that unsupervised models are comparable with supervised models for both within-project and cross-project prediction. Among 21 unsupervised models, Fuzzy C-Means (FCM) and Fuzzy SOMs (FSOMs) perform best. In addition, where we were able to check, we found that almost 11% of published results (contained in 16 papers) were internally inconsistent and a further 30 [...] Conclusion: Although many factors, such as dataset characteristics, affect the performance of a classifier, unsupervised classifiers broadly do not seem to perform worse than the supervised classifiers in our review. However, we note a worrying prevalence of (i) demonstrably erroneous experimental results, (ii) undemanding benchmarks and (iii) incomplete reporting. We particularly encourage researchers to be comprehensive in their reporting.
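The abstract's main performance measure, MCC, is computed directly from the four cells of a confusion matrix. The sketch below shows the standard formula in Python. The function names, the zero-denominator convention and the F1 check with its tolerance are illustrative assumptions, not the procedure used in the review.

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from the confusion-matrix cells."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: return 0 when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


def f1_is_consistent(precision: float, recall: float, f1: float,
                     tol: float = 0.01) -> bool:
    """Hypothetical consistency check: does a reported F1 match the value
    implied by the reported precision and recall, within a tolerance?"""
    if precision + recall == 0:
        return abs(f1) <= tol
    implied = 2 * precision * recall / (precision + recall)
    return abs(implied - f1) <= tol


if __name__ == "__main__":
    # A perfect classifier scores 1, a fully inverted one scores -1.
    print(mcc(tp=50, fp=0, tn=50, fn=0))
    print(mcc(tp=0, fp=50, tn=0, fn=50))
    # Precision and recall of 0.5 imply an F1 of 0.5, so 0.9 is flagged.
    print(f1_is_consistent(0.5, 0.5, 0.9))
```

MCC ranges from -1 to 1 and uses all four cells, which is one reason it is convenient for comparing results across studies that report different measures.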
