Reproducibility of Machine Learning: Terminology, Recommendations and Open Issues
Reproducibility is one of the core dimensions that concur to deliver Trustworthy Artificial Intelligence. Broadly speaking, reproducibility can be defined as the possibility to reproduce the same or a similar experiment or method, thereby obtaining the same or similar results as the original scientists. It is an essential ingredient of the scientific method and crucial for gaining trust in relevant claims. A reproducibility crisis has been recently acknowledged by scientists and this seems to affect even more Artificial Intelligence and Machine Learning, due to the complexity of the models at the core of their recent successes. Notwithstanding the recent debate on Artificial Intelligence reproducibility, its practical implementation is still insufficient, also because many technical issues are overlooked. In this survey, we critically review the current literature on the topic and highlight the open issues. Our contribution is three-fold. We propose a concise terminological review of the terms coming into play. We collect and systematize existing recommendations for achieving reproducibility, putting forth the means to comply with them. We identify key elements often overlooked in modern Machine Learning and provide novel recommendations for them. We further specialize these for two critical application domains, namely the biomedical and physical artificial intelligence fields.
READ FULL TEXT