An Empirical Study of Flaky Tests in Python

by Martin Gruber, et al.

Tests that cause spurious failures without any code changes, i.e., flaky tests, hamper regression testing, increase maintenance costs, may shadow real bugs, and decrease trust in tests. While the prevalence and importance of flakiness are well established, prior research focused on Java projects, thus raising the question of how the findings generalize. In order to provide a better understanding of the role of flakiness in software development beyond Java, we empirically study the prevalence, causes, and degree of flakiness within software written in Python, one of the currently most popular programming languages. For this, we sampled 22,352 open source projects from the popular PyPI package index, and analyzed their 876,186 test cases for flakiness. Our investigation suggests that flakiness is equally prevalent in Python as it is in Java. The reasons, however, are different: Order dependency is a much more dominant problem in Python, causing 59% of the flaky tests in our dataset. Another 28% were caused by test infrastructure problems, which represent a previously undocumented cause of flakiness. The remaining 13% can mostly be attributed to the use of network and randomness APIs by the projects, which is indicative of the type of software commonly written in Python. Our data also suggests that finding flaky tests requires more runs than are often done in the literature: A 95% confidence that a passing test case is not flaky would, on average, require 170 reruns.
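To give a feel for why rerun counts grow so quickly, the following is a minimal sketch of one common way to reason about them. It assumes that each run of a flaky test fails independently with a fixed probability; that model, the function name `reruns_needed`, and its parameters are illustrative assumptions, not the procedure used in the study.

```python
import math


def reruns_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest number of runs n such that a test failing independently
    with probability `failure_rate` per run fails at least once with
    probability >= `confidence`.

    Assumption (not from the paper): runs are independent and the
    per-run failure probability is constant.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # P(no failure in n runs) = (1 - p)^n <= 1 - confidence
    # => n >= ln(1 - confidence) / ln(1 - p)
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    # Hypothetical per-run failure rates, chosen only for illustration.
    for p in (0.5, 0.1, 0.01):
        print(f"failure rate {p}: {reruns_needed(p)} runs")
```

Under this simplified model, a test that fails half of the time needs only 5 runs, while rarer failures need many more, which is consistent in spirit with the observation that a handful of reruns is often not enough.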


Method Chaining Redux: An Empirical Study of Method Chaining in Java, Kotlin, and Python

There are possible benefits and drawbacks to chaining methods together, ...

An Empirical Study of Flaky Tests in JavaScript

Flaky tests (tests with non-deterministic outcomes) can be problematic f...

Understanding Resolution of Multi-Language Bugs: An Empirical Study on Apache Projects

Background: In modern software systems, more and more systems are writte...

On the Effect of Instrumentation on Test Flakiness

Test flakiness is a problem that affects testing and processes that rely...

Smells in System User Interactive Tests

Test smells are known as bad development practices that reflect poor des...

FlaPy: Mining Flaky Python Tests at Scale

Flaky tests obstruct software development, and studying and proposing mi...

Design Smell Analysis for Developing and Established Open Source Java Software

Software design smells are design attributes which violate the fundament...
