An Empirical Analysis of the R Package Ecosystem

02/19/2021
by Ethan Bommarito, et al.

In this research, we present a comprehensive, longitudinal empirical summary of the R package ecosystem, covering not just CRAN but also Bioconductor and GitHub. We analyze more than 25,000 packages, 150,000 releases, and 15 million files across two decades, providing counts and trends for common metrics across packages, releases, authors, licenses, and other important metadata. We find that the historical growth of the ecosystem has been robust under all measures, with a compound annual growth rate of 29% for packages and 28% [...]. As in similar social systems, we find a number of highly right-skewed distributions with practical implications, including the distribution of releases per package, packages and releases per author or maintainer, package and maintainer dependency in-degree, and size per package and release. For example, the top five packages are imported by nearly 25% [...] maintainers support packages that are imported by over half of all packages. We also highlight the dynamic nature of the ecosystem, recording both dramatic acceleration and notable deceleration in the growth of R. From a licensing perspective, we find that a notable majority of packages are distributed under copyleft licensing or omit licensing information entirely. The data, methods, and calculations herein provide an anchor for public discourse and industry decisions related to R and CRAN, and serve as a foundation for future research on the R software ecosystem and "data science" more broadly.
