20-MAD – 20 Years of Issues and Commits of Mozilla and Apache Development

03/31/2020
by Maëlick Claes, et al.

Data from long-lived, high-profile projects is valuable for research on successful software engineering in the wild. A dataset that links the different software repositories of such projects enables deeper investigations. This paper presents 20-MAD, a dataset linking the commit and issue data of Mozilla and Apache projects. It includes over 20 years of information about 765 projects, 3.4M commits, 2.3M issues, and 17.3M issue comments, and its compressed size is over 6 GB. The data contains all the typical information about source code commits (e.g., lines added and removed, message, and commit time) and issues (status, severity, votes, and summary). The issue comments have been pre-processed for natural language processing and sentiment analysis; this includes emoticons as well as valence and arousal scores. Linking code repository and issue tracker information allows studying individuals across the two types of repositories and also provides more accurate time zone information for issue trackers. To our knowledge, this is the largest linked dataset in size and in project lifetime that is not based on GitHub.


research
01/20/2022

An Alternative Issue Tracking Dataset of Public Jira Repositories

Organisations use issue tracking systems (ITSs) to track and document th...
research
10/04/2021

Label it be! A large-scale study of issue labeling in modern open-source repositories

In a wave of growth, open-source projects need to modernize and change h...
research
06/02/2020

Descriptions of issues and comments for predicting issue success in software projects

Software development tasks must be performed successfully to achieve sof...
research
02/16/2021

Improved dependency management for issue trackers in large collaborative projects

Issue trackers, such as Jira, have become the prevalent collaborative to...
research
11/16/2020

Linking Publications to Funding at Project Level: A curated dataset of publications reported by FP7 projects

Datasets explicitly linking publications to funding at project level are...
research
06/21/2021

An Exploratory Study on Architectural Knowledge in Issue Tracking Systems

Software developers use issue trackers (e.g. Jira) to manage defects, bu...
research
03/30/2020

Repository for Reusing Artifacts of Artificial Neural Networks

Artificial Neural Networks (ANNs) replaced conventional software systems...
