The Types, Roles, and Practices of Documentation in Data Analytics Open Source Software Libraries: A Collaborative Ethnography of Documentation Work

05/31/2018
by R. Stuart Geiger, et al.

Computational research and data analytics increasingly rely on complex ecosystems of open source software (OSS) "libraries": curated collections of reusable code that programmers import to perform specific tasks. Software documentation for these libraries is crucial in helping programmers and analysts know what libraries are available and how to use them. Yet documentation for open source software libraries is widely considered low-quality. This article is a collaboration between CSCW researchers and contributors to data analytics OSS libraries, based on ethnographic fieldwork and qualitative interviews. We examine the formats, practices, and challenges of documentation in these largely volunteer-based projects. Many different kinds and formats of documentation exist around such libraries, and they play a variety of educational, promotional, and organizational roles. The work behind documentation is similarly multifaceted, including writing, reviewing, maintaining, and organizing documentation. Different aspects of documentation work require contributors to have different sets of skills and to overcome various social and technical barriers. Finally, most of our interviewees do not report high levels of intrinsic enjoyment for doing documentation work (compared to writing code). Their motivation is affected by personal and project-specific factors, such as the perceived level of credit for doing documentation work versus more "technical" tasks like adding new features or fixing bugs. In studying documentation work for data analytics OSS libraries, we gain a new window into the changing practices of data-intensive research, and we help practitioners better understand how to support this often invisible and infrastructural work in their projects.

