The Data-Production Dispositif

by Milagros Miceli, et al.

Machine learning (ML) depends on data to train and verify models. Very often, organizations outsource processes related to data work (i.e., generating and annotating data and evaluating outputs) through business process outsourcing (BPO) companies and crowdsourcing platforms. This paper investigates outsourced ML data work in Latin America by studying three platforms in Venezuela and a BPO in Argentina. We lean on the Foucauldian notion of dispositif to define the data-production dispositif as an ensemble of discourses, actions, and objects strategically disposed to (re)produce power/knowledge relations in data and labor. Our dispositif analysis comprises the examination of 210 data work instruction documents, 55 interviews with data workers, managers, and requesters, and participant observation. Our findings show that discourses encoded in instructions reproduce and normalize the worldviews of requesters. Precarious working conditions and economic dependency alienate workers, making them obedient to instructions. Furthermore, discourses and social contexts materialize in artifacts, such as interfaces and performance metrics, limiting workers' agency and normalizing specific ways of interpreting data. We conclude by stressing the importance of counteracting the data-production dispositif by fighting alienation and precarization, and empowering data workers to become assets in the quest for high-quality data.


Wisdom for the Crowd: Discoursive Power in Annotation Instructions for Computer Vision

Developers of computer vision algorithms outsource some of the labor inv...

Documenting Data Production Processes: A Participatory Approach for Data Work

The opacity of machine learning data is a significant threat to ethical ...

Studying Up Machine Learning Data: Why Talk About Bias When We Mean Power?

Research in machine learning (ML) has primarily argued that models train...

The Coloniality of Data Work in Latin America

This presentation for the AIES 21 doctoral consortium examines the Latin...

A Labeling Task Design for Supporting Algorithmic Needs: Facilitating Worker Diversity and Reducing AI Bias

Studies on supervised machine learning (ML) recommend involving workers ...

Universal Clustering via Crowdsourcing

Consider unsupervised clustering of objects drawn from a discrete set, t...