DC-Check: A Data-Centric AI checklist to guide the development of reliable machine learning systems

by Nabeel Seedat, et al.

While there have been a number of remarkable breakthroughs in machine learning (ML), much of the focus has been placed on model development. However, to truly realize the potential of machine learning in real-world settings, additional aspects must be considered across the ML pipeline. Data-centric AI is emerging as a unifying paradigm that could enable such reliable end-to-end pipelines. However, this remains a nascent area with no standardized framework to guide practitioners to the necessary data-centric considerations or to communicate the design of data-centric ML systems. To address this gap, we propose DC-Check, an actionable checklist-style framework to elicit data-centric considerations at different stages of the ML pipeline: Data, Training, Testing, and Deployment. This data-centric lens on development aims to promote thoughtfulness and transparency prior to system development. Additionally, we highlight specific data-centric AI challenges and research opportunities. DC-Check is aimed at both practitioners and researchers to guide day-to-day development. To make DC-Check and its associated resources easy to engage with and use, we provide a DC-Check companion website (https://www.vanderschaar-lab.com/dc-check/). The website will also serve as an updated resource as methods and tooling evolve over time.



Data-centric Artificial Intelligence

Data-centric artificial intelligence (data-centric AI) represents an eme...

DataPerf: Benchmarks for Data-Centric AI Development

Machine learning (ML) research has generally focused on models, while th...

Towards Data-centric Graph Machine Learning: Review and Outlook

Data-centric AI, with its primary focus on the collection, management, a...

DMOps: Data Management Operation and Recipes

Data-centric AI has shed light on the significance of data within the ma...

Data-Centric AI Requires Rethinking Data Notion

The transition towards data-centric AI requires revisiting data notions ...

Data-centric Operational Design Domain Characterization for Machine Learning-based Aeronautical Products

We give a first rigorous characterization of Operational Design Domains ...

Data-Centric Machine Learning Approach for Early Ransomware Detection and Attribution

Researchers have proposed a wide range of ransomware detection and analy...
