Exploring the Carbon Footprint of Hugging Face's ML Models: A Repository Mining Study

by Joel Castaño, et al.

The rise of machine learning (ML) systems has exacerbated their carbon footprint due to increased capabilities and model sizes. However, little is known about how the carbon footprint of ML models is actually measured, reported, and evaluated. This paper therefore analyzes how the carbon footprint is measured for 1,417 ML models and their associated datasets on Hugging Face, the most popular repository for pretrained ML models. The goal is to provide insights and recommendations on how to report and optimize the carbon efficiency of ML models. The paper presents the first repository mining study of carbon emissions on the Hugging Face Hub API. It seeks to answer two research questions: (1) how do ML model creators measure and report carbon emissions on Hugging Face Hub? and (2) what aspects impact the carbon emissions of training ML models? Key findings include a decreasing proportion of models that report carbon emissions, a slight decrease in the reported carbon footprint on Hugging Face over the past 2 years, and a continued dominance of NLP as the main application domain. The study also uncovers correlations between carbon emissions and attributes such as model size, dataset size, and ML application domain. These results highlight the need for software measurements to improve energy reporting practices and promote carbon-efficient model development within the Hugging Face community. In response, two classifications are proposed: one that categorizes models by their carbon emission reporting practices and another by their carbon efficiency. These classifications aim to foster transparency and sustainable model development within the ML community.
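As a rough illustration of what the first research question involves, the sketch below reads a reported emissions value from model card metadata and computes the share of models that report one. It is not the authors' pipeline. It assumes the metadata is already available as plain dictionaries and that emissions appear under the `co2_eq_emissions` key, either as a bare number or as a mapping with an `emissions` entry in grams of CO2-eq. The function names and the sample cards are hypothetical.

```python
from typing import Any, Dict, Optional


def reported_emissions_grams(card_data: Dict[str, Any]) -> Optional[float]:
    """Return the reported emissions value, or None if nothing usable is reported.

    Assumption: the value lives under "co2_eq_emissions", either directly as a
    number or nested under an "emissions" key.
    """
    value = card_data.get("co2_eq_emissions")
    if isinstance(value, dict):
        value = value.get("emissions")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    return float(value)


def reporting_share(cards: Dict[str, Dict[str, Any]]) -> float:
    """Fraction of models whose metadata contains a usable emissions value."""
    if not cards:
        return 0.0
    reporting = sum(
        1 for data in cards.values() if reported_emissions_grams(data) is not None
    )
    return reporting / len(cards)


# Hypothetical metadata for four models (illustrative values only).
sample_cards = {
    "model-a": {"co2_eq_emissions": 12.5},
    "model-b": {"co2_eq_emissions": {"emissions": 3200, "source": "code carbon"}},
    "model-c": {"license": "mit"},
    "model-d": {"co2_eq_emissions": {"source": "unknown"}},
}

share = reporting_share(sample_cards)
```

With these sample cards, two of the four models carry a usable value, so `share` is 0.5. Treating a present but empty field as "not reporting" is a design choice of this sketch; a study could reasonably count it as its own category.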




The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink

Machine Learning (ML) workloads have rapidly grown in importance, but ra...

Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model

Progress in machine learning (ML) comes with a cost to the environment, ...

Counting Carbon: A Survey of Factors Influencing the Emissions of Machine Learning

Machine learning (ML) requires using energy to carry out computations du...

Carbontracker: Tracking and Predicting the Carbon Footprint of Training Deep Learning Models

Deep learning (DL) can achieve impressive results across a wide variety ...

Machine Learning practices and infrastructures

Machine Learning (ML) systems, particularly when deployed in high-stakes...

Exploring the Potential of Feature Density in Estimating Machine Learning Classifier Performance with Application to Cyberbullying Detection

In this research, we analyze the potential of Feature Density (HD) as a ...

Carbon Footprints on Inter-Domain Paths: Uncovering CO2 Tracks on Global Networks

In the years after signing the Paris agreement, corporations have been e...
