Transactional Python for Durable Machine Learning: Vision, Challenges, and Feasibility

05/15/2023
by Supawit Chockchowwat, et al.

In machine learning (ML), Python serves as a convenient abstraction for working with key libraries such as PyTorch and scikit-learn. Unlike a DBMS, however, Python applications may lose important data, such as trained models and extracted features, to machine failures or human errors, wasting time and resources. Specifically, they lack four essential properties that could make ML more reliable and user-friendly: durability, atomicity, replicability, and time-versioning (DART). This paper presents our vision of Transactional Python, which provides DART without any code modifications to user programs or the Python kernel by non-intrusively monitoring application states at the object level and determining a minimal amount of information sufficient to reconstruct a whole application. Our evaluation of a proof-of-concept implementation with public PyTorch and scikit-learn applications shows that DART can be offered with overheads ranging 1.5
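To make the idea of object-level monitoring concrete, here is a minimal Python sketch. It is not the paper's implementation: the `CheckpointStore` class, its method names, and the on-disk layout are all hypothetical, and change detection by hashing pickled bytes is an assumption chosen for brevity. The sketch persists only the variables that changed since the last commit (a small delta rather than the whole state), writes each delta through a temporary file and an atomic rename (durability and atomicity), and keeps numbered deltas so an earlier state can be rebuilt (time-versioning).

```python
import hashlib
import os
import pickle
import tempfile


class CheckpointStore:
    """Toy object-level checkpointer (hypothetical, for illustration only)."""

    def __init__(self, directory):
        self.directory = directory
        self.digests = {}  # variable name -> digest of the last persisted value
        self.version = 0   # number of committed deltas
        os.makedirs(directory, exist_ok=True)

    def _path(self, version):
        return os.path.join(self.directory, f"v{version:06d}.pkl")

    def commit(self, namespace):
        """Persist variables whose pickled bytes changed; return their names."""
        changed = {}
        for name, value in namespace.items():
            payload = pickle.dumps(value)
            digest = hashlib.sha256(payload).hexdigest()
            if self.digests.get(name) != digest:
                changed[name] = (payload, digest)
        if not changed:
            return []

        delta = {name: payload for name, (payload, _) in changed.items()}
        new_version = self.version + 1

        # Write to a temporary file, flush it to disk, then rename atomically
        # so a crash never leaves a half-written delta under the final name.
        fd, tmp_path = tempfile.mkstemp(dir=self.directory)
        with os.fdopen(fd, "wb") as f:
            pickle.dump(delta, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self._path(new_version))

        # Update in-memory bookkeeping only after the delta is safely on disk.
        self.version = new_version
        for name, (_, digest) in changed.items():
            self.digests[name] = digest
        return sorted(changed)

    def restore(self, version=None):
        """Rebuild the namespace as of `version` by replaying deltas in order."""
        upto = self.version if version is None else version
        state = {}
        for v in range(1, upto + 1):
            with open(self._path(v), "rb") as f:
                delta = pickle.load(f)
            for name, payload in delta.items():
                state[name] = pickle.loads(payload)
        return state
```

A caller would invoke `store.commit(namespace)` after each unit of work and `store.restore(version=k)` to recover an earlier state. The sketch ignores deleted variables, objects shared between variables, and unpicklable objects, all of which a real system would have to handle.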

research
03/03/2021

Machine Learning using Stata/Python

We present two related Stata modules, r_ml_stata and c_ml_stata, for fit...
research
05/04/2023

ExeKGLib: Knowledge Graphs-Empowered Machine Learning Analytics

Many machine learning (ML) libraries are accessible online for ML practi...
research
12/21/2021

PyTracer: Automatically profiling numerical instabilities in Python

Numerical stability is a crucial requirement of reliable scientific comp...
research
03/08/2023

Defectors: A Large, Diverse Python Dataset for Defect Prediction

Defect prediction has been a popular research topic where machine learni...
research
01/05/2023

Serenity: Library Based Python Code Analysis for Code Completion and Automated Machine Learning

Dynamically typed languages such as Python have become very popular. Amo...
research
02/13/2020

The PHOTON Wizard – Towards Educational Machine Learning Code Generators

Despite the tremendous efforts to democratize machine learning, especial...
research
01/03/2020

Towards Scalable Dataframe Systems

Dataframes are a popular and convenient abstraction to represent, struct...
