DaphneSched: A Scheduler for Integrated Data Analysis Pipelines

08/03/2023
by Ahmed Eleliemy, et al.

DAPHNE is a new open-source software infrastructure designed to address the increasing demands of integrated data analysis (IDA) pipelines, which combine data management (DM), high-performance computing (HPC), and machine learning (ML) systems. Executing IDA pipelines efficiently is challenging because of their diverse computing characteristics and demands. IDA pipelines executed with the DAPHNE infrastructure therefore require an efficient and versatile scheduler. This work introduces DaphneSched, the task-based scheduler at the core of DAPHNE. DaphneSched achieves its versatility by incorporating eleven task partitioning and three task assignment techniques, bringing state-of-the-art task scheduling closer to the state of the practice. To showcase DaphneSched's effectiveness in scheduling IDA pipelines, we evaluate its performance on two applications: a product recommendation system and linear regression model training. We conduct performance experiments on multicore platforms with 20 and 56 cores. The results show that the versatility of DaphneSched enables combinations of scheduling strategies that outperform commonly used scheduling techniques by up to 13% [...] efficient execution of applications with IDA pipelines.
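The abstract separates task partitioning (how a loop of work is cut into chunks) from task assignment (how those chunks reach workers). The following is a minimal sketch, assuming n independent tasks and p workers, of three classic partitioning rules from the self-scheduling literature (STATIC, guided self-scheduling GSS, and factoring FAC2) plus a centralized work-queue assignment in which the earliest-free worker takes the next chunk. All function names are hypothetical illustrations; this is not DaphneSched's API, and the abstract does not list which eleven partitioning techniques it actually implements.

```python
import heapq
import math
from typing import List


def static_chunks(n: int, p: int) -> List[int]:
    """STATIC: split n tasks into p near-equal chunks, one per worker."""
    base, extra = divmod(n, p)
    sizes = [base + (1 if i < extra else 0) for i in range(p)]
    return [s for s in sizes if s > 0]  # drop empty chunks when n < p


def gss_chunks(n: int, p: int) -> List[int]:
    """GSS: each new chunk is ceil(remaining / p), so chunks shrink over time."""
    chunks, remaining = [], n
    while remaining > 0:
        size = math.ceil(remaining / p)
        chunks.append(size)
        remaining -= size
    return chunks


def fac2_chunks(n: int, p: int) -> List[int]:
    """FAC2: each batch hands out p chunks of ceil(remaining / (2p))."""
    chunks, remaining = [], n
    while remaining > 0:
        size = max(1, math.ceil(remaining / (2 * p)))
        for _ in range(p):
            if remaining == 0:
                break
            take = min(size, remaining)
            chunks.append(take)
            remaining -= take
    return chunks


def simulate_makespan(chunks: List[int], costs: List[float], p: int) -> float:
    """Centralized queue: chunks are taken in order by the earliest-free worker.

    costs[i] is the (hypothetical) execution time of task i.
    Returns the time at which the last worker finishes.
    """
    finish = [0.0] * p  # min-heap of worker finish times
    heapq.heapify(finish)
    start = 0
    for size in chunks:
        t = heapq.heappop(finish)           # earliest-available worker
        work = sum(costs[start:start + size])
        start += size
        heapq.heappush(finish, t + work)    # worker busy until t + work
    return max(finish)


if __name__ == "__main__":
    n, p = 100, 4
    # Hypothetical imbalanced workload: the first 25 tasks are twice as expensive.
    costs = [2.0] * 25 + [1.0] * 75
    for name, chunks in [("STATIC", static_chunks(n, p)),
                         ("GSS", gss_chunks(n, p)),
                         ("FAC2", fac2_chunks(n, p))]:
        print(name, chunks, simulate_makespan(chunks, costs, p))
```

With this imbalanced workload, STATIC gives the first worker all expensive tasks (makespan 50.0), while the shrinking chunks of GSS and FAC2 let the queue rebalance work. The trade-off is that smaller chunks mean more scheduling requests, which is why offering several partitioning rules matters.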



research
06/09/2021

LB4OMP: A Dynamic Load Balancing Library for Multithreaded Applications

Exascale computing systems will exhibit high degrees of hierarchical par...
research
11/07/2021

Plumber: Diagnosing and Removing Performance Bottlenecks in Machine Learning Data Pipelines

Input pipelines, which ingest and transform input data, are an essential...
research
04/13/2020

Intelligent Orchestration of ADAS Pipelines on Next Generation Automotive Platforms

Advanced Driver-Assistance Systems (ADAS) is one of the primary drivers ...
research
04/16/2020

Developing and Deploying Machine Learning Pipelines against Real-Time Image Streams from the PACS

Executing machine learning (ML) pipelines on radiology images is hard du...
research
08/05/2021

JITA4DS: Disaggregated execution of Data Science Pipelines between the Edge and the Data Centre

This paper targets the execution of data science (DS) pipelines supporte...
research
10/04/2022

Integrating pre-processing pipelines in ODC based framework

Using on-demand processing pipelines to generate virtual geospatial prod...
research
05/14/2021

Methods Included: Standardizing Computational Reuse and Portability with the Common Workflow Language

A widely used standard for portable multilingual data analysis pipelines...
