Reshi: Recommending Resources for Scientific Workflow Tasks on Heterogeneous Infrastructures

by   Jonathan Bader, et al.

Scientific workflows typically comprise a multitude of different processing steps which often are executed in parallel on different partitions of the input data. These executions, in turn, must be scheduled on the compute nodes of the computational infrastructure at hand. This assignment is complicated by the facts that (a) tasks typically have highly heterogeneous resource requirements and (b) in many infrastructures, compute nodes offer highly heterogeneous resources. In consequence, predictions of the runtime of a given task on a given node, as required by many scheduling algorithms, are often rather imprecise, which can lead to sub-optimal scheduling decisions. We propose Reshi, a method for recommending task-node assignments during workflow execution that can cope with heterogeneous tasks and heterogeneous nodes. Reshi approaches the problem as a regression task, where task-node pairs are modeled as feature vectors over the results of dedicated micro benchmarks and past task executions. Based on these features, Reshi trains a regression tree model to rank and recommend nodes for each ready-to-run task, which can be used as input to a scheduler. For our evaluation, we benchmarked 27 AWS machine types using three representative workflows. We compare Reshi's recommendations with three state-of-the-art schedulers. Our evaluation shows that Reshi outperforms HEFT by a mean makespan reduction of 7.18 mean task runtime prediction error of 15


Lotaru: Locally Estimating Runtimes of Scientific Workflow Tasks in Heterogeneous Clusters

Many scientific workflow scheduling algorithms need to be informed about...

Tarema: Adaptive Resource Allocation for Scalable Scientific Workflows in Heterogeneous Clusters

Scientific workflow management systems like Nextflow support large-scale...

Task Runtime Prediction in Scientific Workflows Using an Online Incremental Learning Approach

Many algorithms in workflow scheduling and resource provisioning rely on...

Lotaru: Locally Predicting Workflow Task Runtimes for Resource Management on Heterogeneous Infrastructures

Many resource management techniques for task scheduling, energy and carb...

Leveraging Reinforcement Learning for Task Resource Allocation in Scientific Workflows

Scientific workflows are designed as directed acyclic graphs (DAGs) and ...

BottleMod: Modeling Data Flows and Tasks for Fast Bottleneck Analysis

In the recent years, scientific workflows gained more and more popularit...

High performance scheduling of mixed-mode DAGs on heterogeneous multicores

Many HPC applications can be expressed as mixed-mode computations, in wh...

Please sign up or login with your details

Forgot password? Click here to reset