Plan-based Job Scheduling for Supercomputers with Shared Burst Buffers

08/31/2021
by   Jan Kopanski, et al.

The ever-increasing gap between compute and I/O performance in HPC platforms, together with the development of novel NVMe storage devices (NVRAM), led to the emergence of the burst buffer concept: an intermediate persistent storage layer logically positioned between random-access main memory and a parallel file system. Despite the development of real-world architectures as well as research concepts, resource and job management systems, such as Slurm, provide only marginal support for scheduling jobs with burst buffer requirements, in particular ignoring burst buffers when backfilling. We investigate the impact of burst buffer reservations on the overall efficiency of online job scheduling for common algorithms: First-Come-First-Served (FCFS) and Shortest-Job-First (SJF) EASY-backfilling. We evaluate the algorithms in a detailed simulation with I/O side effects. Our results indicate that the lack of burst buffer reservations in backfilling may significantly deteriorate scheduling. We also show that these algorithms can be easily extended to support burst buffers. Finally, we propose a burst-buffer-aware plan-based scheduling algorithm with simulated annealing optimisation, which improves the mean waiting time by over 20% compared to SJF-EASY-backfilling.
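To make the role of burst buffer reservations in backfilling concrete, the following is a minimal Python sketch of an EASY-style backfilling check. It is not the authors' implementation or Slurm's: the names Job, head_reservation, can_backfill and the bb_aware flag are hypothetical, and the model assumes a single shared burst buffer pool, exact walltime estimates, and a reservation for the head-of-queue job only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    nodes: int      # requested compute nodes
    bb: int         # requested burst buffer capacity (e.g. in GB)
    walltime: int   # user-provided runtime estimate


def head_reservation(head, running, free_nodes, free_bb, now, bb_aware=True):
    """Return (start, spare_nodes, spare_bb) for the head-of-queue job.

    `running` is a list of (end_time, Job). With bb_aware=False only
    nodes are considered, i.e. burst buffers are ignored when planning.
    """
    def fits(nodes, bb):
        return head.nodes <= nodes and (not bb_aware or head.bb <= bb)

    t, nodes, bb = now, free_nodes, free_bb
    if fits(nodes, bb):
        return t, nodes - head.nodes, bb - head.bb
    # Release resources in order of job completion until the head job fits.
    for end, job in sorted(running, key=lambda r: r[0]):
        t, nodes, bb = end, nodes + job.nodes, bb + job.bb
        if fits(nodes, bb):
            return t, nodes - head.nodes, bb - head.bb
    raise ValueError("head job can never fit on this machine")


def can_backfill(job, head, running, free_nodes, free_bb, now, bb_aware=True):
    """True if `job` may start now without delaying the head job's reservation."""
    if job.nodes > free_nodes or job.bb > free_bb:
        return False  # does not fit in the currently free resources
    start, spare_nodes, spare_bb = head_reservation(
        head, running, free_nodes, free_bb, now, bb_aware)
    if now + job.walltime <= start:
        return True   # finishes before the reserved start time
    # Otherwise it must fit into what is left over at the reserved start.
    return job.nodes <= spare_nodes and (not bb_aware or job.bb <= spare_bb)


# Hypothetical machine: 4 nodes and 100 GB of shared burst buffer.
running = [(10, Job("R", nodes=3, bb=50, walltime=10))]
head = Job("H", nodes=3, bb=60, walltime=20)
cand = Job("C", nodes=1, bb=50, walltime=100)

print(can_backfill(cand, head, running, 1, 50, now=0, bb_aware=False))
print(can_backfill(cand, head, running, 1, 50, now=0, bb_aware=True))
```

In this toy instance the node-only check admits job C, which would then hold 50 GB of burst buffer past time 10 and leave job H short of the 60 GB it needs, while the burst-buffer-aware check rejects C. This is only meant to illustrate the kind of delay that missing burst buffer reservations can introduce.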

research
09/29/2021

Optimisation of job scheduling for supercomputers with burst buffers

The ever-increasing gap between compute and I/O performance in HPC platf...
research
08/18/2021

ROME: A Multi-Resource Job Scheduling Framework for Exascale HPC Systems

High-performance computing (HPC) is undergoing significant changes. Next...
research
05/18/2020

Semi-online Scheduling: A Survey

In online scheduling, jobs are available one by one and each job must be...
research
09/16/2020

Extending SLURM for Dynamic Resource-Aware Adaptive Batch Scheduling

With the growing constraints on power budget and increasing hardware fai...
research
10/14/2022

Probabilistic Scheduling of Dynamic I/O Requests via Application Clustering for Burst-Buffer Equipped HPC

Burst-Buffering is a promising storage solution that introduces an inter...
research
08/31/2021

A log-linear (2+5/6)-approximation algorithm for parallel machine scheduling with a single orthogonal resource

As the gap between compute and I/O performance tends to grow, modern Hig...
research
05/07/2015

Development of a Burst Buffer System for Data-Intensive Applications

Modern parallel filesystems such as Lustre are designed to provide high,...
