Hybrid Job-driven Scheduling for Virtual MapReduce Clusters

08/24/2018
by Ming-Chang Lee, et al.
It is cost-efficient for a tenant with a limited budget to establish a virtual MapReduce cluster by renting multiple virtual private servers (VPSs) from a VPS provider. To provide an appropriate scheduling scheme for this type of computing environment, we propose in this paper a hybrid job-driven scheduling scheme (JoSS for short) from a tenant's perspective. JoSS provides not only job level scheduling, but also map-task level scheduling and reduce-task level scheduling. JoSS classifies MapReduce jobs based on job scale and job type and designs an appropriate scheduling policy to schedule each class of jobs. The goal is to improve data locality for both map tasks and reduce tasks, avoid job starvation, and improve job execution performance. Two variations of JoSS are further introduced to separately achieve a better map-data locality and a faster task assignment. We conduct extensive experiments to evaluate and compare the two variations with current scheduling algorithms supported by Hadoop. The results show that the two variations outperform the other tested algorithms in terms of map-data locality, reduce-data locality, and network overhead without incurring significant overhead. In addition, the two variations are separately suitable for different MapReduce-workload scenarios and provide the best job performance among all tested algorithms.
