Towards General Distributed Resource Selection

by Ming Tai Ha, et al.

The advantages of distributing workloads and utilizing multiple distributed resources are now well established. The type and degree of heterogeneity of distributed resources are increasing, and thus determining how to distribute workloads becomes increasingly difficult, in particular with respect to the selection of suitable resources. We formulate and investigate the resource selection problem in a way that is agnostic of specific task and resource properties and generalizable to a range of metrics. Specifically, we developed a model to describe the requirements of a task and to estimate the cost of running that task on an arbitrary resource using baseline measurements from a reference machine. We integrated our cost model with the Condor matchmaking algorithm to enable resource selection. Experimental validation of our model shows that it provides execution time estimates with 157-171% error on XSEDE resources and 18-31% [...] model to select resources for a bag-of-tasks of up to 1024 GROMACS MD simulations across the target resources. Experiments show that using the model's estimates reduces the workload's time-to-completion by up to 85% compared to the random distribution of the workload across the same resources.
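The idea of estimating cost from reference-machine baselines and then matching tasks to resources can be illustrated with a minimal sketch. This is not the authors' model or the Condor matchmaking algorithm: the `Task` and `Resource` classes, the `relative_speed` factors, the memory requirement, and the additive per-dimension scaling are all assumptions chosen only to show the general shape of such an approach.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    memory_gb: float
    # Hypothetical speed per dimension relative to the reference machine
    # (1.0 means "as fast as the reference machine").
    relative_speed: dict


@dataclass
class Task:
    name: str
    min_memory_gb: float
    # Seconds spent per dimension when measured on the reference machine.
    baseline_seconds: dict


def estimate_seconds(task, resource):
    """Scale each baseline measurement by the resource's relative speed.

    Assumes the dimensions do not overlap in time, so the costs add up.
    """
    return sum(
        seconds / resource.relative_speed[dim]
        for dim, seconds in task.baseline_seconds.items()
    )


def select_resource(task, resources):
    """Keep resources that meet the task's requirement, then rank by cost."""
    eligible = [r for r in resources if r.memory_gb >= task.min_memory_gb]
    if not eligible:
        return None
    return min(eligible, key=lambda r: estimate_seconds(task, r))


# Illustrative values only; none of these numbers come from the paper.
task = Task("md-sim", min_memory_gb=4.0,
            baseline_seconds={"cpu": 100.0, "io": 20.0})
resources = [
    Resource("A", memory_gb=16.0, relative_speed={"cpu": 2.0, "io": 1.0}),
    Resource("B", memory_gb=8.0, relative_speed={"cpu": 4.0, "io": 0.5}),
    Resource("C", memory_gb=2.0, relative_speed={"cpu": 10.0, "io": 10.0}),
]
best = select_resource(task, resources)
print(best.name, estimate_seconds(task, best))
```

In this sketch, resource C would give the lowest estimate but is filtered out by the memory requirement, so the ranking step chooses between A and B. Separating the requirement check from the cost ranking mirrors the usual two-phase structure of matchmaking, where a match is first constrained and then ranked.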




