Tailored Learning-Based Scheduling for Kubernetes-Oriented Edge-Cloud System

01/17/2021
by Yiwen Han, et al.

Kubernetes (k8s) has the potential to merge the distributed edge and the cloud, but it lacks a scheduling framework designed specifically for edge-cloud systems. In addition, the hierarchical distribution of heterogeneous resources and the complex dependencies among requests and resources make the modeling and scheduling of k8s-oriented edge-cloud systems particularly complex. In this paper, we introduce KaiS, a learning-based scheduling framework for such edge-cloud systems that improves the long-term throughput rate of request processing. First, we design a coordinated multi-agent actor-critic algorithm to handle decentralized request dispatch and dynamic dispatch spaces within the edge cluster. Second, to accommodate diverse system scales and structures, we use graph neural networks to embed system state information, and we combine the embedding results with multiple policy networks to reduce the orchestration dimensionality through stepwise scheduling. Finally, we adopt a two-time-scale scheduling mechanism to harmonize request dispatch and service orchestration, and we present an implementation design that deploys the above algorithms in a way compatible with native k8s components. Experiments using real workload traces show that KaiS can successfully learn appropriate scheduling policies, irrespective of request arrival patterns and system scales. Moreover, KaiS can enhance the average system throughput rate by 14.3% while reducing scheduling cost by 34.7%.
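To make the two-time-scale idea concrete, the sketch below separates a fast loop (request dispatch in every time slot) from a slow loop (service orchestration every `period` slots, driven by the demand observed since the last orchestration). It is a minimal illustration under stated assumptions, not KaiS itself: KaiS learns dispatch with a multi-agent actor-critic and orchestration with GNN embeddings plus policy networks, whereas here a least-loaded dispatcher and a greedy "deploy the most demanded services" rule are swapped in as hypothetical stand-ins. All names (`EdgeNode`, `dispatch`, `orchestrate`, `run`, `period`, `capacity`) are illustrative and do not come from the paper or from Kubernetes.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class EdgeNode:
    """Hypothetical edge node: which services it hosts and its load in the current slot."""
    name: str
    services: Set[str] = field(default_factory=set)
    load: int = 0


def dispatch(service: str, nodes: List[EdgeNode], cloud: str = "cloud") -> str:
    """Fast time scale (stand-in for the learned dispatch policy): send the request
    to the least-loaded edge node hosting the service, else fall back to the cloud."""
    candidates = [n for n in nodes if service in n.services]
    if not candidates:
        return cloud
    target = min(candidates, key=lambda n: (n.load, n.name))  # tie-break by name
    target.load += 1
    return target.name


def orchestrate(nodes: List[EdgeNode], demand: Counter, capacity: int) -> None:
    """Slow time scale (stand-in for the learned orchestration policy): every node
    hosts the `capacity` most-demanded services seen since the last orchestration."""
    if not demand:
        return  # no observations: keep the current deployment
    ranked = sorted(demand, key=lambda s: (-demand[s], s))
    for node in nodes:
        node.services = set(ranked[:capacity])


def run(trace: List[List[str]], nodes: List[EdgeNode],
        period: int, capacity: int) -> List[Tuple[int, str, str]]:
    """Replay a trace of per-slot service requests; return (slot, service, target) records."""
    demand: Counter = Counter()
    log: List[Tuple[int, str, str]] = []
    for t, slot in enumerate(trace):
        if t > 0 and t % period == 0:
            orchestrate(nodes, demand, capacity)  # slow loop fires every `period` slots
            demand.clear()
        for node in nodes:
            node.load = 0  # per-slot load resets at the start of each slot
        for service in slot:
            demand[service] += 1
            log.append((t, service, dispatch(service, nodes)))  # fast loop, every request
    return log


if __name__ == "__main__":
    edge = [EdgeNode("e1", {"a"}), EdgeNode("e2", {"b"})]
    for record in run([["a", "c", "c"], ["c", "a"], ["c", "c", "b"]], edge, period=2, capacity=1):
        print(record)
```

In this toy run, requests for `c` go to the cloud during the first two slots because no edge node hosts it; at slot 2 the slow loop redeploys `c` to both edge nodes, and the fast loop then spreads the two `c` requests across `e1` and `e2`. Keeping the expensive placement decision on the slower clock lets per-request dispatch react immediately while orchestration cost is amortized over many slots.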
