Cost-effective BlackWater Raft on Highly Unreliable Nodes at Scale Out

03/15/2022
by Zichen Xu, et al.

The Raft algorithm maintains strong consistency across data replicas in the cloud. It divides nodes into leaders and followers to serve read/write requests spanning geo-diverse sites. As workload grows, Raft should scale out in proportion. However, traditional scale-out techniques encounter bottlenecks in Raft, and when the provisioned sites exhaust local resources, the performance loss grows exponentially. This paper proposes a cost-effective mechanism for elastic auto-scaling in Raft, called BlackWater-Raft or BW-Raft. BW-Raft extends the original Raft with the following abstractions: (1) secretary nodes that take over expensive log synchronization operations from the leader, relaxing the performance constraints on locks; and (2) large numbers of low-cost observer nodes that handle reads only, improving throughput for typical data-intensive services. These abstractions are stateless, allowing elastic scale-out on unreliable yet cheap spot instances. We show theoretically that BW-Raft maintains Raft's strong consistency guarantees when scaling out, handling a 50X increase in the number of nodes compared to the original Raft. We prototyped BW-Raft on key-value services and evaluated it against several state-of-the-art systems on Amazon EC2 and Alibaba Cloud. Our results show that within the same budget, BW-Raft's resource footprint increments are 5-7X smaller than Multi-Raft's, and 2X better than the original Raft's. Using spot instances, BW-Raft reduces costs by 84.5% compared to Multi-Raft. In real-world experiments, BW-Raft improves goodput at the 95th-percentile SLO by 9.4X, thus serving as an alternative for services scaling out with strong consistency.
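The division of labor among the roles described above can be pictured with a toy routing model. This is only an illustrative sketch, not the paper's protocol: the class names, the round-robin read policy, and the even split of followers among secretaries are all assumptions made for the example, and it omits elections, log contents, and consistency checks entirely.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import cycle


class Role(Enum):
    LEADER = "leader"
    FOLLOWER = "follower"
    SECRETARY = "secretary"  # relays log synchronization for the leader
    OBSERVER = "observer"    # serves read requests only


@dataclass
class Node:
    name: str
    role: Role


class Cluster:
    """Hypothetical toy model of request routing among BW-Raft-style roles."""

    def __init__(self, nodes):
        self.leader = next(n for n in nodes if n.role is Role.LEADER)
        self.followers = [n for n in nodes if n.role is Role.FOLLOWER]
        self.secretaries = [n for n in nodes if n.role is Role.SECRETARY]
        observers = [n for n in nodes if n.role is Role.OBSERVER]
        # Assumption: reads rotate over observers; with none, the leader serves them.
        self._readers = cycle(observers or [self.leader])

    def route(self, op):
        """Return the node that should receive a 'read' or 'write' request."""
        if op == "write":
            return self.leader
        if op == "read":
            return next(self._readers)
        raise ValueError(f"unknown operation: {op!r}")

    def replication_plan(self):
        """Map each relay (secretary, or the leader if none) to its followers."""
        relays = self.secretaries or [self.leader]
        plan = {relay.name: [] for relay in relays}
        for i, follower in enumerate(self.followers):
            # Assumption: followers are spread evenly across relays.
            plan[relays[i % len(relays)].name].append(follower.name)
        return plan
```

In this sketch, adding observers changes only where reads land, and adding secretaries changes only who synchronizes followers, which mirrors the idea that both roles can be added or dropped without touching the leader's write path.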

