Can 100 Machines Agree?

by Rachid Guerraoui, et al.

Agreement protocols have typically been deployed at small scale, e.g., on three to five machines. This is because these protocols seem to suffer from a sharp performance decay: as the size of a deployment (i.e., the degree of replication) increases, protocol performance drops markedly. There is little experimental evidence for this decay in practice, however, notably for larger system sizes, e.g., beyond a handful of machines. In this paper we execute agreement protocols on up to 100 machines and observe their performance decay. We consider well-known agreement protocols that are part of mature systems, such as Apache ZooKeeper, etcd, and BFT-Smart, as well as a chain-based and a novel ring-based agreement protocol which we implement ourselves. We provide empirical evidence that current agreement protocols execute gracefully on 100 machines. We observe that throughput decay is initially sharp (consistent with previous observations), but, intriguingly, as each system grows beyond a few tens of replicas, the decay dampens. For chain- and ring-based replication, this decay is slower than for the other systems. The positive takeaway from our evaluation is that mature agreement protocol implementations can sustain, out of the box, 300 to 500 requests per second when executing on 100 replicas on a wide-area public cloud platform. Chain- and ring-based replication can reach between 4K and 11K requests per second (up to 20x improvements), depending on the fault assumptions.




