Microsecond Consensus for Microsecond Applications

by Marcos K. Aguilera, et al.

We consider the problem of making apps fault-tolerant through replication when those apps operate at the microsecond scale, as in finance, embedded computing, and microservices. Such apps need a replication scheme that also operates at the microsecond scale; otherwise replication becomes a burden. We propose Mu, a system that takes less than 1.3 microseconds to replicate a (small) request in memory, and less than a millisecond to fail over the system. This cuts the replication and fail-over latencies of prior systems by at least 61%. Mu implements bona fide state machine replication/consensus (SMR) with strong consistency for a generic app, but it really shines on microsecond apps, where even the smallest overhead is significant. To provide this performance, Mu introduces a new SMR protocol that carefully leverages RDMA. Roughly, in Mu a leader replicates a request by simply writing it directly to the logs of the other replicas using RDMA, without any additional communication. Doing so, however, introduces the challenges of handling concurrent leaders, changing leaders, garbage collecting the logs, and more. We address these challenges in this paper through a judicious combination of RDMA permissions and distributed algorithmic design. We implemented Mu and used it to replicate several systems: a financial exchange app called Liquibook, Redis, Memcached, and HERD. Our evaluation shows that Mu incurs a small replication latency, in some cases being the only viable replication system that incurs an acceptable overhead.
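
The idea of a leader writing directly into the other replicas' logs, with write permissions deciding which leader may do so, can be illustrated with a small in-memory toy model. This is only a sketch under stated assumptions and not Mu's actual protocol or any RDMA API: the names `Replica`, `grant`, `remote_write`, and `replicate` are hypothetical, a plain Python method call stands in for a one-sided RDMA write, and recovery of divergent log entries, garbage collection, and leader election are all omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class PermissionDenied(Exception):
    """Raised when a node without write permission tries to write a log."""


@dataclass
class Replica:
    name: str
    log: List[Optional[bytes]] = field(default_factory=list)
    writer: Optional[str] = None  # the single node currently allowed to write

    def grant(self, leader: str) -> None:
        # Granting permission to a new leader implicitly revokes the old one.
        self.writer = leader

    def remote_write(self, leader: str, slot: int, request: bytes) -> None:
        # Stand-in for a one-sided write: the replica runs no protocol logic,
        # only the permission check decides whether the write lands.
        if leader != self.writer:
            raise PermissionDenied(f"{leader} may not write to {self.name}")
        while len(self.log) <= slot:
            self.log.append(None)
        self.log[slot] = request


def replicate(leader: str, replicas: List[Replica], slot: int, request: bytes) -> bool:
    """Write the request to every replica's log; succeed on a majority."""
    acks = 0
    for replica in replicas:
        try:
            replica.remote_write(leader, slot, request)
            acks += 1
        except PermissionDenied:
            pass  # this replica has given its permission to another leader
    return acks >= len(replicas) // 2 + 1


# Three replicas; r0 starts as leader and holds write permission everywhere.
replicas = [Replica("r0"), Replica("r1"), Replica("r2")]
for r in replicas:
    r.grant("r0")
ok_first = replicate("r0", replicas, 0, b"buy 10")

# r1 takes over by obtaining permission on a majority (r1 and r2).
replicas[1].grant("r1")
replicas[2].grant("r1")
ok_stale = replicate("r0", replicas, 1, b"sell 5")  # old leader loses its majority
ok_new = replicate("r1", replicas, 1, b"sell 7")    # new leader reaches a majority
```

In this toy model, a stale leader cannot complete a replication once a majority of replicas has moved its permission to a new leader, which is the intuition behind using permissions to cope with concurrent leaders. The stale leader's write still lands in its own log here; a real system needs additional steps to reconcile such entries.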



Leader Confirmation Replication for Millisecond Consensus in Geo-distributed Private Chains

Geo-distributed private chain and database have created higher performan...

Stream-based State-Machine Replication

Developing state-machine replication protocols for practical use is a co...

Linearizable State Machine Replication of State-Based CRDTs without Logs

General solutions of state machine replication have to ensure that all r...

Replicating Persistent Memory Key-Value Stores with Efficient RDMA Abstraction

Combining persistent memory (PM) with RDMA is a promising approach to pe...

PigPaxos: Devouring the communication bottlenecks in distributed consensus

Paxos family of protocols are employed by many cloud computing services ...

Making Reads in BFT State Machine Replication Fast, Linearizable, and Live

Practical Byzantine Fault Tolerance (PBFT) is a seminal state machine re...

TeaMPI – Replication-based Resilience without the (Performance) Pain

In an era where we can not afford to checkpoint frequently, replication ...
