Self-stabilizing Total-order Broadcast
The problem of total-order (uniform reliable) broadcast is fundamental in fault-tolerant distributed computing since it abstracts a broad set of problems requiring processes to uniformly deliver messages in the same order in which they were sent. Existing solutions (that tolerate process failures) reduce the total-order broadcast problem to that of multivalued consensus. Our study aims at the design of an even more reliable solution. We do so through the lens of self-stabilization, a very strong notion of fault tolerance. In addition to node and communication failures, self-stabilizing algorithms can recover after the occurrence of arbitrary transient faults; these faults represent any violation of the assumptions according to which the system was designed to operate (as long as the algorithm code stays intact). This work proposes the first (to the best of our knowledge) self-stabilizing algorithm for total-order (uniform reliable) broadcast for asynchronous message-passing systems prone to process failures and transient faults. As we show, the proposed solution facilitates the elegant construction of self-stabilizing state-machine replication using bounded memory.