Upload
alia-stanforth
View
215
Download
2
Embed Size (px)
Citation preview
Distributed Systems Overview
Replicated State Machine (RSM)
• Distributed Systems 101– Fault-tolerance (partial, byzantine, recovery,...)– Concurrency (ordering, asynchrony, timing,...)
• Generic solution for distributed systems: Replicated State Machine approach– Represent your system with a deterministic state machine– Replicate the state machine– Feed input to all replicas in the same order
Total Order Reliable Broadcast aka Atomic Broadcast
• Reliable broadcast– All or none correct nodes get the message
(even if src fails)
• Atomic Broadcast– Reliable broadcast that guarantees:
All messages delivered in the same order
• Replicated state machine trivial with atomic broadcast
Consensus?• Consensus problem
– All nodes propose a value– All correct nodes must agree on one of the values– Must eventually reach a decision (availability)
• Atomic Broadcast → Consensus– Broadcast proposal, Decide on first received value
• Consensus → Atomic Broadcast– Unreliably broadcast message to all– 1 consensus per round: – propose set of messages seen but not delivered– Each round deliver one decided message
• Atomic Broadcast equivalent to Atomic Broadcast
Consensus impossible
• No deterministic 1-crash-robust consensus algorithm exists for the asynchronous model
• 1-crash-robust – Up to one node may crash
• Asynchronous model– No global clock– No bounded message delay
• Life after impossibility of consensus? What to do?
Solving Consensus with Failure Detectors
• Black box that tells us if a node has failed
• Perfect failure detector– Completeness
• It will eventually tell us if a node has failed
– Accuracy (no lying)• It will never tell us a node
has failed if it hasn’t
• Perfect FD → Consensus
xi = inputfor r:=1 to N do
if r=p thenforall j do send <val, xi, r>
to j;decide xi
if collect<val, x´, r> from r thenxi = x´;
enddecide xi
Solving Consensus
• Consensus → Perfect FD?– No. Don’t know if a node actually failed or not!
• What’s the weakest FD to solve consensus?– Least assumptions on top of asynchronous model!
Enter Omega• Leader Election
– Eventually every correct node trusts some correct node– Eventually no two correct nodes trust different correct nodes
• Failure detection and leader election are the same– Failure detection captures failure behavior
• detect failed nodes
– Leader election also captures failure behavior• Detect correct nodes (a single & same for all)
• Formally, leader election is an FD– Always suspects all nodes except one (leader)– Ensures some properties regarding that node
Weakest Failure Detector for Consensus
• Omega the weakest failure detector for consensus– How to prove it?– Easy to implement in practice
High Level View of Paxos• Elect a single proposer using Ω– Proposer imposes its proposal to everyone– Everyone decides– Done!
• Problem with Ω– Several nodes might initially be proposers (contention)
• Solution is abortable consensus – Proposer attempts to enforce decision– Might abort if there is contention (safety)– Ω ensures eventually 1 proposer succeeds (liveness)
10
Replicated State Machine
• Paxos approach (Lamport)– Client sends input to leader Paxos– Leader executes Paxos instance to agree on command– Well-understood, many papers, optimizations
• View-stamp approach (Liskov)– Have one leader that writes commands to a quorum
(no Paxos)– When failures happen, use Paxos to agree– Less understood (Mazieres tutorial)
Paxos Siblings
• Cheap Paxos (LM’04)– Fewer messages– Directly contact a quorum (e.g. 3 nodes out of 5)– If fail to get response from 3, expand to 5
• Fast Paxos (L’06)– Reduce from 3 delays to 2 delays (delays ~ delays)– Clients optimistically write to a quorum– Requires recovery
Paxos Siblings
• Gaios/SMARTER (Bolosky’11)– Make logging to disk efficient for crash-recovery– Uses pipelining and batching
• Generalized Paxos (LM’05)– Commutative operations for repl. state machine
Atomic Commit
• Atomic Commit– Commit IFF no failures and everyone votes commit– Else Abort
• Consensus on Transaction Commit (LG’04)– One Paxos instance for every TM– Only commit if every instance said Commit
Reconfigurable Paxos
• Change the set of nodes– Replace failed nodes– Add/remove new nodes (change size of quorum)
• Lamport’s idea– Part of the state of state-machine: set of nodes
• SMART (Eurosys’06)– Many problems (e.g. A,B,C->A,B,D and A fails)– Basic idea, run multiple Paxos instances side by side