Asynchronous Consensus
Ken Birman
Outline of talk
Reminder about models
Asynchronous consensus: impossibility result
Solution to the problem
  With an “oracle” that detects failures
  Without oracles, using timeouts
Big issues?
  Revisit from Byzantine agreement
  Is this model realistic? In what ways is it “legitimate”?
  Should we focus on impossibility, or “possibility”?
  Asynchronous consensus in real world systems
Distributed Computing Models
Recall that we had two models
To reason about networks and applications we need to be precise about the setting in which our protocols run
But “real world” networks are very complex
  They can drop packets, or reorder them
  Intruders might be able to intercept and modify data
  Timing is totally unpredictable
Asynchronous network model
Asynchronous because we lack clocks:
  Network can arbitrarily delay a message
  But we assume that messages are sequenced and retransmitted (arbitrary numbers of times), so they eventually get through
“Free” to say: lossless, ordered
No value to assumptions about process speed
Failures in asynchronous model?
  Usually, limited to process “crash” faults
  If detectable, we call this “fail-stop” – but how to detect?
An asynchronous network
[Figure: message timelines between processes, annotated “Not causal!”, “Time shrinks…”, “Time stretches…”]
Justification?
If we can do something in the asynchronous model, we can probably do it even better in a real network
  Clocks, a-priori knowledge can only help…
But today we will focus on an impossibility result
By definition, impossibility in this model means “xxx can’t always be done”
Paradigms
Fundamental problems, the solution of which yields general insight into a broad class of questions
In distributed systems:
  Agreement (on value proposed by a leader)
  Consensus (everyone proposes a value… pick one)
  Electing a leader
  Atomic broadcast/multicast (send a message, reliably, to everyone who isn’t faulty, such that concurrent messages are delivered in the same order everywhere)
  Deadlock detection, clock or process synchronization, taking a snapshot (“picture”) of the system state…
Consensus problem
Models distributed agreement
Comes in various forms (with subtle differences in the associated results)!
  With a leader: leader gives an order, like “attack”, and non-faulty participants either attack or do nothing, despite some limited number of failures: Byzantine Agreement
  Without a leader: participants have an initial vote; protocol runs and eventually all non-faulty participants choose the same outcome, and it is one of the initial votes (typically, 0 or 1): Fault-tolerant Consensus
Consensus problem
[Figure: processes labeled P0, Q0, R1 before the protocol runs and P1, Q1, R1 after it]
Fault-tolerance
Goal: an algorithm tolerant of one failure
Failure: process crashes but this is not detectable
So the algorithm must work both in the face of arbitrary message delay caused by the network, and in the event of a single failure
If some process stays up…
Suppose we knew that P won’t fail
Then P could simply broadcast its input
All would “decide” upon this value
Solves the problem
If one process stays up
Indeed, suppose that P stays up only long enough to send one message
But there is only one failure
And we knew that P would “lead”
Then we can relay P’s message, using an all-to-all broadcast
Algorithm
P: broadcast my input
Q ≠ P: on receiving P’s message for the first time, broadcast a copy
Tolerates anything except failure of P in the first step, but we need to agree upon “P” before starting (i.e. P is the least ranked process, using alphabetic ranking)
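The relay rule can be sketched as a small simulation. This is only an illustration under stated assumptions: the function name `relay_broadcast` and the parameter `leader_sends_to` are hypothetical, and FIFO delivery is used purely to keep the run deterministic (the model itself allows any order).

```python
from collections import deque

def relay_broadcast(inputs, leader, leader_sends_to):
    """Simulate the leader-relay rule.

    inputs: dict mapping process name -> input value
    leader: the agreed-upon leader (the slide's "P")
    leader_sends_to: processes the leader reaches before it crashes;
        pass every other process to model a leader that does not fail.
    Returns a dict process -> decided value for the non-leader processes.
    """
    others = [p for p in inputs if p != leader]
    decided = {}
    # Messages in flight as (destination, value) pairs.
    in_flight = deque((q, inputs[leader]) for q in leader_sends_to)
    while in_flight:
        dest, value = in_flight.popleft()
        if dest in decided:
            continue  # not the first receipt: do not relay again
        decided[dest] = value
        # First receipt: relay a copy to every other non-leader process.
        in_flight.extend((r, value) for r in others if r != dest)
    return decided

inputs = {"P": 0, "Q": 1, "R": 1}
one_message = relay_broadcast(inputs, "P", ["Q"])  # P crashes after one send
no_message = relay_broadcast(inputs, "P", [])      # P crashes in the first step
```

If P gets even one message out, the relay carries its value to everyone; if P fails before sending anything, nobody decides, which is the one case the slide says is not tolerated.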
Another algorithm
All processes start by broadcasting own value to all other processes
If we know that there is always exactly one failure, could wait until n-1 messages received, then decide using any deterministic rule
But doesn’t work if sometimes we have one failure, sometimes none
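A hand-built example of why waiting for n-1 values breaks down when there may be no failure at all. The rule `decide_from` (take the value of the lowest-ranked sender) is an arbitrary choice for this sketch, a process's own value is counted among its n-1, and the failed process is assumed to crash before sending anything.

```python
def decide_from(received):
    """Deterministic rule: pick the value sent by the lowest-ranked sender.

    received: dict mapping sender name -> value, holding exactly n-1 entries.
    """
    sender = min(received)
    return received[sender]

# Inputs are P=0, Q=1, R=1, so n-1 = 2.

# Exactly one failure: P crashes before sending, so Q and R collect the
# same two values and agree.
view_q = {"Q": 1, "R": 1}
view_r = {"Q": 1, "R": 1}

# No failure: the network only delays one message per receiver, so the
# first two values each process collects can differ.
slow_q = {"Q": 1, "R": 1}   # P's message to Q is delayed
slow_r = {"P": 0, "R": 1}   # Q's message to R is delayed

agree_with_failure = decide_from(view_q) == decide_from(view_r)
agree_without_failure = decide_from(slow_q) == decide_from(slow_r)
```

In the second scenario Q decides 1 and R decides 0, even though no process failed.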
FLP result
Considers general case
Assumes an algorithm that can decide with zero or one failures
Proves that this algorithm can be prevented from reaching decision, indefinitely
Basic idea
Think of system state as a “configuration”
  Configuration is v-valent if decision to pick v has become inevitable: all runs lead to v
  If not 0-valent or 1-valent, configuration is bivalent
The set of initial configurations includes
  At least one 0-valent: {0,0,0….0}
  At least one 1-valent: {1,1,1…..1}
  At least one bivalent: {0,0,…1,1}
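Valence can be read as "which decisions are still reachable". A minimal sketch, assuming the runs of some protocol are given as an explicit graph; the graph, the configuration names, and the function `valence` are all hypothetical and are not part of the FLP proof itself.

```python
def valence(config, successors, decision):
    """Return the set of decision values reachable from config.

    successors: dict config -> list of configurations one step later
    decision: dict config -> decided value, for configurations that decide
    A result of {v} means v-valent; {0, 1} means bivalent.
    """
    reachable = set()
    stack, seen = [config], set()
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        if c in decision:
            reachable.add(decision[c])
        stack.extend(successors.get(c, []))
    return reachable

# Toy run graph: from C, one delivery order leads toward 0, the other toward 1.
successors = {"C": ["C_a", "C_b"], "C_a": ["D0"], "C_b": ["D1"]}
decision = {"D0": 0, "D1": 1}

start = valence("C", successors, decision)      # both outcomes still possible
after_a = valence("C_a", successors, decision)  # only 0 remains
```

Here C is bivalent, while C_a and C_b are already univalent: the choice of which message is delivered first has fixed the outcome.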
Basic idea
[Figure: the space of configurations divided into 0-valent, 1-valent, and bivalent configurations]
Transitions between configurations
Configuration is a set of processes and messages
Applying a message to a process changes its state, hence it moves us to a new configuration
Because the system is asynchronous, can’t predict which of a set of concurrent messages will be delivered “next”
But because processes only communicate by messages, this is unimportant
Basic Lemma
Suppose that from some configuration C, the schedules σ1 and σ2 lead to configurations C1 and C2, respectively.
If the sets of processes taking actions in σ1 and σ2, respectively, are disjoint, then σ2 can be applied to C1 and σ1 to C2, and both lead to the same configuration C3
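The lemma can be checked on a toy model. In this sketch a configuration is a pair (process states, multiset of undelivered messages); the transition rule, the monitor process "M", and the names `step` and `run` are assumptions made for illustration only.

```python
from collections import Counter

def step(config, event):
    """Apply one event (process, message) to a configuration.

    states maps each process to the tuple of messages it has consumed;
    buffer is a multiset of undelivered (destination, message) pairs.
    The toy transition also sends one ack to the monitor "M" per delivery.
    """
    states, buffer = config
    proc, msg = event
    assert buffer[(proc, msg)] > 0, "event not applicable"
    new_states = dict(states)
    new_states[proc] = states[proc] + (msg,)
    new_buffer = Counter(buffer)
    new_buffer[(proc, msg)] -= 1
    new_buffer[("M", f"ack:{proc}:{msg}")] += 1
    return new_states, +new_buffer  # unary + drops entries with zero count

def run(config, schedule):
    """Apply a schedule (a list of events) in order."""
    for event in schedule:
        config = step(config, event)
    return config

C = ({"P": (), "Q": (), "M": ()},
     Counter({("P", "a"): 1, ("P", "b"): 1, ("Q", "c"): 1}))
sigma1 = [("P", "a"), ("P", "b")]   # only P takes steps
sigma2 = [("Q", "c")]               # only Q takes steps

C3_via_C1 = run(run(C, sigma1), sigma2)
C3_via_C2 = run(run(C, sigma2), sigma1)
```

Because the two schedules touch disjoint processes, neither can observe the other, so both orders end in the same configuration C3.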
Basic Lemma
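[Figure: commutativity diamond. From C, σ1 leads to C1 and σ2 leads to C2; σ2 applied to C1 and σ1 applied to C2 both reach C3]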
Main result
No consensus protocol is totally correct in spite of one fault
Note: Uses total in formal sense (guarantee of termination)
Basic FLP theorem
Suppose we are in a bivalent configuration now and later will enter a univalent configuration
We can draw a form of frontier, such that a single message to a single process triggers the transition from bivalent to univalent
Basic FLP theorem
[Figure: frontier between bivalent and univalent configurations, showing configurations C, C1, D0, D1 connected by steps labeled e and e’]
Single step decides
They prove that any run that goes from a bivalent state to a univalent state has a single decision step, e
They show that it is always possible to schedule events so as to block such steps
Eventually, e can be scheduled but in a state where it no longer triggers a decision
Basic FLP theorem
They show that we can delay this “magic message” and cause the system to take at least one step, remaining in a new bivalent configuration
Uses the diamond-relation seen earlier
But this implies that in a bivalent state there are runs of indefinite length that remain bivalent
Proves the impossibility of fault-tolerant consensus
Notes on FLP
No failures actually occur in this run, just delayed messages
Result is purely abstract. What does it “mean”?
Says nothing about how probable this adversarial run might be, only that at least one such run exists
FLP intuition
Suppose that we start a system up with n processes
Run for a while… close to picking value associated with process “p”
Someone will do this for the first time, presumably on receiving some message from q
If we delay that message, and yet our protocol is “fault-tolerant”, it will somehow reconfigure
Now allow the delayed message to get through but delay some other message
Key insight
FLP is about forcing a system to attempt a form of reconfiguration
This takes time
Each “unfortunate” suspected failure causes such a reconfiguration
FLP and our first algorithm
P is the leader and is supposed to send its input to Q
Q “times out” and
  Tells everyone that P has apparently failed
  Then can disseminate its own value
  If P wakes up, we re-admit it to the system but it is no longer considered least ranked
One can make such algorithms work…
But they can be attacked by delaying first P, then Q, then R, etc.
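The attack can be illustrated with a deliberately simplified round model. Everything here is an assumption for the sketch: one leader per round, a single fixed timeout, and an adversary that chooses the delay of each round's leader message. The function `run_rounds` is hypothetical.

```python
def run_rounds(processes, delays, timeout):
    """Rotate leadership whenever the current leader's message is late.

    processes: ranking order, lowest-ranked (initial leader) first
    delays: delays[i] is the network delay imposed on the round-i leader
    timeout: how long the others wait before suspecting the leader
    Returns (leader whose value is adopted, or None; suspected leaders in order).
    """
    suspected = []
    for round_no, delay in enumerate(delays):
        leader = processes[round_no % len(processes)]
        if delay <= timeout:
            return leader, suspected   # message arrived in time: adopt it
        suspected.append(leader)       # timed out: demote the leader, move on
    return None, suspected

calm = run_rounds(["P", "Q", "R"], [1], timeout=5)
attacked = run_rounds(["P", "Q", "R"], [9, 9, 9, 9], timeout=5)
```

In the attacked run no process ever crashes, yet P, Q, R and then P again are each suspected in turn and no value is adopted for as long as the adversary keeps delaying the current leader.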
FLP in the real world
Real systems are subject to this impossibility result
But in fact often are subject to even more severe limitations, such as inability to tolerate network partition failures
Also, asynchronous consensus may be too slow for our taste
And FLP attack is not probable in a real system
  Requires a very smart adversary!
Chandra/Toueg
Showed that FLP applies to many problems, not just consensus
  In particular, they show that FLP applies to group membership, reliable multicast
  So these practical problems are impossible in asynchronous systems, in formal sense
But they also look at the weakest condition under which consensus can be solved
Chandra/Toueg Idea
Separate problem into
  The consensus algorithm itself
  A “failure detector”: a form of oracle that announces suspected failure
    But it can change its mind
Question: what is the weakest oracle for which consensus is always solvable?
Sample properties
Completeness: detection of every crash
  Strong completeness: Eventually, every process that crashes is permanently suspected by every correct process
  Weak completeness: Eventually, every process that crashes is permanently suspected by some correct process
Sample properties
Accuracy: does it make mistakes?
  Strong accuracy: No process is suspected before it crashes
  Weak accuracy: Some correct process is never suspected
  Eventual strong accuracy: there is a time after which correct processes are not suspected by any correct process
  Eventual weak accuracy: there is a time after which some correct process is not suspected by any correct process
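The two non-eventual accuracy properties can be checked against a recorded history of suspicions. A sketch only: the trace format and the function names are assumptions, and a finite trace can show a violation but says nothing about what happens after it ends.

```python
def strong_accuracy(trace, crash_time):
    """True if no process is suspected before it crashes.

    trace[t][observer] is the set of processes `observer` suspects at step t.
    crash_time maps each crashed process to the step at which it crashed;
    processes absent from crash_time are correct.
    """
    for t, snapshot in enumerate(trace):
        for suspects in snapshot.values():
            for p in suspects:
                if p not in crash_time or t < crash_time[p]:
                    return False
    return True

def weak_accuracy(trace, crash_time, processes):
    """True if some correct process is never suspected in the trace."""
    correct = set(processes) - set(crash_time)
    ever_suspected = set()
    for snapshot in trace:
        for suspects in snapshot.values():
            ever_suspected |= suspects
    return bool(correct - ever_suspected)

processes = ["P", "Q", "R"]
crash_time = {"R": 1}            # R crashes at step 1
trace = [
    {"P": set(), "Q": {"P"}},    # step 0: Q wrongly suspects P
    {"P": {"R"}, "Q": {"R"}},    # step 1: both suspect the crashed R
]

is_strong = strong_accuracy(trace, crash_time)
is_weak = weak_accuracy(trace, crash_time, processes)
```

This trace violates strong accuracy (P was suspected while alive) but satisfies weak accuracy, since Q is correct and nobody ever suspects it.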
A sampling of failure detectors
Completeness   Accuracy:
               Strong       Weak       Eventually Strong        Eventually Weak
Strong         Perfect P    Strong S   Eventually Perfect ◇P    Eventually Strong ◇S
Weak           Q            Weak W     ◇Q                       Eventually Weak ◇W
Perfect Detector?
Named Perfect, written P
Strong completeness and strong accuracy
  Immediately detects all failures
  Never makes mistakes
Example of a failure detector
The detector they call W: “eventually weak”
More commonly: ◇W: “diamond-W”
Defined by two properties:
There is a time after which every process that crashes is suspected by some correct process
There is a time after which some correct process is never suspected by any correct process
Think: “we can eventually agree upon a leader.” If it crashes, “we eventually, accurately detect the crash”
◇W: Weakest failure detector
They show that ◇W is the weakest failure detector for which consensus is guaranteed to be achieved
Algorithm is pretty simple
  Rotate a token around a ring of processes
  Decision can occur once token makes it around once without a change in failure-suspicion status for any process
  Subsequently, as token is passed, each recipient learns the decision outcome
Rotating a token versus 2-phase commit
[Figure: message flow, with each “phase” marked: Propose v… ack… Decide v]
Rotating a token versus 2-phase commit
Their protocol is basically a 2-phase commit
But with n processes, 2PC requires 2(n-1) messages for the propose/ack phase plus n-1 to announce the decision, 3(n-1) total
Passing a token only requires n messages per phase, for 2n total (when nothing fails)
Tolerates f < n/2 failures
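The message counts are easy to tabulate. The function names are made up for this sketch, and the 2PC count assumes one coordinator sending propose, collecting acks, and sending decide, with no failures.

```python
def two_phase_commit_messages(n):
    """Coordinator to n-1 participants: propose, then acks back, then decide."""
    return 3 * (n - 1)

def token_ring_messages(n):
    """One full circulation of the token (n hops) per phase, two phases."""
    return 2 * n

counts = {n: (two_phase_commit_messages(n), token_ring_messages(n))
          for n in (3, 4, 10, 100)}
```

The two totals are equal at n = 3; the token saves messages only for n > 3, for example 27 versus 20 at n = 10.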
Set of problems solvable in:
[Figure: nested sets of solvable problems, from the strongest model to the weakest]
Synchronous systems: clock synchronization
Asynchronous using P: TRB, non-blocking atomic commit
Asynchronous using ◇W: consensus, atomic broadcast
Asynchronous: reliable broadcast
TRB: Byzantine Generals with only crash failures
Building systems with ◇W
Unfortunately, this failure detector is not implementable
Using timeouts we can make mistakes at arbitrary times
But with long enough timeouts, could produce a close approximation to ◇W
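A timeout-based approximation might look like the sketch below. It is not an implementation of the detector from the paper: the class `TimeoutDetector`, the heartbeat scheme, and the policy of doubling a timeout after each mistake are all assumptions chosen for illustration. Time is passed in explicitly so the example stays deterministic.

```python
class TimeoutDetector:
    """Heartbeat-based suspicion with a timeout that grows after each mistake."""

    def __init__(self, processes, timeout):
        self.timeout = {p: timeout for p in processes}
        self.last_heard = {p: 0.0 for p in processes}
        self.suspected = set()

    def heartbeat(self, p, now):
        """Record a heartbeat from p received at time `now`."""
        self.last_heard[p] = now
        if p in self.suspected:
            # We suspected a live process: change our mind and be more patient.
            self.suspected.discard(p)
            self.timeout[p] *= 2

    def check(self, now):
        """Suspect every process silent for longer than its timeout."""
        for p, heard in self.last_heard.items():
            if now - heard > self.timeout[p]:
                self.suspected.add(p)
        return set(self.suspected)

d = TimeoutDetector(["P", "Q"], timeout=1.0)
d.heartbeat("P", 0.5)
d.heartbeat("Q", 0.5)
first = d.check(2.0)     # both silent for 1.5 > 1.0
d.heartbeat("P", 2.1)    # P was only slow: unsuspect it, timeout becomes 2.0
second = d.check(3.0)    # only Q remains suspected
```

The detector can be wrong at any time (it suspected the live process P), but each mistake lengthens the timeout, so a process that is merely slow is suspected less and less often.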
Would we want to?
Question: are we solving the right problem?
Pros and cons of asynchronous consensus
Think about an air traffic control application
Find one problem for which asynchronous consensus is a good match
Find one problem for which the match is poor
French ATC system (simplified)
[Figure: system components: Controllers; Air Traffic Database (flight plans, etc); X.500 Directory; Radar; Onboard]
Potential applications
Maintaining replicated state within console clusters
Distributing radar data to participants
Distributing data over wide-area links within large geographic scale
Management and control (administration) of the overall system
Distributing security keys to prevent unauthorized action
Agreement when flight control handoffs occur
Broad conclusions?
The protocol seems unsuitable for high availability applications
  If the core of the system must make progress, the agreement property itself is too strong
  If a process becomes unresponsive, we might not want to wait for it to recover
Also, since we can’t implement any of these failure detectors, the whole issue is abstract…
Hence real systems don’t try to solve consensus as defined and used in these kinds of protocols!
Value of FLP/Consensus
A clear and elegant problem statement
Highlights limitations
  Perhaps with clocks we can overcome them
  More likely, we need a different notion of failure
    “Crash failure” is too narrow, “unreachable” also treated as failure in many real systems
Caused much debate about real systems
Nature of debate
We’ll see many practical systems soon
Do they
  Evade FLP in some way?
  Are they subject to FLP? If so, what problem do they “solve”, given that consensus (and most problems reduce to consensus) is impossible to solve?
Or are they subject to even more stringent limitations?
Is fault-tolerant consensus even an issue in real systems?