1
Failure Detectors: A Perspective
Sam Toueg
LIX, École Polytechnique
Cornell University
2
Context: Distributed Systems with Failures
• Group Membership
• Group Communication
• Atomic Broadcast
• Primary/Backup systems
• Atomic Commitment
• Consensus
• Leader Election
• …..
In such systems, applications often need to determine which processes are up (operational) and which are down (crashed)
This service is provided by a Failure Detector (FD)
FDs are at the core of many fault-tolerant algorithms and applications
FDs are found in many systems: e.g., ISIS, Ensemble,
Relacs, Transis, Air Traffic Control Systems, etc.
3
Failure Detectors
An FD is a distributed oracle that provides hints about the operational status of processes.
However:
• Hints may be incorrect
• An FD may give different hints to different processes
• An FD may change its mind (over & over) about the operational status of a process
4
[Figure: five processes p, q, r, s, t query the FD; different processes get different hints (some suspect q, others suspect s), and one process is SLOW.]
5
Talk Outline
• Using FDs to solve consensus
• Broadening the use of FDs
• Putting theory into practice
6
Consensus
[Figure: five processes p, q, r, s, t propose initial values 5, 7, 8, 2, 8; one process crashes, and the remaining processes all decide 5.]
7
Consensus
A paradigm for reaching agreement despite failures
• Equivalent to Atomic Broadcast
• Can be used to solve Atomic Commitment
• Can be used to solve Group Membership
• …
8
Solving Consensus
• In synchronous systems: Possible
• In asynchronous systems: Impossible [FLP83]
even if:
• at most one process may crash, and
• all links are reliable
9
Why this difference?
• In synchronous systems: use timeouts to determine with certainty whether a process has crashed
  => Perfect failure detector
• In asynchronous systems: cannot determine with certainty whether a process has crashed or not (it may be slow, or its messages are delayed)
  => No failure detector
10
Solving Consensus with Failure Detectors

Is perfect failure detection necessary for consensus? No

Failure detector ◇S: initially, it can output arbitrary information. But there is a time after which:
• every process that crashes is suspected (completeness)
• some process that does not crash is not suspected (accuracy)

◇S can be used to solve consensus [CT91]
◇S is the weakest FD to solve consensus [CHT92]
11
[Figure: if FD D can be used to solve consensus, then D can be transformed into ◇S.]
12
Solving Consensus using ◇S: Rotating Coordinator Algorithms
• Processes are numbered 1, 2, …, n
• They execute asynchronous rounds
• In round r, the coordinator is process (r mod n) + 1
• In round r, the coordinator:
  - tries to impose its estimate as the consensus value
  - succeeds if it does not crash and it is not suspected by ◇S
These algorithms work for up to f < n/2 crashes
13
A Consensus algorithm using ◇S (Mostéfaoui and Raynal 1999)

every process p sets estimate to its initial value
for rounds r := 0, 1, 2, … do   {round r msgs are tagged with r}
• the coordinator c of round r sends its estimate v to all
• every p waits until (a) it receives v from c, or (b) it suspects c (according to ◇S)
  – if (a) then send v to all
  – if (b) then send ? to all
• every p waits until it receives a msg v or ? from n-f processes
  – if it received at least (n+1)/2 msgs v then decide v
  – if it received at least one msg v then estimate := v
  – if it received only ? msgs then do nothing
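To make the end-of-round rule concrete, here is a minimal Python sketch; the function name, the result tuples, and the use of '?' as a literal are illustrative, not part of the original notation:

```python
import math

def end_of_round(received, n, estimate):
    """Rule applied to the n-f msgs a process collected in a round;
    each msg is either the coordinator's value v or '?'."""
    values = [m for m in received if m != '?']
    if len(values) >= math.ceil((n + 1) / 2):
        return ('decide', values[0])    # enough v msgs: decide v
    if values:
        return ('adopt', values[0])     # at least one v: estimate := v
    return ('keep', estimate)           # only '?' msgs: do nothing
```

Since all non-'?' messages of a given round carry the same coordinator value, taking values[0] is safe.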
14
Why does it work?
Agreement: take n = 7 and f = 3. If p decides v, it received v from at least (n+1)/2 = 4 processes. Every other q hears from n-f = 4 processes, and any two sets of 4 among 7 processes intersect, so q receives at least one msg v and changes its estimate to v.
15
Why does it work?
Termination:
– With ◇S, no process blocks forever waiting for a message from a dead coordinator
– With ◇S, eventually some process c is not falsely suspected. When c becomes the coordinator, every process receives c’s estimate and decides
16
What Happens if the Failure Detector Misbehaves?
The consensus algorithm is:
• Safe -- Always!
• Live -- During “good” FD periods
[Figure: a sequence of consensus instances Consensus 1, Consensus 2, Consensus 3, …]
17
Failure Detector Abstraction
Some advantages:
• Increases the modularity and portability of algorithms
• Encapsulates various models of partial synchrony
• Suggests why consensus is not so difficult in practice
• Determines the minimal information about failures needed to solve consensus
18
Failure Detection Abstraction
By 1992, applicability was limited:
• Model: FLP only
– process crashes only
– a crash is permanent (no recovery possible)
– no link failures (no msg losses)
• Problems solved: consensus, atomic broadcast only
19
Talk Outline
• Using FDs to solve consensus
• Broadening the use of FDs
• Putting theory into practice
20
Broadening the Applicability of FDs
Other models:
• Crashes + link failures (fair links)
• Network partitioning
• Crash/Recovery
• Byzantine (arbitrary) failures
• FDs + Randomization

Other problems:
• Atomic Commitment
• Group Membership
• Leader Election
• k-set Agreement
• Reliable Communication
21
Talk Outline
• Using FDs to solve consensus
• Broadening the use of FDs
• Putting theory into practice
22
Putting Theory into Practice
In practice:
• “Eventual” guarantees are not sufficient:
  => FDs with QoS guarantees
• FD implementations need to be message-efficient:
  => FDs with linear msg complexity (ring, hierarchical, gossip)
• Failure detection should be easily available:
  => a shared FD service (with QoS guarantees)
23
On Failure Detectors with QoS guarantees
[Chen, Toueg, Aguilera. DSN 2000]
24
Simple FD problem: q monitors p
[Figure: p sends a stream of heartbeats to q.]
Heartbeats can be lost or delayed

Probabilistic model:
• p_L : probability of heartbeat loss
• D : heartbeat delay (a random variable)
25
Typical FD Behavior
[Figure: while p is up, the FD at q alternates between trust and suspect (repeated mistakes); after p goes down, the FD eventually suspects p permanently.]
26
QoS of Failure Detectors
The QoS specification of an FD quantifies:
• how fast it detects actual crashes
• how well it avoids mistakes (i.e., false detections)
What QoS metrics should we use?
27
Detection Time
• T_D : time to detect a crash
[Figure: p is up, then crashes; after T_D, the FD at q moves from trust to permanent suspicion.]
28
Accuracy Metrics
• T_MR : time between two consecutive mistakes
• T_M : duration of a mistake
[Figure: while p is up, each false suspicion lasts T_M, and consecutive mistakes are separated by T_MR.]
29
Another Accuracy Metric
[Figure: an application queries the FD at random times while p is up.]
• P_A : probability that the FD is correct at a random time
  P_A = 1 − E(T_M) / E(T_MR)
30
A Common FD Algorithm
31
A Common FD Algorithm
• Timing-out also depends on the previous heartbeat
[Figure: p sends heartbeats to q; on each arrival, the FD at q starts a fresh timeout TO and suspects p if it fires before the next heartbeat arrives.]
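A sketch of this common algorithm, assuming the first timeout starts when monitoring begins at time 0 (function and variable names are illustrative):

```python
def suspicion_intervals(arrivals, TO, horizon):
    """Replay the common FD algorithm: q suspects p whenever more
    than TO time units pass since the last heartbeat arrival."""
    intervals, deadline = [], TO        # first timeout set at time 0
    for t in sorted(arrivals) + [horizon]:
        if t > deadline:                # timed out before this event
            intervals.append((deadline, min(t, horizon)))
        deadline = t + TO               # next timeout depends on the
                                        # previous heartbeat's arrival
    return intervals
```

For example, suspicion_intervals([1.0, 2.1, 3.0], TO=1.5, horizon=6.0) yields [(4.5, 6.0)]: after the last heartbeat at 3.0, the timeout fires at 4.5 and q suspects p from then on.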
32
Large Detection Time
• T_D depends on the delay of the last heartbeat sent by p:
  T_D = max(D) + TO
[Figure: p crashes; q times out only TO after the (possibly slow) last heartbeat arrives.]
33
A New FD Algorithm and its QoS
34
New FD Algorithm
[Figure: p sends heartbeats h_{i-1}, h_i, h_{i+1}, h_{i+2}, one every η; the FD at q uses freshness points τ_{i-1}, τ_i, τ_{i+1}, τ_{i+2}, each a fixed shift δ after the corresponding sending time.]
• At time t ∈ [τ_i, τ_{i+1}), q trusts p iff it has received heartbeat h_i or higher.
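A sketch of this trust rule, assuming heartbeat h_j is sent at time j·η so that freshness point τ_j = j·η + δ (synchronized clocks; start-up is ignored):

```python
def trusts(t, received, eta, delta):
    """At time t in [tau_i, tau_(i+1)), q trusts p iff some heartbeat
    h_j with j >= i has arrived; `received` maps j -> arrival time."""
    i = int((t - delta) // eta)         # index of the current freshness interval
    return any(j >= i and arr <= t for j, arr in received.items())
```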
35
Detection Time is Bounded
[Figure: p crashes after sending heartbeat h_i; the FD at q suspects p no later than the next freshness point τ_{i+1}.]
T_D ≤ η + δ
36
Optimality Result
Among all FD algorithms with the same heartbeat rate and detection time, this FD has the best query accuracy probability P_A
37
QoS Analysis
Given:
• the system behavior: p_L and Pr(D ≤ t)
• the parameters η and δ of the FD algorithm

Can compute the QoS of this FD algorithm:
• Max detection time T_D
• Average time between mistakes E(T_MR)
• Average duration of a mistake E(T_M)
• Query accuracy probability P_A
38
QoS Analysis
Given:
• the system behavior: p_L and Pr(D ≤ t)
• the parameters η and δ of the FD algorithm

Can compute the QoS of this FD algorithm:
  T_D = η + δ
  E(T_MR) = η / p_S
  E(T_M) = (1/p_S) · ∫_0^η u(x) dx
  P_A = 1 − (1/η) · ∫_0^η u(x) dx
where
  u(x) = ∏_{j≥0} [ p_L + (1 − p_L) · Pr(D > δ + x − jη) ]
  p_S = (1 − p_L) · Pr(D ≤ δ) · u(0)
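These formulas can be evaluated numerically. The sketch below (helper names are mine) computes u(x) by its finite product, since factors for heartbeats not yet sent equal 1, and the integral by a midpoint rule:

```python
def qos_of_fd(p_L, Pr_D_le, eta, delta, grid=2000):
    """QoS of the new FD algorithm for given p_L, Pr(D <= t), eta, delta."""
    def u(x):                           # Pr(q suspects p at time tau_i + x)
        prod, j = 1.0, 0
        while delta + x - j * eta > 0:  # only heartbeats already sent matter
            prod *= p_L + (1 - p_L) * (1 - Pr_D_le(delta + x - j * eta))
            j += 1
        return prod
    dx = eta / grid                     # midpoint rule over [0, eta)
    integral = sum(u((k + 0.5) * dx) for k in range(grid)) * dx
    p_S = (1 - p_L) * Pr_D_le(delta) * u(0)
    return {'T_D': eta + delta,
            'E(T_MR)': eta / p_S,
            'E(T_M)': integral / p_S,
            'P_A': 1 - integral / eta}
```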
39
Satisfying QoS Requirements
• Given a set of QoS requirements:
  Detection time: T_D ≤ T_D^U
  Average time between mistakes: E(T_MR) ≥ T_MR^L
  Average duration of a mistake: E(T_M) ≤ T_M^U
• Compute η and δ to achieve these requirements
40
Computing FD parameters to achieve the QoS
• Assume p_L and Pr(D ≤ x) are known
• Problem to be solved:
  maximize η subject to
    η + δ ≤ T_D^U
    η / p_S ≥ T_MR^L
    (1/p_S) · ∫_0^η u(x) dx ≤ T_M^U
41
Configuration Procedure
• Step 1: compute q_0 = (1 − p_L) · Pr(D ≤ T_D^U) and let η_max = q_0 · T_M^U
• Step 2: let
    f(η) = η / ( q_0 · ∏_{j=1}^{⌈T_D^U/η⌉−1} [ p_L + (1 − p_L) · Pr(D > T_D^U − jη) ] )
  and find the largest η ≤ η_max that satisfies f(η) ≥ T_MR^L
• Step 3: set δ = T_D^U − η
42
[Figure: the probabilistic behavior of heartbeats (p_L, Pr(D ≤ x)) and the QoS requirements (T_D^U, T_MR^L, T_M^U) feed a Configurator, which sets the parameters η and δ of the Failure Detector.]
43
Example
• Probability of heartbeat loss: p_L = 0.01
• Heartbeat delay D is exponentially distributed with average delay E(D) = 0.02 sec

QoS requirements:
• Detect a crash within 30 sec
• At most one mistake per month (average)
• A mistake is corrected within 60 sec (average)

Algorithm parameters:
• Send a heartbeat every η = 9.97 sec
• Set the shift to δ = 20.03 sec
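These parameters can be reproduced by implementing Steps 1-3 directly (a simple line search over η; "one month" is taken here as 30 days; names are illustrative):

```python
import math

def configure(p_L, Pr_D_le, T_D_U, T_MR_L, T_M_U, step=0.01):
    # Step 1: q_0 and the largest eta allowed by the T_M requirement
    q0 = (1 - p_L) * Pr_D_le(T_D_U)
    eta_max = q0 * T_M_U

    def f(eta):                         # conservative estimate of E(T_MR)
        prod = 1.0
        for j in range(1, math.ceil(T_D_U / eta)):
            prod *= p_L + (1 - p_L) * (1 - Pr_D_le(T_D_U - j * eta))
        return eta / (q0 * prod)

    # Step 2: the largest eta <= eta_max with f(eta) >= T_MR_L
    eta = eta_max
    while eta > step and f(eta) < T_MR_L:
        eta -= step
    # Step 3: the shift follows from the bound T_D <= eta + delta
    return eta, T_D_U - eta

def Pr_D_le(t):                         # exponential delay, E(D) = 0.02 s
    return 1 - math.exp(-t / 0.02)

eta, delta = configure(0.01, Pr_D_le, 30.0, 30 * 24 * 3600.0, 60.0)
print(round(eta, 2), round(delta, 2))   # prints 9.97 20.03
```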
44
If the System Behavior is Not Known
If p_L and Pr(D ≤ x) are not known:
• use E(D) and V(D) instead of Pr(D ≤ x) in the configuration procedure
• estimate p_L, E(D), V(D) using the heartbeats themselves
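A sketch of such an estimator, assuming heartbeats carry sequence numbers and sender timestamps, and that clocks are synchronized (offset and skew handling omitted):

```python
def estimate_behavior(sent, arrived):
    """Estimate p_L, E(D) and V(D) from heartbeats. `sent` maps
    sequence number -> send time; `arrived` maps sequence number ->
    arrival time (a missing entry means the heartbeat was lost)."""
    delays = [arrived[s] - t for s, t in sent.items() if s in arrived]
    p_L = 1 - len(delays) / len(sent)             # fraction lost
    e_d = sum(delays) / len(delays)               # sample mean delay
    v_d = sum((d - e_d) ** 2 for d in delays) / (len(delays) - 1)  # sample variance
    return p_L, e_d, v_d
```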
45
[Figure: an Estimator of the probabilistic behavior of heartbeats supplies p_L, E(D) and V(D) to the Configurator, which sets the parameters of the Failure Detector from the QoS requirements (T_D^U, T_MR^L, T_M^U).]
46
Example
• Probability of heartbeat loss: p_L = 0.01
• The distribution of the heartbeat delay D is not known, but E(D) = V(D) = 0.02 sec are known

QoS requirements:
• Detect a crash within 30 sec
• At most one mistake per month (average)
• A mistake is corrected within 60 sec (average)

Algorithm parameters:
• Send a heartbeat every η = 9.71 sec
• Set the shift to δ = 20.29 sec
47
A Failure Detector Service with QoS guarantees
[Deianov and Toueg. DSN 2000]
48
Approaches to Failure Detection
• currently:
  – each application implements its own FD
  – no systematic way of setting timeouts and sending rates
• we propose FD as a shared service:
  – continuously running on every host
  – can detect process and host crashes
  – provides failure information to all applications
49
Advantages of Shared FD Service
• sharing:
  – applications can concurrently use the same FD service
  – merging FD messages can decrease network traffic
• modularity:
  – well-defined API
  – different FD implementations may be used in different environments
• reduced implementation effort :-)
  – programming fault-tolerant applications becomes easier
50
Advantages of Shared FD Service with QoS
• QoS guarantees:
  – applications can specify the desired QoS
  – applications do not need to set operational FD parameters (e.g., timeouts and sending rates)
• adaptivity:
  – adapts to changing network conditions (message delays and losses)
  – adapts to changing QoS requirements
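As an illustration of "specify QoS, not timeouts", a purely hypothetical client API might look as follows (none of these names come from the actual prototype):

```python
from dataclasses import dataclass

@dataclass
class QoSSpec:
    max_detection_time: float           # T_D^U, in seconds
    min_time_between_mistakes: float    # T_MR^L, in seconds
    max_mistake_duration: float         # T_M^U, in seconds

class FDServiceClient:
    """Hypothetical stub for a shared FD service client."""
    def monitor(self, host: str, pid: int, qos: QoSSpec) -> None:
        """Ask the service to monitor a process with the given QoS;
        the service derives eta and delta internally."""
        raise NotImplementedError       # sketch only

    def trusts(self, host: str, pid: int) -> bool:
        """True iff the service currently trusts the process."""
        raise NotImplementedError       # sketch only
```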
51
Prototype Implementation
[Figure: on each UNIX host, application processes call a shared library (function calls), which communicates over named pipes with a local FD module; FD modules exchange UDP messages over an Ethernet network.]
52
Summary
• Failure detection is a core component of fault-tolerant systems.
• The systematic study of FDs started in [CT90, CHT91] with:
  – their specification in terms of properties
  – their comparison by algorithmic reduction
• Initial focus: the FLP model (crashes only, reliable links) and Consensus
• Later research: broadening the applicability of FDs
  – other models (e.g., crash/recovery, lossy links, network partitions)
  – other problems (e.g., group membership, leader election, atomic commit)
• Current effort: putting theory closer to practice
  – more efficient algorithms for FDs and FD-based consensus algorithms
  – FD algorithms with QoS guarantees in a probabilistic network
  – a shared FD service with QoS guarantees