A Closer Look at Fault Tolerance
Gadi Taubenfeld SRDC 2013 1
Gadi Taubenfeld
Example: Perfect Renaming
Original names: 17, 39, 11, 99, 27
New names: 5, 3, 1, 2, 4
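The example above can be stated as a check: n processes with distinct original names end up with distinct new names taken exactly from 1..n. A minimal sketch in Python; the function name `is_perfect_renaming` is hypothetical and not from the talk.

```python
def is_perfect_renaming(old_names, new_names):
    """True if n distinct old names were mapped onto exactly the names 1..n."""
    n = len(old_names)
    distinct_old = len(set(old_names)) == n
    # Sorting the new names must give 1, 2, ..., n with no gaps or repeats.
    uses_exactly_1_to_n = sorted(new_names) == list(range(1, n + 1))
    return distinct_old and uses_exactly_1_to_n


# The slide's example: five processes, new names 1..5 in some order.
print(is_perfect_renaming([17, 39, 11, 99, 27], [5, 3, 1, 2, 4]))
```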
Example: Perfect Renaming
[Figure: four processes hold the new names 5, 1, 2, 4; the process with original name 39 has no new name. Label: 1-resilient]
Example: Perfect Renaming
[Figure: three processes hold the new names 5, 1, 2; the processes with original names 39 and 27 have no new name. Label: Not 1-resilient]
Example: Perfect Renaming
[Figure: two runs over the original names 17, 39, 11, 99, 27, each labeled "Not 1-resilient"]
A General Definition (see paper for details)
For a given function f: N → N, an algorithm is (t,f)-resilient if, in the presence of t’ faults, at most f(t’) participating correct processes may not terminate their operations, for all 0 ≤ t’ ≤ t.
Not covered in this talk
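The general definition can be read as a bound checked run by run. A minimal sketch in Python, assuming each observed run is summarized by a hypothetical pair (faults, stuck), where stuck counts the participating correct processes that did not terminate. Checking a finite set of runs can only refute (t,f)-resilience, never prove it.

```python
def violates_tf_resilience(runs, t, f):
    """Return the first run (faults, stuck) that breaks the (t,f) bound, or None.

    runs: iterable of (faults, stuck) pairs (hypothetical summary of a run).
    t:    maximum number of faults the bound has to cover.
    f:    function from a number of faults t' to the allowed number of
          correct participating processes that may fail to terminate.
    """
    for faults, stuck in runs:
        if faults <= t and stuck > f(faults):
            return (faults, stuck)
    return None


def identity(k):
    """f(t') = t': at most t' correct processes may get stuck."""
    return k


print(violates_tf_resilience([(0, 0), (1, 1), (2, 1)], t=2, f=identity))
```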
Notation
Correct active process
Correct process that has terminated
Faulty process
Wait-freedom [Herlihy 1991]
In the presence of any number of faults, all the correct participating processes must terminate.
[Figure: processes P1 to P6 in runs with 0, 1, 2, 3, 4, and 5 faults]
Almost-wait-freedom
In the presence of any number of faults, all the correct participating processes, except maybe one, must terminate.
[Figure: processes P1 to P6 in runs with 0, 1, 2, 3, 4, and 5 faults]
Partially-wait-freedom
In the presence of any number t ≤ n-1 of faults, all the correct participating processes, except maybe t of them, must terminate.
[Figure: processes P1 to P6 in runs with 0, 1, 2, 3, 4, and 5 faults]
Weakly-wait-freedom
In the presence of any number of faults, if there are two or more correct participating processes, then at least one correct participating process must terminate.
[Figure: processes P1 to P6 in runs with 0, 1, 2, 3, 4, and 5 faults]
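The four progress conditions can be compared side by side as predicates over a single run. A minimal sketch in Python; the parameter names are hypothetical (faults = number of faulty processes, correct = number of correct participating processes, stuck = how many of those did not terminate), and "any number of faults" is read literally as including zero faults, which is an assumption.

```python
def wait_free(faults, correct, stuck):
    # All correct participating processes terminate.
    return stuck == 0


def almost_wait_free(faults, correct, stuck):
    # All correct participating processes, except maybe one, terminate.
    return stuck <= 1


def partially_wait_free(faults, correct, stuck):
    # With t faults, all correct participating processes except maybe t terminate.
    return stuck <= faults


def weakly_wait_free(faults, correct, stuck):
    # With two or more correct participants, at least one of them terminates.
    return correct < 2 or stuck <= correct - 1


# Six processes, two faults, four correct participants, three of them stuck.
run = dict(faults=2, correct=4, stuck=3)
for condition in (wait_free, almost_wait_free, partially_wait_free, weakly_wait_free):
    print(condition.__name__, condition(**run))
```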
Technical Results
Problem                    Model   Weakly WF   Partially WF   Almost WF   Complexity
Election                   SM/MP
Test&set                   SM
Perfect renaming           SM/MP
Stack                      SM
Swap                       SM
Fetch&add                  SM
Consensus, Set-consensus   SM/MP
SM -- Shared Memory using atomic registers
MP -- Message Passing (send/receive)
Thm: There is no 1-resilient implementation using atomic registers or messages
Technical Results
Problem                    Model   Weakly WF   Partially WF   Almost WF   Complexity
Election                   SM/MP
Test&set                   SM
Perfect renaming           SM/MP
Stack                      SM
Swap                       SM
Fetch&add                  SM
Consensus, Set-consensus   SM/MP   x           x              x
SM -- Shared Memory using atomic registers
MP -- Message Passing (send/receive)
Technical Results
Problem                        Model   Weakly WF   Partially WF   Almost WF   Complexity (upper)   Complexity (lower)
Election                       SM                                             log n + 2            log n + 1
Election                       MP                                             O(n^2)
Test&set                       SM                                             n + 1                n
Perfect renaming, one-shot     SM                                             O(n log n)
Perfect renaming, one-shot     MP                                             O(n^3)
Perfect renaming, long-lived   SM                                             O(n^2)
SM -- # of atomic registers
MP -- # of messages
An almost-wait-free symmetric election: program of process p
turn = p
for level = 1 to log n do
    repeat
        if done = 1 then return(0) fi
        if turn ≠ p then
            for j = 1 to level - 1 do
                if V[j] = p then V[j] = 0 fi
            od
            return(0)
        fi
    until V[level] = 0
    V[level] = p
    if turn ≠ p then
        for j = 1 to level do
            if V[j] = p then V[j] = 0 fi
        od
        return(0)
    fi
od
done = 1; return(1)
Shared registers, all initially 0: turn, done, V[1..log n]
Inspired by Styer & Peterson PODC 1989
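To experiment with interleavings of the election program, each process can be written as a Python generator that pauses between shared-memory accesses. This is a sketch only: the helper names (`new_memory`, `release`, `run_to_end`) are hypothetical, process identifiers are assumed to be nonzero, the register accesses are assumed atomic, and log n is taken as the ceiling of log2 n.

```python
def new_memory(n):
    """Shared registers for n processes, all initially 0 (V[0] is unused)."""
    levels = max(1, (n - 1).bit_length())      # ceil(log2 n) for n >= 2
    return {"turn": 0, "done": 0, "V": [0] * (levels + 1)}, levels


def release(p, V, upto):
    """Clear every V[1..upto] that still holds p, pausing around each access."""
    for j in range(1, upto + 1):
        yield
        if V[j] == p:
            yield                               # pause between the read and the write
            V[j] = 0


def election(p, mem, levels):
    """Process p (p != 0). Pauses at each `yield`; returns 1 if elected, else 0."""
    V = mem["V"]
    mem["turn"] = p
    yield
    for level in range(1, levels + 1):
        while True:                             # repeat ... until V[level] = 0
            if mem["done"] == 1:
                return 0
            yield
            if mem["turn"] != p:
                yield from release(p, V, level - 1)
                return 0
            yield
            if V[level] == 0:
                break
            yield
        yield
        V[level] = p
        yield
        if mem["turn"] != p:
            yield from release(p, V, level)
            return 0
        yield
    mem["done"] = 1
    return 1


def run_to_end(process):
    """Step a process until it returns; assumes it is not blocked forever."""
    try:
        while True:
            next(process)
    except StopIteration as stop:
        return stop.value
```

A process that runs alone is elected, and a process that arrives after `done` is set returns 0. Calling `next()` on two generators in a chosen order gives a hand-made schedule, for example letting process 2 overwrite `turn` right after process 1 wrote it.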
An almost-wait-free symmetric test&set bit: program of process p
test&set:
    if turn ≠ 0 then return(0) fi
    turn = p
    repeat
        for j = 1 to n-1 do
            if lock[j] = 0 then lock[j] = p fi
        od
        locked = 1
        for j = 1 to n-1 do
            if lock[j] ≠ p then locked = 0 fi
        od
    until turn ≠ p or locked = 1 or winner = 1
    if turn ≠ p or winner = 1 then
        for j = 1 to n-1 do
            if lock[j] = p then lock[j] = 0 fi
        od
        return(0)
    fi
    winner = 1; return(1)
reset:
    winner = 0; turn = 0
    for j = 1 to n-1 do
        if lock[j] = p then lock[j] = 0 fi
    od
Shared registers, all initially 0: turn, winner, lock[1..n-1]. locked is a local variable of p.
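The test&set and reset programs translate almost line by line into Python. The sketch below is a sequential walk-through only: it assumes operations run one after another, so it shows the bookkeeping on turn, winner and lock but says nothing about concurrent behavior. The class name `TestAndSetBit` is hypothetical, and process identifiers are assumed to be nonzero.

```python
class TestAndSetBit:
    """Sequential walk-through of the slide's test&set bit for n processes."""

    def __init__(self, n):
        self.n = n
        self.turn = 0
        self.winner = 0
        self.lock = [0] * (n - 1)              # lock[1..n-1] on the slide

    def _release(self, p):
        # Give back every lock entry that process p still holds.
        for j in range(self.n - 1):
            if self.lock[j] == p:
                self.lock[j] = 0

    def test_and_set(self, p):
        if self.turn != 0:
            return 0
        self.turn = p
        while True:                            # repeat ... until
            for j in range(self.n - 1):
                if self.lock[j] == 0:
                    self.lock[j] = p
            locked = 1                         # local variable
            for j in range(self.n - 1):
                if self.lock[j] != p:
                    locked = 0
            if self.turn != p or locked == 1 or self.winner == 1:
                break
        if self.turn != p or self.winner == 1:
            self._release(p)
            return 0
        self.winner = 1
        return 1

    def reset(self, p):
        self.winner = 0
        self.turn = 0
        self._release(p)


bit = TestAndSetBit(n=3)
print(bit.test_and_set(1), bit.test_and_set(2))
```

The object uses turn, winner and n-1 lock entries, which matches the n+1 registers listed for Test&set in the complexity table.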
A trivial almost-wait-free symmetric election
Program for a process with identifier my.id
counter := 0
send my.id to all the other processes
each time a message is received do
    if my.id < message.val then return(0) else counter := counter + 1 fi
    if counter = n-1 then return(1) fi
od
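The message-passing program is easy to simulate. The sketch below is a simplification under stated assumptions: a crashed process is assumed to crash before sending anything, every sent message is delivered, and `None` stands for "never terminates". The function name `run_election` is hypothetical.

```python
from collections import deque


def run_election(ids, crashed=frozenset()):
    """Simulate the trivial election; returns {id: 1, 0, or None} for correct processes."""
    n = len(ids)
    inbox = {pid: deque() for pid in ids}
    # Every correct process sends its identifier to all the other processes.
    for sender in ids:
        if sender in crashed:
            continue
        for receiver in ids:
            if receiver != sender:
                inbox[receiver].append(sender)

    result = {}
    for my_id in ids:
        if my_id in crashed:
            continue
        counter = 0
        outcome = None                         # None: still waiting for messages
        while inbox[my_id] and outcome is None:
            val = inbox[my_id].popleft()
            if my_id < val:
                outcome = 0
            else:
                counter += 1
                if counter == n - 1:
                    outcome = 1
        result[my_id] = outcome
    return result


print(run_election([17, 39, 11, 99, 27]))
print(run_election([17, 39, 11, 99, 27], crashed={99}))
```

With no faults the process with the largest identifier is elected. If that process crashes, the largest correct process waits forever for its n-1 messages while all the others still return 0, which is the "except maybe one" of almost-wait-freedom.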
Is there a better algorithm?
Perfect Renaming: Partially-wait-free, Long-lived
[Figure: an array of almost-wait-free test&set bits, all initially 0, one for each of the names 1 to 5]
What about almost-wait-free renaming?
Fetch&add, Swap, Stack: Partially-wait-free
[Figure: wait-free (WF) constructions of Fetch&add, Swap and Stack from Test&set + atomic registers [Afek, Weisberger, Weisman PODC 93]. With an almost-WF Test&set, the resulting Fetch&add, Swap and Stack are partially-WF.]
What about almost-wait-free?
Open Problems
Improve our results:
- Computability: Is there an almost-wait-free perfect renaming, stack, swap, f&a, …
- Complexity: improve the space/message/time …
Type of faults: crash, omission, Byzantine, …
Time: asynchronous, synchronous, …
Other objects: queue, …
Failure models: uniform, non-uniform
Other models: unbounded concurrency, failure detectors …
Fault-tolerance: What can go wrong?
Processes
Communication links
Messages
Shared memory
Timing failures
Memory reordering
Example: Using Flags
Gadi Taubenfeld 22 ICDCN 2013
x and y : atomic bits, initially 0
Q: Is it possible that both processes read the value 0 ?
Process A
write.x(1)
read.y
Process B
write.y(1)
read.x
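The question can be settled by brute force: enumerate every interleaving of the two programs and collect the pairs of values read. A minimal sketch in Python, assuming memory behaves sequentially consistently within a given schedule; the helper names (`interleavings`, `outcomes`) are hypothetical.

```python
def interleavings(a, b):
    """Yield every merge of programs a and b that keeps each program's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def outcomes(prog_a, prog_b):
    """Set of (value read by A, value read by B) over all interleavings."""
    seen = set()
    for schedule in interleavings(prog_a, prog_b):
        mem = {"x": 0, "y": 0}
        reads = {}
        for proc, op, var in schedule:
            if op == "write":
                mem[var] = 1
            else:
                reads[proc] = mem[var]
        seen.add((reads["A"], reads["B"]))
    return seen


A = [("A", "write", "x"), ("A", "read", "y")]
B = [("B", "write", "y"), ("B", "read", "x")]

print(outcomes(A, B))            # program order respected
print(outcomes(A[::-1], B))      # A's read reordered before its write
```

With program order respected, the pair (0, 0) never appears. Once one process's read is moved before its own write, (0, 0) becomes reachable.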
Fact: Many hardware architectures do not support sequential consistency because their designers consider it too strong.
Solution: Memory barriers
Example: Using Flags
Assumption: At most one memory reordering is possible.
Process A
write.x(1)
write.x(1)
read.y
Process B
write.y(1)
write.y(1)
read.x
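The effect of writing each flag twice can be checked exhaustively. The sketch below models "one memory reordering" as at most one swap of two adjacent operations inside each process's program; that reading is an assumption, and the helper names are hypothetical.

```python
def interleavings(a, b):
    """Yield every merge of a and b that keeps each program's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def one_swap_variants(program):
    """The program itself plus every version with one adjacent pair swapped."""
    variants = [list(program)]
    for i in range(len(program) - 1):
        swapped = list(program)
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
        variants.append(swapped)
    return variants


def both_can_read_zero(prog_a, prog_b):
    """True if some reordering and some schedule lets both reads return 0."""
    for a in one_swap_variants(prog_a):
        for b in one_swap_variants(prog_b):
            for schedule in interleavings(a, b):
                mem = {"x": 0, "y": 0}
                reads = {}
                for proc, op, var in schedule:
                    if op == "write":
                        mem[var] = 1
                    else:
                        reads[proc] = mem[var]
                if reads["A"] == 0 and reads["B"] == 0:
                    return True
    return False


single_a = [("A", "write", "x"), ("A", "read", "y")]
single_b = [("B", "write", "y"), ("B", "read", "x")]
double_a = [("A", "write", "x")] + single_a
double_b = [("B", "write", "y")] + single_b

print(both_can_read_zero(single_a, single_b))
print(both_can_read_zero(double_a, double_b))
```

With a single write per process, one swap is enough for both reads to return 0. With the duplicated write, each read still follows at least one of its process's writes after any single swap, so both processes reading 0 is ruled out in this model.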
Question
How can we provide some level of resiliency against memory reordering and reduce the number of memory barriers required?
X: atomic register
write.x(1)
write.x(2)
write.x(3)
read.x
No reordering: x ∈ {3}
One reordering: x ∈ {2,3}
Two reorderings: x ∈ {1,2,3}
X: 2-atomic register
write.x(1)
write.x(2)
write.x(3)
read.x
No reordering: x ∈ {2,3}
One reordering: x ∈ {1,2,3}
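The value sets above can be reproduced with a small model. The sketch assumes that a k-atomic register lets a read return any of the last k values written before it, and that each reordering moves the read one position earlier in the program; both are simplifying assumptions, and `possible_reads` is a hypothetical name.

```python
def possible_reads(writes, k=1, max_reorderings=0):
    """Values a final read may return from a k-atomic register.

    writes:          values written in program order, before the read.
    k:               1 models an atomic register, 2 a 2-atomic register.
    max_reorderings: how many positions the read may move earlier.
    """
    values = set()
    for r in range(max_reorderings + 1):
        # Writes that still precede the read after it moved r positions up.
        visible = writes[: max(0, len(writes) - r)]
        values.update(visible[-k:])
    return values


print(possible_reads([1, 2, 3], k=1, max_reorderings=1))
print(possible_reads([1, 2, 3], k=2, max_reorderings=0))
```

In this model a 2-atomic register with no reordering allows the same values as an atomic register with one reordering, which is the correspondence the slide relies on.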
Memory reordering resiliency: design strategy
1. Design your algorithm to be correct assuming weak objects (2-atomic registers)
2. Replace the weak objects with strong objects (1-atomic registers)
Get “some” resiliency against memory reordering (i.e., fewer barriers are needed)
What weak objects?
How much?
The End