Chapter 3 - Motivating Self-Stabilization 3-1
Chapter 3
Self-Stabilization, Shlomi Dolev, MIT Press, 2000
Shlomi Dolev, All Rights Reserved ©
Chapter 3: Motivating Self-Stabilization
Converging to a desired behavior from any initial state lets an algorithm recover from an arbitrary state caused by faults.
Why should one be interested in self-stabilizing algorithms? Because of their applicability to distributed systems, for example recovering from faults in a space shuttle.
Faults may cause a malfunction for a while. Using a self-stabilizing algorithm for the shuttle's control leads to automatic recovery and enables the shuttle to continue its task.
What is a Self-Stabilizing Algorithm?
This question will be answered using the “Stabilizing Orchestra” example.
The problem:
The conductor is unable to participate, so harmony is achieved by each player listening to the neighboring players.
It is a windy evening: the wind can turn some pages in the score, and the players may not notice the change.
The “Stabilizing Orchestra” Example
Our goal: to guarantee that harmony is achieved at some point following the last undesired page turn.
Imagine that the drummer notices that the violin player next to him is on a different page. Possible solutions and their problems:
1. The drummer turns to his neighbor's page. What if the violin player noticed the difference as well and turns to the drummer's page?
2. Both the drummer and the violin player start from the beginning. What if the player next to the violin player notices the change only after the other two have synchronized?
The “Stabilizing Orchestra” Example – the Self-Stabilizing Solution
Every player will join the neighboring player who is playing the earliest page (including himself)
Note that the score has a bounded length. What happens if a player goes to the first page of the score before harmony is achieved? This case is discussed in detail in Chapter 6.
In every long enough period in which the wind does not turn a page, the orchestra resumes playing in synchrony
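The "join the earliest page" rule can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the slides: the players sit in a line, they act in synchronous rounds, every player advances one page per round, and the score is treated as unbounded (the bounded case is the one deferred to Chapter 6). The names `step` and `play` are hypothetical.

```python
def step(pages):
    """One synchronous round: join the earliest neighboring page, then play it."""
    joined = []
    for i in range(len(pages)):
        # Neighborhood of player i: left neighbor, the player itself, right neighbor.
        neighborhood = pages[max(0, i - 1):i + 2]
        joined.append(min(neighborhood))
    # Everybody plays the page they joined and moves on to the next page.
    return [p + 1 for p in joined]


def play(pages, rounds):
    """Run the given number of rounds in which the wind turns no page."""
    for _ in range(rounds):
        pages = step(pages)
    return pages


if __name__ == "__main__":
    start = [7, 3, 9, 5, 12]  # pages after the wind has turned some of them
    print(play(start, len(start) - 1))
```

In this model the earliest page spreads by one player per round, so a line of n players is on the same page after at most n - 1 undisturbed rounds.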
Chapter 3: roadmap
3.1 Initialization of a Data-Link Algorithm in the Presence of Faults
3.2 Arbitrary Configuration Because of Crashes
3.3 Frequently Asked Questions
The Data Link Algorithm
Delivering a message is a complicated task, and the message may be corrupted or even lost on the way.
The layers involved (figure): on both the sender side and the receiver side, the Network Layer sits above the Data-Link Layer, which sits above the Physical Layer. The network layer handles packets; the data-link layer wraps each packet in a frame (head, packet, tail).
The sender sends sequences of bits to the receiver
The alternating-bit algorithm
It is used to cope with the possibility of frame corruption or loss.

Sender:
    initialization
    begin
        i := 1
        bit_s := 0
        send(bit_s, im_i)    (* im_i is fetched *)
    end    (* end initialization *)
    upon a timeout
        send(bit_s, im_i)
    upon frame arrival
    begin
        receive(FrameBit)
        if FrameBit = bit_s then    (* acknowledgment of im_i *)
        begin
            bit_s := (bit_s + 1) mod 2
            i := i + 1
        end
        send(bit_s, im_i)    (* im_i is fetched *)
    end

Receiver:
    initialization
    begin
        j := 1
        bit_r := 1
    end    (* end initialization *)
    upon frame arrival
    begin
        receive(FrameBit, msg)
        if FrameBit ≠ bit_r then    (* a new message, not a retransmission *)
        begin
            bit_r := FrameBit
            j := j + 1
            om_j := msg
        end
        send(bit_r)    (* acknowledgment *)
    end

Every message from the sender is repeatedly sent in a frame to the receiver until an acknowledgment arrives. The receiver sends an acknowledgment for every frame it receives.
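The sender and receiver rules can be tried out in a small simulation. The following is a minimal sketch, not the book's implementation: the class names, the `run` driver, the random loss model and the round structure are all assumptions made for illustration, and the channels are FIFO queues that may lose frames but never reorder or corrupt them.

```python
import random
from collections import deque


class Sender:
    def __init__(self, messages):
        self.messages = list(messages)
        self.i = 0      # index of the message being sent (im_i in the slides)
        self.bit = 0    # bit_s

    def done(self):
        return self.i >= len(self.messages)

    def frame(self):
        return (self.bit, self.messages[self.i])

    def on_ack(self, frame_bit):
        if frame_bit == self.bit:          # acknowledgment of the current message
            self.bit = (self.bit + 1) % 2
            self.i += 1


class Receiver:
    def __init__(self):
        self.bit = 1            # bit_r
        self.delivered = []     # output messages, in delivery order

    def on_frame(self, frame):
        frame_bit, msg = frame
        if frame_bit != self.bit:          # a new message, not a retransmission
            self.bit = frame_bit
            self.delivered.append(msg)
        return self.bit                    # an acknowledgment is sent for every frame


def run(messages, loss=0.3, seed=0, max_rounds=100_000):
    """Hypothetical driver: each round models one sender timeout."""
    rng = random.Random(seed)
    sender, receiver = Sender(messages), Receiver()
    to_receiver, to_sender = deque(), deque()
    for _ in range(max_rounds):
        if sender.done():
            break
        # Timeout: the sender (re)sends the current frame; the channel may lose it.
        if rng.random() >= loss:
            to_receiver.append(sender.frame())
        if to_receiver:
            ack = receiver.on_frame(to_receiver.popleft())
            if rng.random() >= loss:       # the acknowledgment may be lost as well
                to_sender.append(ack)
        if to_sender:
            sender.on_ack(to_sender.popleft())
    return receiver.delivered
```

Starting from the proper initial state (bit_s = 0, bit_r = 1), `run(["m1", "m2", "m3"])` delivers each message once and in order despite the losses. The rest of the chapter is about what happens when the initial state is not the proper one.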
The alternating-bit algorithm – run sample
Each step shows what is in transit between the sender S and the receiver R, together with bit_s and bit_r:
1. S sends <m1,0>. bit_s = 0, bit_r = 1.
2. Upon a timeout, S sends <m1,0> again, so several copies of <m1,0> are in transit.
3. R receives <m1,0>, delivers m1, sets bit_r = 0 and sends the acknowledgment <0>.
4. Upon further timeouts, S keeps sending <m1,0>. R receives m1 again, does not deliver it a second time, and sends <0> again.
5. S receives the acknowledgment <0>, sets bit_s = 1 and sends <m2,1>. Copies of <m1,0> may still be in transit.
6. Upon a timeout, S sends <m2,1> again.
7. R receives <m2,1>, delivers m2, sets bit_r = 1 and sends the acknowledgment <1>.
8. Copies of <m2,1> and <1> are now in transit. bit_s = 1, bit_r = 1.
Once the sender receives an acknowledgment <1>, no frame with sequence number 0 exists in
the system
There Is No Data-link Algorithm that can Tolerate Crashes
It is usually assumed that a crash causes the sender/receiver to reach an initial state
No initialization procedure exists such that we can guarantee that every message fetched by the sender, following the last crash, will arrive at its destination
The next execution demonstrates this point. Notation:
CrashR: a receiver crash
CrashS: a sender crash
CrashX causes X to perform its initialization procedure
The Pumping Technique
The idea: repeatedly crash the sender and the receiver, and replay parts of the reference execution (RE), in order to construct a new execution E'.
Reference Execution (RE) = CrashS, CrashR, sendS(fs1), receiveR(fs1), sendR(fr1), receiveS(fr1), sendS(fs2), … , receiveS(frk)
Constructing E' (frames in transit are listed in brackets, oldest first):
1. S sends fs1. [S to R: fs1]
2. R receives fs1 and sends fr1. [R to S: fr1]
3. S crashes. [R to S: fr1]
4. S sends fs1, receives fr1 and sends fs2. [S to R: fs1, fs2]
5. R crashes. [S to R: fs1, fs2]
6. R receives fs1, sends fr1, receives fs2 and sends fr2. [R to S: fr1, fr2]
7. Repeating these steps, R eventually receives fs1, sends fr1, receives fs2, sends fr2, … , receives fsk and sends frk. [R to S: fr1, fr2, ..., frk]
8. Now both S and R crash (CrashS, CrashR). [R to S: fr1, fr2, ..., frk]
9. We let S send fsi and receive fri, for i from 1 to k. [S to R: fs1, fs2, ..., fsk]
10. If these k frames are lost, no information about the message exists in the system.
Next, suppose CrashS and CrashR occurred:
1. S crashes while fr1, fr2, ..., fr(k-1) are in transit from R to S.
2. S sends fs1, receives fr1, sends fs2, receives fr2, … , receives fr(k-1) and sends fsk. [S to R: fs1, fs2, ..., fsk]
3. Continue with the same technique.
[The figure also shows message labels m1 and m2.]
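The accumulation of frames by crash-and-replay can be sketched in a few lines. This is an abstract illustration under stated assumptions, not the book's proof: frames are just the strings "fs1", "fr1", ... naming their position in RE, both processors are deterministic and behave exactly as in RE after every crash, and the function names are hypothetical.

```python
from collections import deque


def replay_sender(q_rs):
    """Crash S, then let it replay RE, consuming every frame waiting in q_rs.

    Returns the frames S has sent, i.e. the new contents of the S-to-R channel.
    """
    q_sr = deque(["fs1"])                     # after a crash, S starts by sending fs1
    i = 1
    while q_rs:
        assert q_rs.popleft() == f"fr{i}"     # S sees exactly what it saw in RE
        i += 1
        q_sr.append(f"fs{i}")                 # ... and answers exactly as in RE
    return q_sr


def replay_receiver(q_sr):
    """Crash R, then let it replay RE, consuming every frame waiting in q_sr."""
    q_rs = deque()
    i = 1
    while q_sr:
        assert q_sr.popleft() == f"fs{i}"
        q_rs.append(f"fr{i}")
        i += 1
    return q_rs


def pump(k):
    """Alternate sender and receiver replays until fr1..frk are in transit."""
    q_rs = deque()
    while len(q_rs) < k:
        q_sr = replay_sender(q_rs)
        q_rs = replay_receiver(q_sr)
    return q_rs
```

Each pair of replays lengthens the sequence in transit by one frame, which is why the technique is called pumping: `list(pump(3))` yields the frames fr1, fr2, fr3 without either processor ever seeing anything that differs from RE.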
Conclusion
It is possible to show that there is no guarantee that the kth message will be received.
We therefore require only that eventually every message fetched by the sender reaches the receiver, which calls for a self-stabilizing data-link algorithm.
Arbitrary configuration because of crashes
A combination of crashes and frame losses can bring the system to an arbitrary configuration: arbitrary processor states and arbitrary contents of the communication links.
Any Configuration Can be Reached by a Sequence of Crashes
The pumping technique is used to reach any arbitrary configuration, starting with the reference execution:
Reference Execution (RE) = CrashS, CrashR, sendS(fs1), receiveR(fs1), sendR(fr1), receiveS(fr1), sendS(fs2), … , receiveS(frk)
The technique is used to accumulate a long sequence of frames
Reaching an Arbitrary Configuration
Our first goal: creating an execution in which RE appears i times in a row, (RE)^i.
Notation: FrE (FsE) is the sequence of frames sent by the receiver (sender) in RE; q_{s,r} (q_{r,s}) holds the frames in transit from S to R (from R to S). Frames are listed oldest first.
1. First we use the pumping technique to obtain the frames of RE. [q_{r,s}: fr1, fr2, ..., frk]
2. S sends fs1. [q_{s,r}: fs1; q_{r,s}: fr1, fr2, ..., frk]
3. S crashes. [q_{s,r}: fs1; q_{r,s}: fr1, fr2, ..., frk]
4. S sends fs1, receives fr1, sends fs2, receives fr2, … , sends fsk, receives frk. [q_{s,r}: fs1, fs1, fs2, ..., fsk]
5. R receives fs1 and sends fr1. [q_{s,r}: FsE; q_{r,s}: fr1]
6. R crashes. [q_{s,r}: FsE; q_{r,s}: fr1]
7. R receives fs1, sends fr1, … , receives fsk and sends frk. [q_{r,s}: fr1, FrE]
8. S crashes. [q_{r,s}: fr1, FrE]
9. S sends fs1, receives fr1, sends fs2. [q_{s,r}: fs1, fs2; q_{r,s}: FrE]
10. S crashes. [q_{s,r}: fs1, fs2; q_{r,s}: FrE]
11. S sends fs1, receives fr1, … , sends fsk, receives frk. [q_{s,r}: fs1, fs2, FsE]
12. S received the first FrE, crashed and received the second. [q_{s,r}: FsE, FsE]
13. Continue with the same technique.
FrE^i (FsE^i) denotes the sequence FrE FrE … FrE (FsE FsE … FsE), repeated i times.
For any finite i, the technique can be extended to reach a configuration in which FsE^i appears in q_{s,r}.
Reaching an Arbitrary Configuration
Our second goal: reaching c_a, an arbitrary configuration.
Notation: k1 (k2) is the number of frames in q_{s,r} (q_{r,s}) in c_a, and i = k1 + k2 + 2.
1. Using the previous technique we accumulate FsE^i in q_{s,r}.
2. R replays RE k2+1 times. [q_{s,r}: FsE^(k1+1); q_{r,s}: FrE^(k2+1)]
3. S replays RE using the first FrE until it reaches its desired state S', losing the frames it sends and the leftovers of FrE^k2 that are not in q_{r,s} of c_a. [q_{s,r}: FsE^(k1+1); q_{r,s} as in c_a]
4. We do the same with R, which reaches its desired state R', arriving at the arbitrary configuration c_a. [S', R', q_{s,r} and q_{r,s} as in c_a]
Crash-Resilient Data-Link Algorithm, With a Bound on the Number of Frames in Transit
Crashes are not considered a severe type of fault (Byzantine faults are more severe, see Chapter 6).
The algorithm uses an initialization procedure following the crashes of S and R.
bound: the maximal number of frames that can be in transit.
The clean procedure:
1. S crashes.
2. S, in its after-crash state, invokes the clean procedure and sends <clean,1>.
3. S sends <clean,1> repeatedly; R receives <clean,1> and replies with <ackClean,1>.
4. S receives <ackClean,1> and then repeatedly sends <clean,2> until it receives <ackClean,2>.
5. This continues until S receives <ackClean,bound+1>.
6. When the sender receives the first <ackClean,bound+1>, it can be sure that the only label in transit is bound+1, and it can initialize the alternating-bit algorithm (similarly, R can initialize as well).
7. S sends <mnew,0>, with bit_s = 0 and bit_r = 1.
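The clean procedure can be exercised against adversarially chosen leftover frames. The sketch below rests on assumptions that go beyond the slides: channels are FIFO, the stale frames left over from before the crash number at most `bound` in total, a frame sent while `bound` frames are in transit is simply lost, and frames are encoded as `(kind, label)` tuples. The function name `clean_procedure` is hypothetical.

```python
from collections import deque


def clean_procedure(bound, stale_sr=(), stale_rs=(), max_steps=10_000):
    """Run S's clean procedure; return the labels in transit when it finishes.

    stale_sr / stale_rs are the frames left in the S-to-R / R-to-S channel.
    """
    q_sr, q_rs = deque(stale_sr), deque(stale_rs)
    assert len(q_sr) + len(q_rs) <= bound
    label = 1
    for _ in range(max_steps):
        # S receives one frame, if any is waiting.
        if q_rs:
            kind, lab = q_rs.popleft()
            if kind == "ackClean" and lab == label:
                if label == bound + 1:
                    # First <ackClean, bound+1>: report what is still in transit.
                    return [l for _, l in q_sr] + [l for _, l in q_rs]
                label += 1
        # S repeatedly sends <clean, label>; a full system drops the frame.
        if len(q_sr) + len(q_rs) < bound:
            q_sr.append(("clean", label))
        # R receives one frame and acknowledges every clean frame it gets.
        if q_sr:
            kind, lab = q_sr.popleft()
            if kind == "clean":
                q_rs.append(("ackClean", lab))
    raise RuntimeError("clean procedure did not finish within max_steps")
```

With bound = 3 and the stale frames <ackClean,1>, <ackClean,2> and <clean,3>, every stale frame tricks S into advancing its label once, but the fourth label can only be acknowledged genuinely: `clean_procedure(3, [("clean", 3)], [("ackClean", 1), ("ackClean", 2)])` returns only labels equal to 4. This is the counting argument behind using bound+1 labels: at most `bound` advances can be caused by stale frames.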
Crash-Resilient Data-Link Algorithm – R crashes
1. R crashes.
2. S sends <msg,FrameBit> while bit_r holds some value i.
3. R receives msg, assigns FrameBit to bit_r (bit_r = FrameBit) and then delivers msg to the output queue.
The problem: if msg had already been delivered before the crash, the output queue now holds an extra copy of msg.
Crash-Resilient Data-Link Algorithm – R crashes
Can we guarantee at most one delivery, and exactly-once delivery after the last crash?
The initialization of bit_r should ensure that a message fetched after the crash will be delivered.
A solution:
S sends each message in a frame with label 0 until an acknowledgment arrives, and then sends the same message with label 1 until an acknowledgment arrives.
R delivers a message only if it arrives with label 1 immediately after a frame with label 0.
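The receiver's delivery rule can be written down directly. This is a minimal sketch of the receiver side only, under assumptions not stated in the slides: a crash resets the receiver's memory of the last label to "unknown", the output queue is external and survives the crash, and the class and method names are hypothetical.

```python
class TwoLabelReceiver:
    def __init__(self):
        self.delivered = []      # output queue, assumed to survive crashes
        self.last_label = None   # label of the previous frame; None = unknown

    def crash(self):
        # After a crash the receiver no longer knows what it saw last.
        self.last_label = None

    def on_frame(self, label, msg):
        # Deliver only a label-1 frame that arrives immediately after label 0.
        if label == 1 and self.last_label == 0:
            self.delivered.append(msg)
        self.last_label = label
        return label             # acknowledgment carrying the received label


if __name__ == "__main__":
    r = TwoLabelReceiver()
    for label in (0, 0, 1, 1):   # retransmissions of both phases of message "m"
        r.on_frame(label, "m")
    r.crash()
    r.on_frame(1, "m")           # a late label-1 copy after the crash
    print(r.delivered)
```

A label-1 frame that arrives right after a crash is never delivered, because the receiver cannot tell whether it is a leftover copy. That rules out the extra copy in the output queue, while a message fetched after the crash still goes through both phases and is delivered.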
What is the rationale behind assuming that the states of the processors can be corrupted while the processors’ programs cannot?
The program is stored in a long-term memory device, which makes it possible to:
1. Reload program statements periodically
2. Protect the memory segment using a read-only memory device
If the program itself is subject to corruption, any configuration is possible. The Byzantine model allows fewer than 1/3 of the processors to execute corrupted programs.
Safety Properties
Safety and liveness properties should be satisfied by a distributed algorithm:
Safety ensures that bad configurations are avoided.
Liveness ensures that the system's goal is achieved.
The designer of a self-stabilizing algorithm wants to ensure that even if the safety property is violated, the system execution will reach a suffix in which both properties hold.
What use is an algorithm that doesn’t ensure that a car never crashes?
If the faults are severe enough to bring the algorithm to an arbitrary configuration, the car may crash no matter which algorithm is chosen.
Safety Properties (continued)
A safety property for a car controller might be: never turn into a one-way road.
When the specification says nothing about what to do after this property is violated, the car can continue driving on this road and crash into other cars.
A self-stabilizing controller will recover from this illegal initial state (by turning the car around).
Processors Can Never be Sure that a Safe Configuration is Reached
What use is an algorithm in which the processors are never sure about the current global state?
The question confuses the assumptions (the occurrence of transient faults) with the algorithm that is designed to fit these severe assumptions. A self-stabilizing algorithm can be designed to start in a particular (safe) state.
A self-stabilizing algorithm is at least as good as a non-self-stabilizing one for the same task, and is in fact much better.