LSRP: Local Stabilization in Shortest Path Routing
Anish Arora Hongwei Zhang
Motivations
Local fault containment is important in large-scale systems
Stability, availability, and scalability
Self-stabilization is desirable in the presence of unanticipated faults
Even simple faults (such as node crash and message loss) can drive a network protocol into arbitrary states
Local containment and local self-stabilization in routing remain unsolved
Scope: only distance-vector (D-V) routing is considered
RIP, BGP (path-vector), DSDV, AODV …
Outline
Network and fault model
Definitions & problem statement
LSRP design & analysis
Related work
Summary
Network model
A network is a connected graph G=(V, E)
Each node has a unique ID
There is a clock at each node, with a single constraint:
“the ratio of clock rates between any two neighboring nodes is bounded from above by a constant (the absolute clock values do not matter)”
Fault model
Fail-stop: node and link
Join: node and link
State corruption
Definitions
Perturbation size
Range of contamination
F-local stabilizing
problem specific & algorithm independent
algorithm dependent
Perturbation size: definition
Problem-specific variables: e.g., “next-hop” in routing
The perturbation size at a network state q, denoted P(q), is the minimum number of up nodes at which transient faults have occurred, or whose problem-specific variables have to be changed, in order for the network to stabilize to a legitimate state
It characterizes the minimum amount of work needed for a network to stabilize
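The definition can be made concrete for one narrow case. The sketch below is an illustration only, under assumptions not stated on the slide: the topology is fixed and connected, the only faults are state corruptions, hop count is the metric, and the only problem-specific variable is the next hop. Under those assumptions each node's next hop is valid or invalid independently of the others, so the minimum number of nodes to change equals the number of nodes with an invalid next hop. The names `hop_distances`, `perturbation_size`, `adj`, and `next_hop` are hypothetical.

```python
from collections import deque


def hop_distances(adj, root):
    """Breadth-first search: hop count from every reachable node to root."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def perturbation_size(adj, root, next_hop):
    """Count nodes whose next hop is not a valid shortest-path next hop.

    Assumption (not from the slides): connected, fixed topology, and the
    next hop is the only problem-specific variable.
    """
    dist = hop_distances(adj, root)
    invalid = 0
    for i in adj:
        p = next_hop[i]
        if i == root:
            ok = p == root  # the destination points to itself
        else:
            ok = p in adj[i] and dist[p] == dist[i] - 1
        if not ok:
            invalid += 1
    return invalid


# Hypothetical 4-node ring 0-1-2-3-0 with destination 0.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
legit = {0: 0, 1: 0, 2: 1, 3: 0}
corrupt = {0: 0, 1: 2, 2: 1, 3: 0}  # node 1 points away from the destination

size_legit = perturbation_size(adj, 0, legit)
size_corrupt = perturbation_size(adj, 0, corrupt)
```

In the corrupted state only node 1 needs to change: node 2 still points at a neighbor that is one hop closer to the destination, so it is not counted.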
Perturbation size: examples
[Figure: three example network states, with perturbation sizes 0, 1, and 3]
Range of contamination
When a network self-stabilizes to a legitimate state q’ from an arbitrary state q, the range of contamination during stabilization is the maximum distance from any node that has changed state at least once during stabilization, but whose state is the same at q and q’, to the set of nodes whose state changes from q to q’
[Figure: network G with the range of contamination Rc marked]
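As a rough illustration of this definition, the sketch below computes the range of contamination from two sets that are assumed to be already known from a stabilization trace: the nodes whose state differs between q and q’ (`permanently_changed`) and the nodes that changed state during stabilization but ended up with their original state (`temporarily_changed`). Both set names, the function name, and the hop-count distance are assumptions made for the example; the graph is assumed connected and `permanently_changed` non-empty.

```python
from collections import deque


def range_of_contamination(adj, permanently_changed, temporarily_changed):
    """Max hop distance from a temporarily changed node to the nearest
    permanently changed node (multi-source breadth-first search)."""
    if not temporarily_changed:
        return 0  # no node was contaminated
    dist = {v: 0 for v in permanently_changed}
    queue = deque(permanently_changed)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist[v] for v in temporarily_changed)


# Hypothetical path network 0-1-2-3-4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
rc = range_of_contamination(adj, permanently_changed=[0],
                            temporarily_changed=[1, 2])
```

Here node 0 is the only node whose state differs between q and q’, while nodes 1 and 2 changed and then reverted, so the contamination reached two hops.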
F-local stabilizing
A network is F-local stabilizing if, starting at an arbitrary state q, the network self-stabilizes to a legitimate state within F(P(q)) time, where F is a function and P(q) is the perturbation size at state q.
“A network is F-local stabilizing” implies that the range of contamination during stabilization is O(F(P(q))).
Problem statement: local stabilization in shortest path routing
Design a protocol that, given a network G=(V, E) and a destination node r, constructs and maintains a spanning tree T (called the shortest path tree) of G such that
r is the root of T
for every node i ∈ V, the path from i to r in T is a shortest path between i and r in G
the network is F-local stabilizing
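The first two requirements describe the legitimate state the protocol must reach. A centralized sketch of that target state, assuming unit link weights and a connected graph, is a breadth-first search from r that records a distance d and a parent p for each node. This is not the distributed protocol and says nothing about the third requirement (F-local stabilization); the function name `shortest_path_tree` is hypothetical.

```python
from collections import deque


def shortest_path_tree(adj, r):
    """Return (d, p): hop distance to r and parent in a shortest path tree.

    Centralized reference computation, assuming unit link weights.
    """
    d = {r: 0}
    p = {r: r}  # the root is its own parent
    queue = deque([r])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in d:
                d[v] = d[u] + 1
                p[v] = u
                queue.append(v)
    return d, p


# Hypothetical 4-node ring 0-1-2-3-0 with destination r = 0.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
d, p = shortest_path_tree(adj, 0)
```

Node 2 has two equally short routes to the root, so a shortest path tree is generally not unique; the search simply keeps the first parent it finds.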
Fault propagation in existing D-V protocols
[Figure: example network in which a fault propagates under an existing D-V protocol]
LSRP design
The cause of fault propagation:
the “correction” action always lags behind the “fault propagation” action
Solution:
the source of fault propagation (such as node 8) detects the fault propagation and initiates a “containment” action that catches up with and stops the “fault propagation” action
avoid forming cycles during stabilization, and remove existing cycles quickly
Approach: layering of diffusing waves
Use three diffusing waves such that
Each diffusing wave has a different propagation speed
Speed is controlled by introducing delay in action execution
A mistakenly initiated layer-i wave Wi is contained, and prevented from propagating unboundedly, by a layer-(i+1) wave initiated at the same node that initiated Wi
The top-layer wave self-stabilizes locally upon perturbations
Specifically,
Stabilization wave: speed V0
Containment wave: speed V1, with V1 > V0
Super-containment wave: speed V2, with V2 > V1 > V0
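The reason a faster wave from the same node bounds a slower one is simple arithmetic, sketched below under an idealization that is not on the slide: both waves move at constant speed along the same path, and the faster wave starts `head_start` time units after the slower one. The function name `catch_up` and its parameters are hypothetical.

```python
def catch_up(v_slow, v_fast, head_start):
    """Return (time, distance) at which the faster wave reaches the slower one.

    The slow wave starts at time 0; the fast wave starts at time head_start
    from the same node. Solving v_fast * (t - head_start) = v_slow * t gives t.
    """
    if v_fast <= v_slow:
        raise ValueError("the chasing wave must be strictly faster")
    t = v_fast * head_start / (v_fast - v_slow)
    return t, v_slow * t


# Hypothetical speeds: stabilization wave at 1 hop per time unit, containment
# wave at 2 hops per time unit, started 3 time units late.
time_caught, distance_caught = catch_up(1.0, 2.0, 3.0)
```

The catch-up distance grows linearly with the head start and does not depend on the network size, which is the intuition behind bounding how far a mistakenly initiated wave can travel.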
Stabilization wave
Implements the basic distributed Bellman-Ford algorithm, with slight changes to interact with the containment wave (no interaction with the super-containment wave)
Variables: (p.i, d.i) for each node i
Actions:
<S1>:: (i is the dest. node ∨ i initiated a cont. wave) ∧ p.i ≠ i → p.i := i
[]
<S2>:: i prop. SW from j ∧ j is not in CW → d.i, p.i := d.j+1, j; ghost.i := false
Can be mistakenly initiated and cause fault propagation, and thus calls for a containment wave
[Figure: stabilization wave propagating at speed V0, annotated with an interval in terms of d_s, L, and U]
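The sketch below shows the basic Bellman-Ford update that the stabilization wave builds on, and how a single corrupted distance misleads correct neighbors. It is a simplified synchronous simulation, not LSRP's actual actions: there is no ghost variable and no interaction with a containment wave, all nodes update in lockstep, and links have unit weight. The names `dv_round`, `adj`, and `history` are hypothetical.

```python
def dv_round(adj, dest, d):
    """One synchronous round: every node recomputes (d, p) from the
    distances its neighbors held at the start of the round."""
    new_d, new_p = {}, {}
    for i in adj:
        if i == dest:
            new_d[i], new_p[i] = 0, i
        else:
            j = min(adj[i], key=lambda n: d[n])  # closest-looking neighbor
            new_d[i], new_p[i] = d[j] + 1, j
    return new_d, new_p


# Hypothetical path network 0-1-2-3-4-5 with destination 0.
adj = {i: [n for n in (i - 1, i + 1) if 0 <= n <= 5] for i in range(6)}
d = {i: i for i in range(6)}  # legitimate distances
d[4] = 0                      # state corruption at node 4

history = [dict(d)]
while True:
    new_d, p = dv_round(adj, 0, d)
    if new_d == d:
        break  # fixed point reached
    d = new_d
    history.append(dict(d))
```

After the first round, nodes 3 and 5 have adopted distance 1 through node 4 even though their own state was correct, while node 4 has already corrected itself: the correction trails the propagation by one round. In this small example the network still returns to the legitimate distances after four rounds.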
Containment wave
Prevents a mistakenly initiated stabilization wave from propagating faults unboundedly
Additional variable: ghost.i for each node i
Actions:
<C1>:: ¬ghost.i ∧ (i is a source of fault prop. ∨ i prop. CW from p.i) → ghost.i := true; if i is a source of fault prop. → p.i := i fi
[]
<C2>:: ghost.i ∧ no other node using the corrupted state of i → ghost.i := false; set (d.i, p.i)
Catches up with and stops the corresponding stabilization wave
Can be mistakenly initiated, and thus calls for a super-containment wave
[Figure: containment wave (speed V1) chasing the stabilization wave (speed V0), annotated with an interval in terms of d_c, L, and U]
Super-containment wave
Prevents a mistakenly initiated containment wave from propagating unboundedly
No additional variables needed (stateless)
Action:
<SC>:: ghost.i ∧ (i is not a source of fault prop. ∧ p.i is not in CW) → ghost.i := false
Catches up with and stops the corresponding containment wave
Self-stabilizes locally
stateless: trivial stabilization (no action needed)
no unbounded propagation: constrained by the range of the containment wave (which is a function of perturbation size)
[Figure: super-containment wave (speed V2) chasing the containment wave (speed V1) and the stabilization wave (speed V0), annotated with an interval in terms of d_sc, L, and U]
Example revisited
[Figure: the fault-propagation example network, revisited under LSRP]
C1 enabled at node 8
S2 enabled at nodes 6 and 5
C1 executed at node 8 first, which disables S2 at nodes 6 and 5
C2 executed at node 8, and the network self-stabilizes
Protocol analysis
LSRP is F-local stabilizing, where F is a linear function:
starting at an arbitrary state q0, a network reaches a state where the shortest path tree is formed within O(P(q0)) time
the range of contamination is O(MAXP), where MAXP denotes the number of nodes in the largest perturbed region at q0 and is no greater than P(q0)
perturbed regions that are far away from one another (i.e., whose half-distance is ω(MAXP)) self-stabilize in parallel
Quick loop removal: existing loops are removed within a small constant (i.e., d_sc + U) time
Loop freedom: no new loop is formed during stabilization
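To make the loop properties concrete, the sketch below checks whether a given assignment of parent pointers contains a routing loop. It is a centralized checker for illustration, for instance inside a simulator, and not part of LSRP itself. It assumes that every parent is a key of the mapping and that a root is marked by pointing to itself; the name `has_routing_loop` is hypothetical.

```python
def has_routing_loop(p):
    """Return True if following parent pointers from some node revisits a
    node without reaching a self-parented root."""
    for start in p:
        seen = set()
        i = start
        while p[i] != i:  # stop at a root (node that is its own parent)
            if i in seen:
                return True
            seen.add(i)
            i = p[i]
    return False


tree = {0: 0, 1: 0, 2: 1}    # 2 -> 1 -> 0, rooted at 0
looped = {0: 0, 1: 2, 2: 1}  # 1 and 2 point at each other

tree_has_loop = has_routing_loop(tree)
looped_has_loop = has_routing_loop(looped)
```

A simulator could call such a check after every step to test the loop-freedom claim on specific traces; that would be evidence on those traces only, not a proof.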
Related work
Ghosh, Gupta, Herman, and Pemmaraju (PODC ’96) [4]
Algorithms for locally containing a single state corruption during stabilization of a shortest path tree
Do not deal with multiple faults or with node or link fail-stop
Ghosh and He (WSS ’99) [5]
Fault-containing self-stabilizing algorithm for a consensus problem
Only considers the case of linear topology, and the range of contamination can be exponential in the perturbation size
Zhang and Arora (PODC ’02) [16]
Local stabilizing algorithm for clustering and shortest path routing in wireless sensor networks
The approach is based on different model assumptions: dense node distribution, and knowledge of geometric information
Conclusion
Formulated concepts of perturbation size, range of contamination, and F-local stabilization
Designed LSRP for linear-local stabilization in shortest path routing
Quick loop removal and loop freedom are automatically guaranteed by local stabilization
Faults are regarded as state corruption, and dealt with by way of self-stabilization