INFOCOM 2002 1Aman Shaikh: June 02
UCSC
Avoiding Instability during Graceful Shutdown of OSPF
Aman Shaikh, UCSC
Joint work with
Rohit Dube, Xebeo Communications Inc.
Anujan Varma, UCSC
INFOCOM – June 2002
Software Upgrade is a Pain
• Upgrade of routing software on routers is a fact of life
  – Extensions to routing protocols, new functionality, version upgrades, bug fixes
  – Critical need for seamless upgrades
• Current practice
  – During upgrade, network operators withdraw the “router-under-upgrade” from forwarding service
    • Route flaps, traffic disruption, instability
  – Operators have to carefully schedule upgrades
    • Schedule them at night, when load is moderate
    • Stagger upgrades of different routers
  – A painful job
We Can do Better
• A router can continue forwarding even while its routing process is inactive, at least for a while
  – Current routers have separate routing and forwarding paths
    • Routing in software (CPU), forwarding in hardware (switching)
• Routing protocols need to be extended, since they always try to route around an inactive router
• Our proposal: IBB (I’ll Be Back) extension to OSPF
• Other proposals
  – OSPF: Hitless restart proposal by John Moy
    • Internet draft: draft-ietf-ospf-hitless-restart-02.txt
  – BGP: Graceful restart proposal by Sangli et al.
    • Internet draft: draft-ietf-idr-restart-05.txt
Router Model
[Figure: router model. Route processor (CPU): OSPF process, LSAs, topology view, Shortest Path Tree (SPT), Forwarding Information Base (FIB). Interface cards: forwarding of data packets through the switching fabric.]
IBB Proposal in a Nutshell
• The OSPF process on router R needs to be shut down
• Before shutdown, R informs other routers that it is going to be inactive for a while
• R specifies a time period (IBB Timeout) by which it expects to become operational again
• Other routers continue using R for forwarding during the IBB Timeout period
• If R comes back within the IBB Timeout period, there is no routing instability and no flaps
• Else, other routers start forwarding packets around R
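The timeout rule above can be sketched from the point of view of a router other than R. This is a minimal illustration, not the protocol's actual message format or state machine: the class `IbbAnnouncement`, its fields, and the function `keep_forwarding_via` are hypothetical names introduced here, and times are plain seconds.

```python
from dataclasses import dataclass


@dataclass
class IbbAnnouncement:
    """Hypothetical record of what R announced before shutting down."""
    router: str          # the inactive router R
    announced_at: float  # when the announcement was received (seconds)
    ibb_timeout: float   # period within which R expects to be back (seconds)


def keep_forwarding_via(ann: IbbAnnouncement, now: float, has_returned: bool) -> bool:
    """True while other routers should keep using ann.router for forwarding."""
    if has_returned:
        # R is operational again, so it is used like any other router.
        return True
    # R is still inactive: keep using it only until the IBB Timeout expires.
    return now < ann.announced_at + ann.ibb_timeout


ann = IbbAnnouncement(router="R", announced_at=0.0, ibb_timeout=300.0)
print(keep_forwarding_via(ann, now=100.0, has_returned=False))
print(keep_forwarding_via(ann, now=301.0, has_returned=False))
```

With the assumed 300-second timeout, the first call falls inside the IBB Timeout period and the second falls after it, which is the point at which other routers would start forwarding around R.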
What if Topology Changes
• R cannot update its forwarding table to reflect the change
  – Can lead to loops or black holes
[Figure: routers A, B, and R. (a) Topology when R went down, with link weights 3, 2, 6. (b) Topology changes while R is inactive, with link weights 10, 2, 6.]
Handling Changes: Options
• Don’t do anything
• Stop using R: Moy’s proposal
  – Inadvertent changes during upgrade are likely
    • Flapping due to a bad interface somewhere
  – But not all changes are bad
    • Do not always lead to loops or black holes
• Our approach: stop using R only when a loop or black hole gets formed
  – And only for those destinations for which there is a problem
  – Needs algorithms, which is what the bulk of the paper is about
Roadmap of Algorithm
• Single area, single inactive router case
– Loop formation
– Black hole formation
• Single area, multiple inactive routers case
• Multiple areas
Single Area, Single Inactive Router
• Problem formulation
  – Inactive router = R
  – All routers other than R have the same image of the topology graph
  – R’s image is that of the past: the time at which it went down
  – Source = S, Destination = D
  – Next hop(R, D) = Y
  – Actual path a packet takes from S to D = P(S->D)
Loop Detection
• P(S->D) has a loop iff S and Y have R on their paths to D in their SPTs (Shortest Path Trees)
[Figure: routers S, Y, R, and D with link weights 1, 2, 3, 6, and 20. Panels: "Topology when R went down"; "Topology changes while R is inactive" (one link weight changes from 3 to 10); SPTs of S and Y, captioned "S and Y have R on their paths to D in their SPT".]
If there is a loop, neighbor can always detect it
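The detection condition can be tried out on a small graph. The sketch below is only an illustration under stated assumptions: the edge weights are one plausible reading of the slide's figure (S-R = 1, R-Y = 2, R-D = 6, S-D = 20, and Y-D changing from 3 to 10), and the helper names `shortest_path`, `on_path`, `has_loop`, and `topology` are hypothetical.

```python
import heapq


def shortest_path(graph, source, dest):
    """Dijkstra; returns the shortest path source..dest as a list, or None."""
    dist, parent = {source: 0}, {source: None}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if v not in dist or d + w < dist[v]:
                dist[v], parent[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if dest not in parent:
        return None
    hops, node = [], dest
    while node is not None:
        hops.append(node)
        node = parent[node]
    return hops[::-1]


def on_path(graph, source, dest, router):
    """True if `router` lies on source's shortest path to dest."""
    path = shortest_path(graph, source, dest)
    return path is not None and router in path


def has_loop(graph, s, d, r, y):
    """Slide condition: loop iff S and Y = next hop(R, D) both reach D via R."""
    return on_path(graph, s, d, r) and on_path(graph, y, d, r)


def topology(y_d_cost):
    """Assumed four-router topology; only the Y-D link cost varies."""
    return {
        "S": {"R": 1, "D": 20},
        "R": {"S": 1, "Y": 2, "D": 6},
        "Y": {"R": 2, "D": y_d_cost},
        "D": {"S": 20, "R": 6, "Y": y_d_cost},
    }


before = topology(3)   # topology when R went down: next hop(R, D) = Y
after = topology(10)   # topology after the change, while R is inactive
print(has_loop(before, "S", "D", "R", "Y"))
print(has_loop(after, "S", "D", "R", "Y"))
```

In the assumed "after" graph, Y's shortest path to D runs through R while R's stale table still points at Y, so the check Y performs on its own path is enough to expose the loop.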
Loop Prevention
• Every router needs to calculate a path to D such that R does not appear on it
[Figure: "Changed topology while R is inactive" (routers S, Y, R, D; link weights 1, 2, 6, 10, 20) and "S and Y calculate paths to D w/o R on it" (S to D with weight 20, Y to D with weight 10).]
Loop Avoidance Procedure
• R sends its forwarding table to neighbors before shutdown
  – Thus, Y knows that next hop(R, D) is Y
• Detection: during SPF (Shortest Path First) calculation, neighbors detect loops
  – Y checks whether R exists on the path to D
• Upon detection, neighbors send avoid messages to other routers in the domain
  – avoid(R, D) = avoid using R for reaching D
• Prevention: upon receiving the avoid(R, D) message, other routers calculate a new path to D such that R does not appear on it
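The prevention step can be sketched as a shortest-path computation that skips the routers named in avoid messages. This is a minimal illustration, not the prototype's code: the function `next_hop`, the `(router, destination)` tuple encoding of avoid messages, and the edge weights (an assumed reading of the changed topology) are all hypothetical.

```python
import heapq


def next_hop(graph, source, dest, forbidden=frozenset()):
    """Dijkstra from source that never enters a forbidden router.

    Returns the first hop on the resulting path to dest, or None if
    dest cannot be reached without the forbidden routers.
    """
    heap = [(0, source, None)]  # (distance, node, first hop on the path)
    settled = set()
    while heap:
        dist, node, first = heapq.heappop(heap)
        if node in settled:
            continue
        settled.add(node)
        if node == dest:
            return first
        for nbr, weight in graph[node].items():
            if nbr in forbidden or nbr in settled:
                continue
            hop = nbr if node == source else first
            heapq.heappush(heap, (dist + weight, nbr, hop))
    return None


# Assumed changed topology (Y-D cost raised to 10 while R is inactive).
graph = {
    "S": {"R": 1, "D": 20},
    "R": {"S": 1, "Y": 2, "D": 6},
    "Y": {"R": 2, "D": 10},
    "D": {"S": 20, "R": 6, "Y": 10},
}

# avoid(R, D) messages received so far, encoded as (router, destination).
avoid_messages = {("R", "D")}
forbidden_for_d = {r for (r, d) in avoid_messages if d == "D"}

print(next_hop(graph, "S", "D"))                   # without avoid(R, D)
print(next_hop(graph, "S", "D", forbidden_for_d))  # after avoid(R, D)
```

Under these assumed weights, S and Y both switch from R to their direct links to D once avoid(R, D) is taken into account, which matches the paths shown on the Loop Prevention slide.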
Multiple Inactive Routers
• Set of inactive routers: R1, R2, …, Rn
• Loop avoidance procedure applies for each inactive router
  – Detection
    • Router detects loops for all its inactive neighbors
  – Prevention
    • A router can get avoid(Ri, D) messages for j inactive routers (j <= n)
    • The router avoids these j forbidden routers on its path to D
• Problem: set of forbidden routers can be different for different destinations
  – O(n) shortest path calculations
    • n = number of vertices
Simplification
• Router avoids all inactive routers if it has some forbidden routers on its path to D
  – Calculate two SPTs:
    1. SPT with all inactive routers on it
    2. SPT w/o any inactive router on it
  – If the path to D does not contain any forbidden routers,
    • pick next hop for D from the first SPT
  – Else,
    • pick next hop for D from the second SPT
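The two-SPT rule can be sketched as follows. This is an illustration under stated assumptions rather than the GateD implementation: `spt`, `path_to`, and `pick_next_hops` are hypothetical names, `forbidden_for` is an assumed mapping from a destination to the routers named in avoid messages for it, and the edge weights are an assumed reading of the earlier example topology.

```python
import heapq


def spt(graph, source, excluded=frozenset()):
    """Dijkstra; returns {node: parent} over routers not in `excluded`."""
    dist, parent = {source: 0}, {source: None}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if v in excluded:
                continue
            if v not in dist or d + w < dist[v]:
                dist[v], parent[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return parent


def path_to(parent, dest):
    """Path from the SPT root to dest as a list, or None if unreachable."""
    if dest not in parent:
        return None
    hops = []
    while dest is not None:
        hops.append(dest)
        dest = parent[dest]
    return hops[::-1]


def pick_next_hops(graph, me, inactive, forbidden_for):
    """Choose a next hop per destination using only two SPF runs."""
    first = spt(graph, me)                            # SPT 1: inactive routers allowed
    second = spt(graph, me, excluded=set(inactive))   # SPT 2: no inactive router
    table = {}
    for dest in graph:
        if dest == me:
            continue
        path = path_to(first, dest)
        forbidden = forbidden_for.get(dest, set())
        if path is not None and not (set(path) & forbidden):
            table[dest] = path[1]      # path is clean: use the first SPT
        else:
            detour = path_to(second, dest)
            if detour is not None:
                table[dest] = detour[1]  # avoid every inactive router
    return table


graph = {
    "S": {"R": 1, "D": 20},
    "R": {"S": 1, "Y": 2, "D": 6},
    "Y": {"R": 2, "D": 10},
    "D": {"S": 20, "R": 6, "Y": 10},
}
print(pick_next_hops(graph, "S", inactive={"R"}, forbidden_for={"D": {"R"}}))
```

The design trade-off is the one the slide names: a router may give up a usable inactive router for some destination, but it runs two shortest-path calculations instead of one per distinct forbidden set.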
Performance
• Maximum effect on the SPF calculation
  – Quantify overhead
  – Impact of
    • Topology size
    • Number of inactive routers
• Prototype implementation
  – IBB extension incorporated into GateD 4.0.7
Testbed Setup
[Figure: testbed setup. Physical topology: SUT and TopTracker (TT) connected by a LAN, with LSAs between them. Emulated topology (the SUT's view of the topology): SUT, TT, routers under upgrade R1, R2, …, Rm and R’1, R’2, …, R’m, and a complete graph with n nodes (M1, …); link weights 1 and 20.]
Experiment Sequence
Time (mins)   GateD on SUT               IBB-GateD on SUT
T = 0         Bring m rtrs down          Bring m rtrs down in IBB mode
T = 4                                    Send avoid(Ri, Mj) messages to SUT (1<=i<=m, 1<=j<=n)
T = 8         Bring m inactive rtrs up   Bring m inactive rtrs up
Case A: m inactive rtrs
Case B: m inactive rtrs, avoid them
Overhead = (mean SPF time in Case B) / (mean SPF time in Case A)
Result
• Sources of overhead:
  – Second SPF calculation
  – Graph in case B is larger than in case A
    • Gets larger as m increases
[Plot: overhead (y-axis, 0 to 3.5) versus # of nodes in connected component, n (x-axis, 50 to 100), with one curve each for m = 1, 2, 5, and 10.]
Conclusions
• IBB proposal: extend OSPF so that a router can be used for forwarding even while its OSPF process is inactive
• Main contribution: an algorithm that gracefully handles topological changes
  – Stops using the inactive router for a destination when using the router can lead to loops or black holes
  – Overhead of the algorithm is modest
    • Shows good scaling behavior in terms of topology size and number of inactive routers
Future Directions
• Incremental deployment
  – Can the algorithm be modified so that only a subset of routers need to support it?
• Measuring other aspects of overhead
  – Messaging
• Reducing the overhead
  – SPF calculation: incremental algorithm for second pass
  – Better data structures in prototype
• Other protocols …
OSPF Background
• Link-state routing protocol
  – All routers in the domain come to a consistent view of the topology by exchange of Link State Advertisements (LSAs)
    • Set of LSAs (self-originated + received) at a router = topology
• SPF calculation
  – Each router calculates a single-source shortest path tree
• Forwarding Information Base (FIB)
  – Each router uses the tree to build its FIB, which governs packet forwarding
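The LSA-to-FIB pipeline above can be sketched in a few lines. This is a simplified stand-in, not OSPF's real data structures: an "LSA" here is just a hypothetical `(router, {neighbor: cost})` pair, the function names `build_topology`, `spf`, and `build_fib` are illustrative, and the link costs are an assumed reading of the example topology at the time R went down.

```python
import heapq


def build_topology(lsas):
    """Merge simplified LSAs, each (originating router, {neighbor: cost})."""
    return {router: dict(links) for router, links in lsas}


def spf(topology, root):
    """Dijkstra from root; returns {destination: (cost, first hop)}."""
    best = {}
    heap = [(0, root, root)]  # (cost, node, first hop on the path)
    while heap:
        cost, node, first = heapq.heappop(heap)
        if node in best:
            continue
        best[node] = (cost, first)
        for nbr, weight in topology.get(node, {}).items():
            if nbr not in best:
                hop = nbr if node == root else first
                heapq.heappush(heap, (cost + weight, nbr, hop))
    return best


def build_fib(topology, root):
    """FIB as a destination -> next hop mapping derived from the SPF result."""
    return {dest: hop for dest, (_, hop) in spf(topology, root).items() if dest != root}


# Assumed LSAs for the four-router example (Y-D cost 3, as when R went down).
lsas = [
    ("S", {"R": 1, "D": 20}),
    ("R", {"S": 1, "Y": 2, "D": 6}),
    ("Y", {"R": 2, "D": 3}),
    ("D", {"S": 20, "R": 6, "Y": 3}),
]
print(build_fib(build_topology(lsas), "R"))
```

Under these assumed costs, R's FIB sends traffic for D through Y, which is the stale next hop(R, D) = Y that the loop analysis starts from.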