Avici Company Confidential
Reliable Routing for the Internet
Scott Poretsky
Avici Systems, Inc.
June 3, 2002
Core Router Testing for High Availability
Architecture for the 21st Century Network
Outline
IP Network Availability
Test Coverage for 99.999% Availability
Commercial Test Equipment Requirements
High Reliability = More Revenue
Reliability is the single biggest criterion in selecting an ISP, according to Interactive Week/Telechoice
[Chart: ISP Customer Survey. Relative Importance (axis from 4 to 4.8) of Reliability, Value, Performance, Customer Service, and Provisioning Speed]
New IP services demand higher levels of network reliability
High Reliability = More Profit
Compensating for poor router reliability with redundancy and extra interconnects can increase network cost by up to 50%
[Diagram: IP Backbone. Labels: Core Layer (Backbone Router), Aggregation Layer (Hub Router), Edge Layer, Access Devices, VOIP, DSLAM, CMTS, GGSN, L3/4 Switch, Direct Connects, Service Provider Peering]
Definitions
Reliable: Capable of being dependable (Webster)
Availability: Measure of Reliability using router/switch Uptime
Mission Reliability: Mean Time Between Critical Failures (MTBCF), or the average time between hardware or software failures that interrupt service (the mission)
Maintenance Reliability: Mean Time Between Failures (MTBF), or the average time between hardware failures that require corrective maintenance actions
Defects Per Million (DPM): Measure of downtime equal to (1 - Availability) x 10^6
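The Availability and DPM definitions can be illustrated with a short Python sketch. The function names and the sample uptime figures are assumptions chosen for illustration; they do not come from the presentation.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Fraction of the observed time during which the router/switch was up."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime_hours / total


def defects_per_million(avail: float) -> float:
    """DPM = (1 - Availability) x 10^6."""
    return (1.0 - avail) * 1_000_000


# Hypothetical observation: 1 hour of downtime in 1000 hours.
a = availability(uptime_hours=999.0, downtime_hours=1.0)
dpm = defects_per_million(a)
print(f"availability={a:.4f}  DPM={dpm:.0f}")
```

With these assumed figures, the availability is 0.999 and the DPM is about 1000.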
Contributing Factors for Availability
Mission Reliability: Total Time to Restore Router/Switch After a Software Failure
[Timeline, not to scale, from Software Failure Occurs to Full Operation Restored. Segments: Crash Dump Time, Boot Time, Protocol Convergence Time]
Maintenance Reliability: Total Time to Restore a Module After a Hardware Failure
[Timeline, not to scale, from Hardware Failure Occurs to Full Operation Restored. Segments: Maintainer Response Time, Removal and Replacement Time, Image Upgrade Time, Boot Time, Protocol Convergence Time]
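One way to see how the restore-time segments feed into availability is the sketch below. It uses the standard steady-state approximation Availability = MTBCF / (MTBCF + restore time), which the slides do not state. All durations and the MTBCF value are hypothetical.

```python
def total_restore_seconds(*segment_seconds: float) -> float:
    """Sum the timeline segments between failure and full operation restored."""
    return sum(segment_seconds)


def steady_state_availability(mtbcf_hours: float, restore_seconds: float) -> float:
    """Standard approximation (an assumption here): MTBCF / (MTBCF + restore time)."""
    restore_hours = restore_seconds / 3600.0
    return mtbcf_hours / (mtbcf_hours + restore_hours)


# Hypothetical software failure: crash dump 60 s, boot 180 s, protocol convergence 120 s.
sw_restore = total_restore_seconds(60, 180, 120)
avail = steady_state_availability(mtbcf_hours=1000, restore_seconds=sw_restore)
print(f"restore={sw_restore:.0f} s  availability={avail:.6f}")
```

Under this approximation, shortening any one segment or lengthening the time between critical failures raises availability.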
The Availability Goal
The Goal: 99.999% Router Availability
The Reality: 99.9% Router Availability
Features to achieve 99.999% availability:
  Non-Stop Routing
  Graceful Restart
What if testing could improve Mission Reliability enough to achieve 99.999% Availability in the absence of new features?
What if adding these new features would then achieve 99.9999% Availability?
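To put the three availability levels in concrete terms, the sketch below converts each percentage into allowed downtime per year. The 365-day year and the function name are assumptions made for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Downtime budget implied by an availability percentage."""
    return (1.0 - availability_percent / 100.0) * MINUTES_PER_YEAR


for pct in (99.9, 99.999, 99.9999):
    minutes = downtime_minutes_per_year(pct)
    print(f"{pct}% availability allows {minutes:.3f} minutes of downtime per year")
```

Moving from 99.9% to 99.999% shrinks the yearly budget from roughly 8.8 hours to a little over 5 minutes, which is why a single slow restart can consume it.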
Traditional Test Coverage
Isolated testing of protocols
  Functionality
  Conformance
  Interoperability
  Scaling
Forwarding Performance in the absence of protocols
Disadvantages
  Operational environment is not tested
  Operational conditions are not tested
  The router under test is not completely stressed
  Deployed routers run multiple protocols simultaneously
Test Program for 99.999% Availability
Stress Testing
Longevity Testing
Convergence Testing
Network-Specific Topology Testing
Automated Regression Testing
Stress Testing
Simultaneous configuration and scaling of multiple protocols
  BGP, IGP
  MPLS-TE, LDP (optional)
  MBGP, PIM-SM, MSDP (optional)
Traffic Forwarding
  Line Rate Traffic Forwarding
  Overutilize links
  Enable QoS
Network Instability
  Repeated Route Flaps
  Link Loss
  Tunnel Reroutes (optional)
Serviceability
  Repeated SNMP Gets
  Logging Enabled
  Debug Enabled
  Telnet with SHOW commands (stressful and invalid)
Stress Configuration
[Diagram: Router Under Test, two Neighbor Routers, an Optional Neighbor Router for Tunnel Reroutes, and three Test Equipment units]
Stress Execution Guidelines
Configure ECMP, Parallel Paths, and Composite Links between routers
Use Live BGP Feed for Route Table
Mix traffic types across links (IP Unicast, IP Multicast, MPLS)
One neighbor router should be from a different vendor to show interoperability under stress
Run Stress for many days (if the router lasts that long)
The router should experience more in a couple of days than it likely would in its operational lifetime.
Typical Stress Metrics
Flap 1 million BGP routes per hour
Forward 10 Terabits of data per hour
Perform 100,000 SNMP Gets per hour
Simulate 100 fiber cuts per hour (use every remote interface)
Along with
  Full BGP Table
  Full IGP Table
  Full Multicast Cache
  Required MPLS-TE Tunnels (protection optional)
  Required LDP FECs
  Enable Logging and Protocol Debug
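The hourly stress targets are easier to configure on a traffic generator when expressed as sustained per-second rates. The sketch below does that arithmetic. The dictionary keys are illustrative names, and 1 Terabit is taken as 10^12 bits, which is an assumption.

```python
SECONDS_PER_HOUR = 3600

# Hourly targets from the slide; key names are illustrative only.
hourly_targets = {
    "bgp_route_flaps": 1_000_000,
    "bits_forwarded": 10 * 10**12,  # 10 Terabits, assuming 1 Tb = 10^12 bits
    "snmp_gets": 100_000,
    "fiber_cuts": 100,
}


def per_second(hourly: float) -> float:
    """Convert an hourly target into a sustained per-second rate."""
    return hourly / SECONDS_PER_HOUR


rates = {name: per_second(value) for name, value in hourly_targets.items()}
seconds_between_fiber_cuts = SECONDS_PER_HOUR / hourly_targets["fiber_cuts"]

for name, rate in rates.items():
    print(f"{name}: {rate:,.2f} per second")
print(f"one simulated fiber cut every {seconds_between_fiber_cuts:.0f} seconds")
```

These targets work out to roughly 278 route flaps per second, about 2.78 Gb/s of forwarded traffic, about 28 SNMP Gets per second, and a simulated fiber cut every 36 seconds.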
Longevity Testing
Similar to Stress Testing, but with more operational (less stressful) conditions injected over many weeks.
  Simultaneous configuration and scaling of multiple protocols
  Traffic Forwarding
  More realistic Network Instability
  More typical Serviceability actions
Use Live Internet feed.
Convergence Terms
Network Convergence - The point in time at which all nodes in a network have updated their routing tables for a route entry change (new, withdrawal, or modification).
Protocol Convergence - The point in time at which a single node updates its routing table and advertises the route table change to its peer in a routing protocol advertisement (or update) message.
Route Convergence - The point in time at which a single node updates its routing table and reroutes traffic out the new interface.
Route Convergence is the common Router Benchmark.
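Route Convergence is often estimated from the data plane: traffic is offered at a constant rate, a convergence event is triggered, and the packets lost while traffic moves to the new interface are converted into time. The slides do not describe a measurement method, so this loss-derived approach is an assumption. The function name and the sample counts are hypothetical.

```python
def loss_derived_convergence_seconds(
    packets_sent: int, packets_received: int, offered_rate_pps: float
) -> float:
    """Estimate convergence time as lost packets divided by the offered rate.

    Assumes a constant offered rate and that all loss is caused by the
    convergence event being measured.
    """
    if offered_rate_pps <= 0:
        raise ValueError("offered rate must be positive")
    lost = packets_sent - packets_received
    if lost < 0:
        raise ValueError("received more packets than were sent")
    return lost / offered_rate_pps


# Hypothetical run: 10,000 packets/s offered, 5,000 packets lost during the reroute.
estimate = loss_derived_convergence_seconds(1_000_000, 995_000, 10_000)
print(f"estimated route convergence: {estimate:.3f} s")
```

The resolution of this estimate is one packet interval, so a higher offered rate gives a finer measurement.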
Convergence Test Issues
Large number of protocols for which convergence is important.
Number of conditions that can impact results.
Technical difficulty in testing the convergence of one protocol due to flap or instability of another protocol.
Convergence Test Conditions
Interface shutdown
  on Local Interface
  on Remote Interface
Fiber Pull
  on Local Interface
  on Remote Interface
Peer removal via CLI
  on Local router
  on Peer router
Peer node failure
Route Table changes
  Route Withdrawal
  Route Flap
  Next-Hop Change
  Metric Change
  Dynamic Constraint Change
  Policy Change
All conditions must be tested because different results can be produced.
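Because every condition must be tested, it helps to enumerate the full list rather than track it by hand. The sketch below builds the conditions from the slide and crosses them with a list of protocols. The function names and the protocol list passed in the example are assumptions.

```python
from itertools import product

# Triggers that apply at two locations, as listed on the slide.
LOCATED_TRIGGERS = {
    "interface shutdown": ("local interface", "remote interface"),
    "fiber pull": ("local interface", "remote interface"),
    "peer removal via CLI": ("local router", "peer router"),
}
STANDALONE_TRIGGERS = ("peer node failure",)
ROUTE_TABLE_CHANGES = (
    "route withdrawal", "route flap", "next-hop change",
    "metric change", "dynamic constraint change", "policy change",
)


def convergence_conditions():
    """Flatten the slide's outline into one list of test conditions."""
    conditions = [
        f"{trigger} on {location}"
        for trigger, locations in LOCATED_TRIGGERS.items()
        for location in locations
    ]
    conditions.extend(STANDALONE_TRIGGERS)
    conditions.extend(f"route table change: {c}" for c in ROUTE_TABLE_CHANGES)
    return conditions


def build_matrix(protocols):
    """Pair every protocol with every condition."""
    return list(product(protocols, convergence_conditions()))


matrix = build_matrix(["BGP", "OSPF"])  # example protocol list, chosen for illustration
print(f"{len(convergence_conditions())} conditions, {len(matrix)} test cases")
```

Not every pairing is meaningful for every protocol, so a real plan would filter the matrix rather than run it blindly.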
Network-Specific Topology Testing
Large network with many routers (e.g. 10)
Use multiple vendors for interoperability/functionality testing
Multiple protocols configured in deployment scenario
Run test cases to match deployment scenario
Automated Regression Testing
The addition of bug fixes and new features puts previously working features at risk.
Regression testing ensures that the previously working features still work.
As the number of releases with new features grows, it becomes more difficult to provide complete regression coverage through manual testing (increasingly labor intensive).
Automated regression testing enables more coverage in less time.
Automation is typically achieved using TCL scripts.
Configuration: [Diagram: Router Under Test connected to Test Equipment]
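The core idea of automated regression, rerunning a fixed set of checks and flagging anything that passed in the previous release but does not pass now, can be sketched briefly. The slides name TCL as the typical scripting language; Python is used here only for illustration. The check names and stand-in functions are hypothetical and do not talk to a real router.

```python
from typing import Callable, Dict, List


def run_regression(checks: Dict[str, Callable[[], bool]]) -> Dict[str, str]:
    """Run every check and record PASS, FAIL, or ERROR for each."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "PASS" if check() else "FAIL"
        except Exception as exc:  # a crashing check is reported, not fatal
            results[name] = f"ERROR: {exc}"
    return results


def regressions(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Checks that passed in the previous release but do not pass now."""
    return sorted(
        name for name, status in current.items()
        if previous.get(name) == "PASS" and status != "PASS"
    )


# Stand-in checks; a real suite would drive the test equipment and router.
current_results = run_regression({
    "bgp_session_establish": lambda: True,
    "ospf_adjacency": lambda: False,
})
previous_results = {"bgp_session_establish": "PASS", "ospf_adjacency": "PASS"}
print(regressions(previous_results, current_results))
```

Storing each release's results makes the comparison automatic, which addresses the growing manual effort noted above.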
Commercial Test Equipment Requirements
The State of the Union
Test Equipment fails to meet today’s requirements for testing 99.999% Availability.
Router vendors have been forced to develop their own specialized test tools.
Carriers have been forced to use the router vendor test tools.
Test Equipment vendors must respond to the challenge today.
Stress Testing Requirements
Maintain BGP Sessions and IGP Adjacencies
Flap BGP Routes
Signal and maintain RSVP-TE tunnels
Distribute LDP FECs
Signal and maintain Multicast Groups
Perform SNMP GETs and check validity
Forward Traffic (IP Unicast, IP Multicast, and MPLS)
Make the network seem much bigger than it really is without having to obtain hundreds of routers.
Required Protocol Emulation/Conformance Suites Coverage
Routing Protocols
  BGP
  OSPF, ISIS
  OSPF-TE, ISIS-TE
RSVP-TE
  Fast Reroute
  Standby Tunnels
  Ingress, Mid-Point, Egress
LDP
  RFC 2547 Layer 3 VPNs
  Martini Layer 2 VPNs
  P and PE
  LDP over RSVP
Multicast
  MBGP
  PIM-SM
  MSDP
Protocol Emulation Requirements
Run any protocols in combination on the same interface
Forward traffic for emulated protocols
Protocol Emulation on any interface type: GigE, 10GigE, and POS (including 192c)
Scaling
  BGP Sessions >500/system, >100/interface
  BGP Routes >3M/system, >500K/session
  MPLS-TE Tunnels >10K - Ingress, Mid-Point, Egress
  FECs >10K
Load external BGP table for advertisement
Controlled BGP Route Flapping
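The scaling thresholds can be turned into a simple checklist for evaluating a candidate tester. The sketch below is a minimal example. The key names and the sample capability figures are hypothetical, and each threshold is treated as a strict "greater than" to match the slide.

```python
# Thresholds from the slide; key names are illustrative only.
REQUIRED_MINIMUMS = {
    "bgp_sessions_per_system": 500,
    "bgp_sessions_per_interface": 100,
    "bgp_routes_per_system": 3_000_000,
    "bgp_routes_per_session": 500_000,
    "mpls_te_tunnels": 10_000,
    "fecs": 10_000,
}


def scaling_shortfalls(capabilities: dict) -> list:
    """Return the requirements a tester does not strictly exceed."""
    return sorted(
        name for name, minimum in REQUIRED_MINIMUMS.items()
        if capabilities.get(name, 0) <= minimum
    )


# Hypothetical tester data sheet, invented for this example.
candidate = {
    "bgp_sessions_per_system": 1000,
    "bgp_sessions_per_interface": 100,   # equals the threshold, so it falls short
    "bgp_routes_per_system": 4_000_000,
    "bgp_routes_per_session": 600_000,
    "mpls_te_tunnels": 20_000,
    # "fecs" not reported, so it counts as a shortfall
}
print(scaling_shortfalls(candidate))
```

A capability missing from the data sheet is counted as a shortfall, so gaps in vendor documentation surface in the result.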
Automated Regression Requirements
Commercial test equipment vendors offer protocol conformance TCL suites.
  Test Case coverage must be improved within each suite
  Interaction between protocols must be tested
  Need each script to test multiple interfaces (4 or more)
Full Protocol Coverage
  Multicast protocols have been the “forgotten son”
System Requirements
Multiple ports per chassis (>32)
Automated Convergence measurement
Automated reroute/failover measurement
Support for ECMP and Composite Links
System/Protocol Stability For Many Days
Ability to store GUI configuration for repeatability
Ability to TCL script any GUI test case