Advanced Network Design

Eugene Odnoralets Routing Protocols Team

23.11.15 © 2015 Cisco and/or its affiliates. All rights reserved.

Introduction

High Availability and Fast Convergence

This section analyzes the problems you could face when trying to deploy fast convergence. Several techniques have been developed to allow high availability and fast convergence, including:

§  Graceful restart
§  Fast down detection
§  Exponential backoff
§  Speeding up route selection

Considerations in Fast Convergence

§  Scale and speed are contradictory goals.
§  The faster a network converges, the less stable it is likely to be.

Fast reactions to changes in the network topology tend to create positive feedback loops, which result in a network that simply will not converge. The pieces of a network you need to be concerned about when considering subsecond (fast) convergence:
§  The physical layer: how fast can a down link be detected?
§  Routing protocol convergence: how fast can a routing protocol react to the topology change?
§  Forwarding: how fast can the forwarding engine on each router in the network adjust to the new paths that the routing protocol calculates?

Network Meltdown Definition

A state in which a network grinds to a halt due to excessive traffic. A network meltdown generally starts as a broadcast storm that gets out of control but even legitimate network messages can cause a meltdown if the network hasn't been designed to accommodate that level of traffic.

Network Meltdowns

The link between Routers D and G flaps: it cycles between the "down" and "up" states
§  slowly enough for a routing adjacency to be formed
§  slowly enough for the new link to be advertised as part of the topology
§  too quickly for the link to be used
The adjacency between D and G forms and tears down as quickly as the routing protocol allows.

[Figure: network of Routers A, B, C, D, E, F and G; the link between D and G is flapping]

Slow down

How do you work around this sort of problem in the routing protocol? The answer is simple: slow down! Methods of slowing down:
§  Do not report all interface transitions from the physical layer up to the routing protocol. This is called debouncing the interface.
§  Slow down neighbor timers.
§  Slow down the distribution of information about topology changes.
§  Slow down the routing protocol's reaction to information about topology changes.
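Debouncing can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any router implements it: the function name `debounce`, the `hold_down` and `end_time` parameters, and the sample timestamps are all hypothetical.

```python
def debounce(transitions, hold_down, end_time, initial_state="up"):
    """Report a link state only if it stays stable for at least hold_down seconds.

    transitions: list of (timestamp, state) tuples as seen at the physical layer.
    Returns the (timestamp, state) changes passed up to the routing protocol.
    All names and values are illustrative assumptions.
    """
    reported = []
    last_reported = initial_state
    for i, (t, state) in enumerate(transitions):
        # The state lasts until the next transition (or the end of the observation).
        next_t = transitions[i + 1][0] if i + 1 < len(transitions) else end_time
        stable = (next_t - t) >= hold_down
        if stable and state != last_reported:
            # The change is reported only once the hold-down time has elapsed.
            reported.append((t + hold_down, state))
            last_reported = state
    return reported


# A link that bounces four times in under a second, then goes down for good.
flaps = [(0.0, "down"), (0.2, "up"), (0.5, "down"), (0.6, "up"), (5.0, "down")]
reported = debounce(flaps, hold_down=1.0, end_time=10.0)
print(reported)
```

In this run the routing protocol sees a single "down" event instead of five transitions, at the cost of learning about the real failure one second late.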

To provide stability within a routing system

These methods are typically used in routing protocol design and implementation to provide stability within a routing system.
§  IS-IS
    §  A timer regulates how often a router can originate new routing information:
        lsp-gen-interval [level-1 | level-2] lsp-max-wait [lsp-initial-wait lsp-second-wait]
        lsp-max-wait: maximum interval between two consecutive occurrences of an LSP being generated
        lsp-initial-wait: initial LSP generation delay
        lsp-second-wait: hold time between the first and second LSP generation
    §  Another timer regulates how often a router can run the shortest path first (SPF) algorithm that calculates the best paths through the network:
        spf-interval [level-1 | level-2] spf-max-wait [spf-initial-wait spf-second-wait]
        spf-max-wait: maximum interval between two consecutive SPF calculations
        spf-initial-wait: initial SPF calculation delay after a topology change
        spf-second-wait: hold time between the first and second SPF calculation

To provide stability within a routing system (cont)

§  OSPF
    §  Similar timers regulate the rate at which topology information can be transmitted and the frequency at which the shortest path first algorithm can be run.
§  EIGRP
    §  The simple rule “No route may be advertised until it is installed in the local routing table” dampens the speed at which routing information is propagated through the network.
    §  Routing information is also paced when being transmitted through the network, based on the bandwidth between two routers. By default, EIGRP uses up to 50% of the interface bandwidth reported by the software.

Do not report everything

Reporting changes more slowly when they occur in quick succession, or not reporting some events at all, lets routing converge much faster while providing the expected stability.
§  A router should not immediately report all the events of which it is aware:
    §  link failures
    §  neighbor failures
§  Sort out which events are in some sense
    §  important
    §  not important
§  Example: if a router loses contact with an adjacent router because the adjacent router restarted for some reason, do not report the resulting change in topology until it is clear the neighbor is not coming back.

The classic questions

§  How long do you wait before deciding the problem is real?
§  What happens to traffic you would normally forward to that neighbor while you are waiting?
§  How do you reconnect in a way that allows the network to continue operating correctly?

Two technologies incorporated in routing protocols can answer these questions:
§  Graceful Restart (GR)
§  Non-Stop Forwarding (NSF)

Control plane / forwarding plane

What happens to traffic received by a router while it is restarting?

Normally:
§  this traffic is dropped
§  any applications that are impacted must retransmit lost data

You can prevent this by taking advantage of the separation between the control plane and the forwarding plane: if the control plane fails or restarts for any reason, the data plane can continue forwarding traffic based on the last known good information.

Separation of the control & forwarding plane


[Figure: separation of the control and forwarding planes in a distributed router architecture, showing locally generated packets and packets for processing]

Non-Stop Forwarding

In Cisco products, NSF is implemented through Stateful Switchover (SSO). NSF allows continuous forwarding to take place regardless of the state of the control plane. Normally, when the control plane resets, it sends a signal to the data plane that it should clear its tables and reset. With NSF enabled, this signal from the control plane instead tells the data plane to mark its current data as stale and to begin aging out the information.

Non-Stop Forwarding (cont)

At this point the Route Processor (RP) should be able to
§  bring the control plane back up
§  resynchronize the routing protocol databases
§  rebuild the routing table
without disturbing the packets that are still being switched by the data plane on the router. This is accomplished through Graceful Restart.
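The stale-marking behavior can be illustrated with a toy forwarding table in Python. This is a sketch under assumptions, not Cisco's implementation: the class `ForwardingTable`, its method names, and the sample prefixes and next hops are all hypothetical.

```python
class ForwardingTable:
    """Toy data-plane table; every name here is an illustrative assumption."""

    def __init__(self):
        self.entries = {}  # prefix -> [next_hop, stale_flag]

    def install(self, prefix, next_hop):
        # Installing (or re-installing) a route always yields a fresh entry.
        self.entries[prefix] = [next_hop, False]

    def control_plane_reset(self, nsf_enabled):
        if nsf_enabled:
            # With NSF: keep forwarding, but mark everything stale.
            for entry in self.entries.values():
                entry[1] = True
        else:
            # Without NSF: the data plane clears its tables.
            self.entries.clear()

    def lookup(self, prefix):
        entry = self.entries.get(prefix)
        return entry[0] if entry else None

    def age_out_stale(self):
        # Entries that were not refreshed after the restart are removed.
        self.entries = {p: e for p, e in self.entries.items() if not e[1]}


fib = ForwardingTable()
fib.install("10.1.0.0/16", "192.0.2.1")
fib.install("10.2.0.0/16", "192.0.2.2")

fib.control_plane_reset(nsf_enabled=True)
during_restart = fib.lookup("10.1.0.0/16")   # still forwarded while restarting

fib.install("10.1.0.0/16", "192.0.2.1")      # re-learned after the resync
fib.age_out_stale()                          # 10.2.0.0/16 was never refreshed
print(during_restart, sorted(fib.entries))
```

The point of the sketch is the ordering: forwarding continues on stale entries first, and only the routes that the restarted control plane fails to re-learn are eventually removed.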

Graceful Restart


Graceful Restart for any routing protocol


[Figure: message sequence between Routers A and B, with the following steps]
§  send hello, indicate GR capable
§  build adjacency, mark as GR capable
§  send hello; reset hold timer
§  control-plane reset; hold timer is counting down
§  reset hold timer
§  signal database resync; set up for database resync
§  resync database (both routers)
§  continue normal operation (both routers)

Graceful Restart for any routing protocol (cont)

§  Routers A and B exchange some form of signaling noting that they are capable of understanding GR signaling and of responding to it correctly.
§  This signaling does not imply that the router is capable of restarting gracefully or of forwarding traffic through a local failure, only that it can support a neighboring router performing Graceful Restart.
§  A router whose control and data planes are not cleanly separated cannot fully support GR itself. However, it can support the signaling that is necessary for a neighboring router to restart gracefully.

How EIGRP neighbor restart normally occurs


[Figure: message sequence between Routers A and B, with the following steps]
§  normally operating neighbor relationship
§  place new neighbor in pending state
§  send hello
§  send empty update with initialization bit set; set up for database resync
§  send topology information; new neighbor
§  send topology table

How Graceful Restart resolves the same



[Figure: message sequence between Routers A and B, with the following steps]
§  build adjacency, mark as GR capable
§  send hello with restart bit set; reset hold timer
§  place A in local neighbor table; send hello
§  empty update with init and restart bits set; set up for database resync
§  resync database (both routers)


OSPF Graceful Restart

Two styles of OSPF Graceful Restart are available:
§  Graceful Restart using link local signaling
§  Graceful Restart using opaque link-state advertisements (LSAs)

Normal OSPF Restart




[Figure: message sequence between Routers A and B, with the following steps]
§  send hello with an empty neighbor list; reset adjacency
§  place new neighbor in neighbor list
§  send hello with router-id of new neighbor
§  negotiate db exchange (both routers)
§  exchange databases (both routers)
§  continue normal operation (both routers)

OSPF Graceful Restart using Link Local Signalling

This method of signaling GR, described in the IETF Internet-Draft “OSPF Restart Signaling” (draft-nguyen-ospf-restart-04.txt), relies on two mechanisms:
§  Link Local Signaling (LLS): a mechanism described in the IETF Internet-Draft “OSPF Link-local Signaling” (draft-nguyen-ospf-lls-02.txt). This draft extends the OSPF hello packet format to include TLVs, which can then be used to include additional signaling of various types, such as graceful restart capability and a graceful restart.
§  Out of Band Resynchronization: a mechanism described in the IETF Internet-Draft “OSPF Out-of-Band LSDB Resynchronization” (draft-nguyen-ospf-oob-resync-04.txt). This draft describes a mechanism through which two OSPF routers can resynchronize their link-state databases at any point.

OSPF Graceful Restart using Link Local Signalling




[Figure: message sequence between Routers A and B, with the following steps]
§  send hello; reset hold timer
§  control-plane reset; hold timer is counting down
§  send hello with an empty neighbor list and the RS bit set
§  reset hold timer
§  send hello with router-id of restarting neighbor
§  negotiate db exchange (both routers)
§  exchange databases using out of band sync (both routers)

OSPF Graceful Restart using Opaque LSA




[Figure: message sequence between Routers A and B, with the following labels]
§  control-plane reset
§  send Grace LSA
§  reset hold timer
§  GR timer counting down
§  exchange databases using Grace LSA (both routers)

Fast Down Detection


Before you can route around a failed link or device, you need to detect its failure. Detecting failure is a major concern in the highly available network. You can detect a neighbor or link failure in two ways:
§  Polling through fast hellos or other packets, transmitted at Layer 2 or Layer 3
§  Event-driven notification through monitoring some link property, such as the link carrier

Detecting a Link or Adjacency Failure Using Polling

One common method to detect a link or adjacency failure is polling, or periodically sending hello packets to the adjacent device and expecting a periodic hello packet in return. The two determining factors in the speed at which polling can discover a failed link or device are as follows:
§  The rate at which hello packets are transmitted
§  The number of hello packets missed before declaring a link or adjacency as failed

How Fast Does Polling Detect a Down Neighbor ?

[Figure: timeline of hellos transmitted between A and B up to the last hello, with a 10-second hello interval and a 30-second hold interval]
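The two factors above bound how long polling takes to notice a dead neighbor. A minimal Python sketch of that arithmetic follows. It assumes the hold timer restarts whenever a hello arrives and ignores transmission delay and timer jitter; the function name `detection_window` is hypothetical.

```python
def detection_window(hello_interval, hold_interval):
    """Return (best_case, worst_case) seconds to declare a neighbor down.

    Assumes the hold timer is restarted on every received hello and that
    hellos arrive instantly; both are simplifications for illustration.
    """
    if hold_interval < hello_interval:
        raise ValueError("hold interval must be at least the hello interval")
    # Worst case: the neighbor dies right after sending a hello,
    # so the full hold interval must expire.
    worst_case = hold_interval
    # Best case: the neighbor dies just before its next hello was due,
    # so one hello interval of the hold time has already elapsed.
    best_case = hold_interval - hello_interval
    return best_case, worst_case


window = detection_window(hello_interval=10, hold_interval=30)
print(window)
```

With the 10-second hello and 30-second hold values from the figure, the failure is detected somewhere between 20 and 30 seconds after it happens, which is why faster timers or a different detection method are needed for subsecond convergence.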

Fast hellos

Use faster timers than the defaults in most protocols:
§  OSPF can transmit a hello about every 330 milliseconds and set the dead interval to 1 second:
    ip ospf dead-interval minimal hello-multiplier multiplier
§  IS-IS can transmit a hello about every 330 milliseconds and set the dead interval to 1 second:
    isis hello-interval minimal [level-1 | level-2]
    isis hello-multiplier multiplier [level-1 | level-2]
    The hello multiplier is set to 3 by default.
§  EIGRP can transmit a hello every second and set the dead interval to 3 seconds:
    ip hello-interval eigrp [autonomous system] [seconds]
    ip hold-time eigrp [autonomous system] [seconds]

Bidirectional Forwarding Detection - BFD

What is BFD?
§  Lightweight hello protocol designed to run over multiple transport protocols
§  Designed for sub-second Layer 3 failure detection
§  Any interested client (EIGRP, IS-IS, OSPF, etc.) registers with BFD and is notified as soon as BFD detects a neighbor loss
§  All registered clients benefit from uniform failure detection
§  Runs on physical, virtual and bundle interfaces
§  Uses UDP port 3784 (control) and 3785 (echo)
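The client registration model can be pictured as a simple publish/subscribe arrangement. The Python sketch below is purely illustrative: the class `BfdSession`, its methods, and the neighbor address are hypothetical and do not correspond to any real BFD API.

```python
class BfdSession:
    """Toy model of one BFD session shared by several routing protocols.

    All names are assumptions for illustration only.
    """

    def __init__(self, neighbor):
        self.neighbor = neighbor
        self.clients = {}  # client name -> callback

    def register(self, name, callback):
        # Any interested protocol registers once for this neighbor.
        self.clients[name] = callback

    def neighbor_down(self):
        # A single detection event is fanned out to every registered client.
        for callback in self.clients.values():
            callback(self.neighbor)


notified = []
session = BfdSession("10.0.0.2")
for protocol in ("OSPF", "IS-IS", "EIGRP"):
    # Bind the protocol name now so each callback records its own name.
    session.register(protocol, lambda nbr, p=protocol: notified.append((p, nbr)))

session.neighbor_down()
print(notified)
```

Because the protocols share one detector, they all learn about the failure from the same event instead of each running its own hello timers.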

BFD in a distributed router architecture

[Figure: the Route Processor runs OSPF, IS-IS, Telnet, SNMP and the BFD Master; each of three linecards runs a BFD Agent and a FIB Downloader]

Event-driven notification through monitoring link

Rather than periodically polling, rely on event-driven notification of link failures: lower-layer devices monitor the link status and notify the routing protocol when the link fails.
§  SONET/SDH
§  DWDM
These are probably the best known of the fast convergence technologies available; they not only allow the fast detection of down links and devices, but also provide link protection, which allows traffic to quickly be switched to a backup fiber link if the primary path fails.

[Figure: an APS protected link compared with an unprotected link]

Exponential Backoff


Exponential Backoff in Link-State Protocols

[Figure: Routers A, B and C with a flapping link; the backoff timer evolves as follows]
§  Step 1: 1st link flap. Send notification; initial timer set to 1 sec; add increment of 1 sec and set timer here.
§  Step 2: 2nd link flap. Send notification; double time and set timer here.
§  Step 3: 3rd link flap. Send notification; set timer to max of 5 sec.
§  Step 4: once 2x maximum (10 seconds) has passed, set timer to initial.
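The backoff behavior can be sketched as a small Python class. This is an approximation under stated assumptions rather than the exact algorithm of any implementation: the first event waits `initial`, later events wait a hold time that starts at `hold` and doubles up to `maximum`, and a quiet period of twice `maximum` resets the timer. The class and attribute names are hypothetical.

```python
class BackoffTimer:
    """Exponential backoff sketch; names and exact policy are assumptions."""

    def __init__(self, initial, hold, maximum):
        self.initial = initial
        self.hold = hold
        self.maximum = maximum
        self.current = None      # None means the timer is at its initial state
        self.last_event = None

    def on_event(self, now):
        """Return the delay applied to the event that occurs at time `now`."""
        quiet_limit = 2 * self.maximum
        if self.last_event is not None and now - self.last_event >= quiet_limit:
            self.current = None  # network has been quiet: back to initial
        self.last_event = now
        if self.current is None:
            delay = self.initial
            self.current = self.hold
        else:
            delay = min(self.current, self.maximum)
            # Double the hold time for the next event, capped at the maximum.
            self.current = min(self.current * 2, self.maximum)
        return delay


timer = BackoffTimer(initial=1, hold=1, maximum=5)
# Five flaps one second apart, then one more after a long quiet period.
delays = [timer.on_event(t) for t in (0, 1, 2, 3, 4, 30)]
print(delays)
```

A stable network therefore reacts after the short initial delay, while a flapping link quickly pushes the delay to the maximum and keeps it there until the flapping stops.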

Exponential Backoff in Link-State Protocols (cont)

The exponential backoff mechanism can be applied to two different timers in link-state protocols:
§  The link-state generation timer, the case just examined
§  The SPF timer, which determines how often a router runs the SPF algorithm in response to changes in the network

OSPF Exponential Backoff for LSA Generation

OSPF exponential backoff for LSA generation is called LSA throttling. Two configuration commands are related to this capability:
§  timers throttle lsa all [start-interval] [hold-interval] [max-interval]
    start-interval is the initial time
    hold-interval is the increment
    max-interval is the maximum time
§  timers lsa arrival [milliseconds]
    the minimum interval at which a router accepts LSAs with the same LSA ID

OSPF Exponential Backoff for Running SPF

OSPF exponential backoff for SPF is implemented as OSPF SPF throttling:
§  timers throttle spf spf-start spf-hold spf-max-wait
    §  spf-start is the initial SPF schedule delay in milliseconds
    §  spf-hold is the minimum hold time between two consecutive SPF calculations
    §  spf-max-wait is the maximum wait time between two consecutive SPF calculations

IS-IS Exponential Backoff for Running SPF

IS-IS also implements exponential backoff as throttling. Three commands are used to configure it:
§  LSP generation
    lsp-gen-interval [level-1 | level-2] lsp-max-wait [lsp-initial-wait lsp-second-wait]
§  SPF run
    spf-interval [level-1 | level-2] spf-max-wait [spf-initial-wait spf-second-wait]
§  PRC throttling
    prc-interval prc-max-wait [prc-initial-wait prc-second-wait]

Speeding up route selection


Calculating the Route Faster

Another area where the convergence time of a network could be decreased is route calculation. How long does it take to calculate the best path to a destination in the network after you have detected and reported an event? Consider tuning:
§  feasible successors in EIGRP
§  link-state partial SPF
§  link-state incremental SPF

EIGRP Feasible Successors

EIGRP calculates not only the best path to each reachable destination but also feasible successors, which are routes to the same destination that are known to be loop free. For the route to 172.17.1.0/24:
§  the path through 172.17.3.1 has a reported distance of 2167296
§  the path through 172.18.8.4 has a feasible distance of 2172416

router#show ip eigrp topo 172.17.1.0
IP-EIGRP (AS 100): Topology entry for 172.17.1.0/24
  State is Passive, Query origin flag is 1, 1 Successor(s), FD is 2172416
  Routing Descriptor Blocks:
  172.17.2.1 (Serial0/0), from 172.18.8.4, Send flag is 0x0
      Composite metric is (2172416/18944), Route is Internal
  ....
  172.17.1.0 (Serial0/3), from 172.17.3.1, Send flag is 0x0
      Composite metric is (2684416/2167296), Route is Internal

Because the reported distance through 172.17.3.1 is less than the feasible distance through 172.18.8.4, the route through 172.17.3.1 must be loop free. It is a feasible successor.
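The comparison described here is the EIGRP feasibility condition, and it reduces to a single strict inequality. The Python sketch below uses a hypothetical function name and the metric values quoted in this section.

```python
def is_feasible_successor(reported_distance, feasible_distance):
    """Feasibility condition: the neighbor's reported distance must be
    strictly lower than the local feasible distance (illustrative helper)."""
    return reported_distance < feasible_distance


FD = 2172416  # feasible distance of the current best path

# Reported distance lower than FD: the alternate path is a feasible successor.
lower = is_feasible_successor(2167296, FD)
# Reported distance equal to FD: not a feasible successor.
equal = is_feasible_successor(2172416, FD)
print(lower, equal)
```

The strict comparison matters: a reported distance merely equal to the feasible distance does not qualify, so the route would have to be resolved with queries instead.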

How EIGRP determines that a nonfeasible successor is loop free

It always takes time to query neighbors and to receive replies which slows down network convergence. Apply this knowledge to network design by considering not only the best path to each destination from a given area in the network but also where the feasible successors are and how to tweak the metrics so that you have a feasible successor where possible.

How EIGRP determines that a nonfeasible successor is loop free (cont)

One such possible situation involves a pair of equal cost links:
§  A to B link
§  A to C link

router-b#show ip eigrp topo 172.17.1.0
IP-EIGRP (AS 100): Topology entry for 172.17.1.0/24
  State is Passive, Query origin flag is 1, 1 Successor(s), FD is 2172416
  Routing Descriptor Blocks:
  10.1.1.1 (Serial0/0), from 10.1.1.1, Send flag is 0x0
      Composite metric is (2172416/18944), Route is Internal
  ....
  10.1.3.1 (Serial0/3), from 10.1.3.1, Send flag is 0x0
      Composite metric is (2684416/2172416), Route is Internal

The feasible distance through Router A is equal to the reported distance through Router C, so the route through Router C is not considered a feasible successor. If the Router A to B link or the Router A to C link fails, at least one query is required to re-converge.

[Figure: Routers A, B and C with network 172.17.1.0/24; interface addresses 10.1.1.1, 10.1.2.1 and 10.1.3.1]

Modifying the Delay to Create an EIGRP-Feasible Successor

Modifying the metrics on the Router A to C link by decreasing the delay slightly produces the following result:

router-b#show ip eigrp topo 172.17.1.0
IP-EIGRP (AS 100): Topology entry for 172.17.1.0/24
  State is Passive, Query origin flag is 1, 1 Successor(s), FD is 2172416
  Routing Descriptor Blocks:
  10.1.1.1 (Serial0/0), from 10.1.1.1, Send flag is 0x0
      Composite metric is (2172416/18944), Route is Internal
  ....
  10.1.3.1 (Serial0/3), from 10.1.3.1, Send flag is 0x0
      Composite metric is (2684416/2167296), Route is Internal

The reported distance through Router C is now lower than the feasible distance through Router A, so the path through Router C is considered a feasible successor.

[Figure: Routers A, B and C with network 172.17.1.0/24; interface addresses 10.1.1.1, 10.1.2.1 and 10.1.3.1]

Link-State Partial SPF

The directed graph built by SPF contains three types of objects:
§  Nodes
§  Edges
§  Leaves

IS-IS treats all IP subnets as leaves off the SPF tree:
§  172.17.1.0/24 is a leaf
§  172.17.2.0/24 is a leaf

OSPF treats external (redistributed) routes as leaves:
§  172.17.1.0/24 is a leaf
§  172.17.2.0/24 is treated as a node in OSPF (network statement)

[Figure: Routers A, B, C and D; 172.17.1.0/24 is redistributed, 172.17.2.0/24 is brought into OSPF through a network statement]

Node and a Leaf in the SPF

Removing and adding leaf nodes without recalculating the entire SPF tree is called Partial SPF.
§  It is a feature of the implementation of OSPF and IS-IS.
§  The distinction between a node and a leaf in the SPF tree matters!
§  Changes in leaves of the SPF tree do not cause a complete recalculation of the SPF tree.
§  If 172.17.1.0/24 fails, it is simply removed from the SPF tree.
§  The parts of the tree that contain the nodes A, B, C, and D are not impacted by this change.

[Figure: Routers A, B, C and D; 172.17.1.0/24 is redistributed, 172.17.2.0/24 is brought into OSPF through a network statement]
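The node/leaf distinction can be made concrete with a short Python sketch. It is a simplified model, not the OSPF or IS-IS implementation: the link costs, the attachment points of the leaves, the second prefix 192.0.2.0/24, and all function names are assumptions chosen for illustration.

```python
import heapq

spf_runs = 0  # counts how many times the full SPF is executed


def spf(graph, root):
    """Dijkstra over nodes and edges only; leaves are kept out of the graph."""
    global spf_runs
    spf_runs += 1
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # outdated heap entry
        for neighbor, cost in graph[node].items():
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


def leaf_routes(node_dist, leaves):
    """Attach each leaf prefix to its node using the existing SPF result."""
    return {prefix: node_dist[node] + cost
            for prefix, (node, cost) in leaves.items()
            if node in node_dist}


# Hypothetical square topology A-B-D-C-A with equal link costs.
graph = {
    "A": {"B": 10, "C": 10},
    "B": {"A": 10, "D": 10},
    "C": {"A": 10, "D": 10},
    "D": {"B": 10, "C": 10},
}
leaves = {"172.17.1.0/24": ("D", 5), "192.0.2.0/24": ("B", 5)}

node_dist = spf(graph, "A")
before = leaf_routes(node_dist, leaves)

# The leaf fails: drop it and rebuild the routes from the same SPF result.
del leaves["172.17.1.0/24"]
after = leaf_routes(node_dist, leaves)
print(before, after, spf_runs)
```

The full SPF runs once; removing the leaf only touches the prefix-to-node mapping, which is the saving that partial SPF provides.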

Link-State Incremental SPF

Incremental SPF takes the concept of a partial SPF one step further. If a specific piece of the SPF tree changes, rather than recalculating the entire tree, recompute just a section of the tree:
§  The link to Router B fails.
§  No alternate path exists to Router B.
§  It is unnecessary to recalculate the entire SPF tree.
§  Instead, SPF can safely remove the branch behind Router B.
§  The routing table is adjusted accordingly without further calculations.

[Figure: Routers A, B, C, D and E with network 172.17.1.0/24]
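The branch-removal case can be sketched in Python as pruning a subtree from an already computed SPF tree. This covers only the special case where no alternate path exists; the tree shape, the next-hop values, and the function name `prune_branch` are hypothetical.

```python
def prune_branch(children, routes, failed_node):
    """Remove failed_node and everything behind it from the routing table.

    children: SPF tree as a mapping of node -> list of child nodes.
    routes:   mapping of node -> next hop, modified in place.
    Assumes no alternate path to the pruned nodes exists.
    """
    removed = []
    stack = [failed_node]
    while stack:
        node = stack.pop()
        removed.append(node)
        routes.pop(node, None)
        stack.extend(children.get(node, []))
    return removed


# Hypothetical SPF tree rooted at A: B and C hang off A, E behind B, D behind C.
children = {"A": ["B", "C"], "B": ["E"], "C": ["D"]}
routes = {"B": "B", "E": "B", "C": "C", "D": "C"}

removed = prune_branch(children, routes, "B")
print(removed, routes)
```

Only the nodes behind B are visited; the part of the tree reached through C is left untouched, so no shortest-path computation is repeated for it.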

Link-State Incremental SPF (cont)

In summary:
§  iSPF is more efficient than the full SPF algorithm, thereby allowing OSPF/IS-IS to converge faster.
§  iSPF also provides a significant advantage when the changes in the network topology are further away from the root of the SPT: the larger the network, the more significant the impact.
§  iSPF provides greater improvements in convergence time for networks with a high number of nodes and links; a segment of 400-1000 nodes should see improvements.
