Upload
dangduong
View
226
Download
7
Embed Size (px)
Citation preview
Scaling BGP BRKRST-3321
Luc De Ghein
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Agenda
Scale Challenges
Full mesh iBGP
Update Groups
Slow Peer
RR Problems & Solutions
Deployment
Multi-Session
MPLS VPN
Memory Utilization
OS Enhancements
3
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Scale Challenges
BGP is robust
BGP is simple
– Basically simple, knobs allows for anything
BGP is adaptable and is multi-protocol
BGP allows policy
We need to overcome the following:
– Newer services: new AFs
– More prefixes
– Larger scale: more (BGP) routers
– More multipath
– More resilience PIC Edge, Best External leading to more prefixes/paths
4
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Growing Tables of BGP
5
Internet sees ~ 2 updates per sec
Source: bgp.potaroo.net
IPv4 IPv6
AS
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
More Services Using BGP
6
IPv4
IDR
IPv4
enterprise
MPLS VPN
IPv6 IDR
BGP flowspec
BGP monitoring
protocol
multicast BGP PW Signalling
(VPLS)
BGP MAC
Signalling (EVPN)
BGP FC
Path Diversity
BGP FC
PIC
mVPN
services
scaling DMVPN Unified
MPLS
1990 1995 1999 2002 2009 2012 future
Inter-AS
MPLS VPN
mVPN
C-signalling
6PE 6VPE
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Services = Address Families
7
IPv4 unicast
IPv4 multicast
IPv4 MDT
IPv4 MVPN
vpnv4 unicast
vpnv4 multicast
IPv6 unicast
IPv6 multicast
IPv6 MVPN vpnv6 unicast
vpnv6 multicast
l2vpn vpls
l2vpn evpn
rtfilter unicast
nsap unicast
IPv6
IPv4 unicast
multicast
vpn
multicast in overlay
Layer 2
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
mVPN and BGP
8
mVPN old style
– BGP ipv4 MDT needed for multicast signalling in the core
mVPN new style
– New address family mvpn
– BGP replaces PIM in overlay signalling
– BGP Auto Discovery (AD) signalling • Auto Discovery of PE routers
• Advertisement of core/tree tunnel information
– BGP Customer or C-signalling • Customer signalling from PE to PE
BGP in Overlay
PE PE Source
S1,S2 MPLS cloud Receiver
PIM PIM
BGP
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Seamless MPLS (aka Unified MPLS)
9
Seamless MPLS (aka Unified MPLS) allows to
create end-end Tunnel Label Switch Path and
extend MPLS services without boundary ( Edge
/Aggregation /Access )
To scale Tunnel LSP are construct in 2 layers
- Intra-Area Tunnel LSP (ISIS+LDP or OSPF+LDP)
- Inter-Area Tunnel LSP (BGP+label: RFC3107)
No IP prefix redistribution between IGP domains
IGP prefixes are moved into BGP
Edge
Aggregation Aggregation Access Core
Edge
L2VPN / L3VPN
Inter-Area MPLS
IGP 1 + LDP IGP 2 + LDP IGP 3 + LDP
Intra-Area MPLS
Seamless MPLS
client (PE)
Seamless MPLS
ABR
Seamless MPLS
ABR
iBGP + label (RFC3107)
MP-iBGP + VPN label
Access
Seamless MPLS
client (PE)
MPLS service
Inter-domain LSP
Intra-domain LSP Intra-domain LSP Intra-domain LSP
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Seamless MPLS (aka Unified MPLS)
10
Edge
Aggregation Aggregation Access Core
Edge
L2VPN / L3VPN
Inter-Area MPLS
IGP 1 + LDP IGP 2 + LDP IGP 3 + LDP
Intra-Area MPLS
Seamless MPLS
client (PE)
Seamless MPLS
ABR
Seamless MPLS
ABR
iBGP + label (RFC3107)
MP-iBGP + VPN label
Access
Seamless MPLS
client (PE)
• ABR is a RR
• RR sets next-hop to self for reflected iBGP prefixes
neighbor x.x.x.x next-hop-self all
• RR is in the forwarding path !
MPLS service
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Ethernet-VPN (E-VPN)
11
E-VPN is next generation Emulated Ethernet Lan Solution
Customer MAC-Addresses advertised using using BGP
E-VPN defines a single new BGP NLRI used to carry all E-VPN routes
The NLRI has a new SAFI (70)
BGP
PE PE
PE PE 1. New route types and attributes defined
2. MAC address advertisement and withdraw
3. MPLS label
Full Mesh iBGP
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Full Mesh iBGP Is a Full Mesh Scalable?
Per BGP standard: iBGP needs to be full mesh
Total iBGP sessions = n * (n-1) / 2
Sessions per BGP speaker = n - 1
0
2000
4000
6000
8000
10000
12000
0 50 100 150 200
session
peers Two solutions
1. Confederations
2. Route reflectors
13
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Confederations Create # of sub-AS inside the larger confederation
Conferation AS looks like normal AS to the outside
R1
R3
R2
R4
subAS 65001
R5
R7
R6
subAS 65002
Full mesh iBGP still needed inside subAS
No full mesh needed between subAS (it’s eBGP)
Every BGP peer needs to be in a subAS
R8
R10
R9
R11
subAS 65003
R12
R13
R14
subAS 65004
R15
R16
confed eBGP
iBGP
eBGP
router bgp 65002
bgp confederation identifier 100
bgp confederation peers 65001 65004
AS 100
14
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Confederations
Next-hop-self on confed eBGP peerings
R1
R3
R2
R4
subAS 65001
R5
R7
R6
subAS 65002 R8
R10
R9
R11
subAS 65003
R12
R13
R14
subAS 65004
R15
R16
confed eBGP
iBGP
eBGP
AS 100
router bgp 65002
neighbor R3 next-hop-self
neighbor R14 next-hop-self
Each subAs can have different IGP
Increased BGP scaling
– Divide and conquer
router bgp 65004
neighbor R7 next-hop-self
neighbor R11 next-hop-self
15
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Confederations
R1
R3
R2
R4
subAS 65001
R5
R7
R6
subAS 65002 R8
R10
R9
R11
subAS 65003
R12
R13
R14
subAS 65004
R15
R16
confed eBGP
iBGP
eBGP
AS 100
Flexible confed eBGP peerings
– Redundancy needed vs increased memory/CPU
– But full mesh between subAS’s is not needed
Full mesh between two
subAS’s is not needed
No connectivity needed between
any subAS’s
– e.g. no confed eBGP peering between subAS 65002 and subAS 65003
16
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Confederations Notes
17
Loop avoidance similar to normal AS
– We introduce a confederation part in the AS_PATH
– Two new segment types: 3: AS_CONFED_SEQUENCE (ordered) ( )
4: AS_CONFED_SET (unordered) [ ]
– They never go out of the Confederation
Attributes do not change across confed-eBGP
– We keep NEXT_HOP, LOCAL_PREF, MED
– Implications: Decisions is consistent (Example: One path with higher LOCAL_PREF wins across the confederation)
Since NEXT_HOP does not change, IGP should be one across the confederation
– Decision process does not change Prefer eBGP paths over iBGP AND confederation-eBGP
e.g. (65001 65002) 300 200
PATH inside confederation
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Confederations – Pros & Cons
Pros
– Each subAS can have its own IGP
– iBGP policy control • Each subAS can have its own set of policies
― MED acceptance or stripping
― Local Preference settings
― Route Dampening
• This will not be possible with route reflectors
• Also possible: different confed eBGP policies
– Merging companies is easier/transparent • Adding/changing/moving subAS’s
– Route reflectors can exist inside subAS
– BGP sessions can follow physical topology
Cons
– Migration from full mesh to confederation is cumbersome • Ok for new design
18
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Solve The iBGP Full Mesh Route Reflector (RR)
RR
– A route reflector is an iBGP speaker that reflects routes learned from iBGP peers to other iBGP peers, called RR clients
– iBGP full mesh is turned into hub-and-spoke
– RR is the hub in a hub-and-spoke design
R1 R2 R3 R1 R2 R3
OR router bgp 65002
neighbor R1 route-reflector-client
neighbor R2 route-reflector-client
router bgp 65002
neighbor R1 route-reflector-client
neighbor R2 route-reflector-client
RR clients are interconnected
RR clients are regular
iBGP peers
RR RR
RR
If client-to-client reflection
can be turned off
19
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Route Reflector What’s Possible?
20
R1 R2 R4 R3
non-client
clients
R5
Any router can peer
eBGP
cluster cluster
AS 100 AS 101
RR RR
iBGP
eBGP
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Route Reflector Clusters
21
Redundancy needed, min of 2 RRs per cluster
Cluster = RR and its clients
R1 R2 R4 R3
R5
RR RR RR RR
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Route Reflector Overlapping Clusters
22
Full mesh between RRs and non-clients should be kept small
R1 R2 R4 R3
R5
Client can peer with RRs in multiple clusters
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Route Reflector Hierarchical RRs
23
Chain RRs to keep the full mesh between RRs and non-clients small
Make RRs clients of other RRs
An RR is a RR and RR client at the same time
iBGP topology should follow physical topology
– To prevent suboptimal routing blackholing and routing loops
Tier 1
Tier 2
Tier 3
There is no limit to the amount of tiers
RR & RR client
RRs in top tier need to be fully meshed
RR
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Route Reflector Loop Prevention
24
Because we have RRs and same prefix can be advertised multiple times within iBGP cloud: loop prevention needed in iBGP cloud
Two BGP attributes, two ways
– Originator ID
Set to the router ID of the router injecting the route into the AS
Set by the RR
– Cluster List
Each route reflector the route passes through adds their cluster-ID to this list
Cluster-id = Router ID by default
“bgp cluster-id x.x.x.x” command to set cluster-id
Router discard routes if:
– If ORIGINATOR-ID = our ROUTER-ID
– If CLUSTER_LIST contains our CLUSTER-ID
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Route Reflector Loop Prevention
RR1 and RR2 have different cluster-IDs
RRC1 RRC2
RR1 RR2
AS Path: {65000}
AS Path: {65000}
AS Path: {65000}
Originator-ID: RRC1
Cluster-List: {RR1}
AS Path: {65000}
Originator-ID: RRC1
Cluster-List: {RR1, RR2}
loop detected
25
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
RRs Questions?
How many RRs?
– More RRs = more redundancy
– But also = more management work and more memory
– 2 RRs for each service set is ok More is possible
Using the same or different cluster-ID for RRs in one set?
– Different cluster-ID Additional memory and processor overhead on RR
– Same cluster-ID Less redundant paths
26
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Route Reflector Same Cluster-ID
RR1 has only 1 path for routes from RRC2
– RR1 and RR2 have the same cluster-ID
If one link RR to RR-client fails
– iBGP session remains up, it is between loopback IP addresses
RRC1 RRC2
RR1 RR2
If different cluster-ID:
– RR1 stores the path from RR2
– RR1 uses additional CPU and memory
– Potentially for many routes
27
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Route Reflector Route Advertisement
28
RR & RR client
RR
iBGP
eBGP
prefix coming from eBGP peer
RR sends prefix to
clients and non-clients
(and sends to other
eBGP peers)
prefix coming from RR client
RR reflects prefix to
clients and sends to
non-clients (and sends
to eBGP peers)
prefix coming from a non-client
RR reflects prefix to
clients (and sends to
eBGP peers)
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
The Need for Virtual Route Reflectors
More BGP services: dedicated RR virtualized and multiplied: vRR
Benefits:
– Scalability (64b OS)
– Performance (Multi core)
– Independence of operation: reload/update/changes version (VM)
– Same BGP implementation and software version as deployed on the Edge (XE/XR)
– Management (Hypervisor)
A dedicated RR does
– IGP
– BGP
A dedicated RR needs
– CPU
– memory
primary backup
service 1
set vRR’s
service 2
set vRR’s
service 3
set vRR’s
29
service 1 set
RR’s primary
backup
service 2 set
RR’s
service 3 set
RR’s
primary
backup
primary
backup
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Evolution of Route-Reflector’s From 7200 to vRR
Integration
Scala
bili
ty /
Perf
orm
ance
IOS-7200
IOS XE - ASR1K
64b OS
64b OS
IOS XR 12000
with 8 RP / RR
IOS-XR o 64b Linux
ASK9K Ng VSM
with 4 vRR: 4 x 8 core CPU
vIOS-XE o VMware
IOS-XR ASR9001
64b OS
30
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
BGP RR Scale - Selective RIB Download
31
To block some or all of the BGP prefixes into the RIB (and FIB)
Only for RR which is not in the forwarding path
Saves on memory and CPU
Implemented as filter extension to table-map command
router bgp 1
address-family ipv4
table-map block-into-rib filter
route-map block-into-rib deny 10
RR1#show ip route bgp
RR1#
R1#show ip cef
configuration no BGP prefixes in RIB no BGP prefixes in FIB
For AFs IPv4/6
– not needed for AFs vpnv4/6
Benefit
– ASR testing indicated 300% of RR-client session
scaling (in order of 1000s)
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Multi-Cluster ID
32
Each set of peers in cluster ID has its own update group
Loop-prevention mechanism is modified
– Taking into account multiple cluster IDs
PE1
RR1
PE2
PE3
PE4
router bgp 1
no bgp client-to-client reflection intra-cluster cluster-id 0.0.0.1
no bgp client-to-client reflection intra-cluster cluster-id 0.0.0.2
cluster ID 2 cluster ID 1
no reflection no reflection
An RR can belong to multiple clusters
– On an IBGP neighbor of RR: configure cluster IDs on a per-neighbor basis
– The global cluster ID is still there
– Intra-cluster client-to-client reflection can be disabled, when clients are meshed
– Can be disabled for all clusters or per cluster
– More work -sending more updates- for RR clients
– Less work -sending fewer updates- for RRs
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Comparing Confederations vs Route Reflectors
Confederation Route Reflector
Loop prevention AS Confederation Set Originator/Cluster ID
Break up a single AS Sub-AS’ Clusters
Redundancy Multiple connections between sub-AS’ Client connects to several reflectors
External Connections Anywhere in the network Anywhere in the network
Multilevel Hierarchy RRs within sub-AS’ Clusters within clusters
Policy Control Along outside borders and between sub-AS’ Along outside border
Scalability Medium; still requires full iBGP within each
sub-AS
Very high
Migration Very difficult (impossible in some situations) Moderately easy
Deployment of (new) features Decentralized Central on RR
For Your Reference
33
Update Groups
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Grouping of BGP Neighbors Optimization Can Be Achieved
35
• peer groups
• templates, session-groups, af-groups, neighbor-group
Configuration/administration Performance/scalability
• update groups
• Dynamic way of grouping BGP peers according to common
outbound policy
• BGP formats the update messages once and then
replicates to all members of the update group
• network that have the same best-path attributes can be
grouped into the same message improving packing efficiency
• replication instead of formatting updates per neighbor:
efficiency
• dynamic = policy changes, update group membership
changes
• AF independent : a peer can belong to different update
groups in different address families
• CLI only
BGP neighbors with same outbound policy will
be put in the same update group regardless if
‒ peer-groups are defined
‒ templates are defined
‒ neighbors are individually defined
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Peer-groups vs Templates
36
Newest configuration
More flexible (nesting, multiple inheritance)
Less configuration
router bgp 1
neighbor ibgp-peers peer-group
neighbor ibgp-peers remote-as 1
neighbor ibgp-peers description peering to iBGP speakers
neighbor ibgp-peers update-source Loopback0
neighbor 10.100.1.3 peer-group ibgp-peers
neighbor 10.100.1.4 peer-group ibgp-peers
!
address-family ipv4
no neighbor 10.100.1.3 activate
no neighbor 10.100.1.4 activate
exit-address-family
!
address-family vpnv4
neighbor ibgp-peers send-community both
neighbor 10.100.1.3 activate
neighbor 10.100.1.4 activate
exit-address-family
peer-groups
router bgp 1
template peer-policy iBGP-policy
send-community both
inherit peer-policy nhs 10
exit-peer-policy
!
template peer-policy nhs
next-hop-self
exit-peer-policy
!
template peer-session ibgp-peers
remote-as 1
description peering to iBGP speakers
update-source Loopback0
exit-peer-session
!
neighbor 10.100.1.3 inherit peer-session ibgp-peers
address-family vpnv4
neighbor 10.100.1.3 activate
neighbor 10.100.1.3 send-community extended
neighbor 10.100.1.3 inherit peer-policy iBGP-policy
exit-address-family
templates
nesting
possible
Oldest configuration
peer session for
global/session
commands
(AF independent)
peer policy for
policy commands
(AD dependent)
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Update Groups and Route Reflectors
Update groups are very usefull on all BGP speakers
– but mostly on RR due to
• # of peers
• equal outbound policy
iBGP typically has no outbound policy
– RRs have large number of iBGP peers in one update group
Turn the issue into a solution
– Create one BGP update once and replicate for all
– Assuming RR does not change attributes on iBGP sessions
RR#show bgp ipv4 unicast neighbors 10.200.1.1
…
For address family: IPv4 Unicast
Session: 10.200.1.1
BGP table version 3201, neighbor version 3201/0
Output queue size : 0
Index 2, Advertise bit 0
Route-Reflector Client
2 update-group member
…
For address family: VPNv4 Unicast
Session: 10.200.1.1
BGP table version 1, neighbor version 1/0
Output queue size : 0
Index 3, Advertise bit 0
3 update-group member
a peer can belong to
different update groups in
different address families
37
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Update Group on RR
38
RR#show bgp ipv4 unicast update-group 2
BGP version 4 update-group 2, internal, Address Family: IPv4 Unicast
BGP Update version : 3201/0, messages 0
Route-Reflector Client
Topology: global, highest version: 3201, tail marker: 3201
Format state: Current working (OK, last not in list)
Refresh blocked (not in list, last not in list)
Update messages formatted 2013, replicated 24210, current 0, refresh 0, limit 2000
Number of NLRIs in the update sent: max 812, min 0
Minimum time between advertisement runs is 0 seconds
Has 101 members:
10.100.1.2 10.200.1.1 10.200.1.10 10.200.1.100
10.200.1.11 10.200.1.12 10.200.1.13 10.200.1.14
10.200.1.15 10.200.1.16 10.200.1.17 10.200.1.18
...
RR
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Update Group Replication
39
RR
BGP update
format BGP update
BGP update
BGP update
replicate
BGP update
BGP update
...
1
n RR#show ip bgp replication 2
Current Next
Index Members Leader MsgFmt MsgRepl Csize Version Version
2 101 10.100.1.2 2013 24210 0/2000 3201/0
update
group 2
formatting
according to
his policy
# of
formatted
messages
# of
replications size of
cache
total # of
members
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Adaptive Message Cache Size
40
Cache = place to store formatted BGP message, before they are send
Update message cache size throttles update groups during update generation and controls transient memory usage
Is now adaptive
– Variable (change over time) queue depth from 100 to 5000
• Number of peers in an update groups
• Installed system memory
• Type of address family
• Type of peers in an update group
– Benefits
• Update groups with large number of peers get larger update cache
• Allows routers with bigger system memory to have *appropriately* bigger cache size and thereby queue more update messages
• vpnv4 iBGP update groups have larger cache size
• Old cache sizing scheme could not take advantage of expanded memory available on new platforms
• Results in faster convergence
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
New Update Generation and Advertisement IOS (latest)
41
• Delay in BGP message propagation
• Long bgp update generation run
• CPU (and memory) overhead due to “advertised-
bits” for VPN
• One peer can starve buffers
• LIFO behavior
• Temporary memory increase during neighbor/policy
change
• Parallel processing of route-refresh and new peers
– Eliminates serialization of update generation runs
– Serialization = faster peers cannot format new updates until slower one caught up
• Quantum processing in BGP
• Lots of internal changes • e.g. redo “advertised-bits” to save memory
• All BGP processes get fair share of CPU
• More FIFO behavior
• Eliminate serialization of update generation runs
• Better serve new update group members and Route
Refresh requests with progress to other members
• Optimized buffer management
Problem Solution
Results
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Dynamic Refresh Update Groups Route-Refresh Update Group Older Solution
42
Route refresh request from one peer can hog CPU
Newer behavior in IOS SB train:
– A different update group (one) is created temporarily for servicing the route-refresh per update group
– Route-refresh update group is created when the "bgp refresh timer" expires • This is one minute (default) after first refresh request
• Can be changed with bgp route-refresh service-interval <0-3600>
• Several route refresh requests from one peer or several peers, will be processed together, when "bgp refresh timer expires, in the new route-refresh update group
– Peers in the route-refresh update group receive the full RIB, while the peers in the regular update group receive the regular BGP updates
– Multiple peers from one update-group that need a route refresh will be put in one route-refresh update group for that update group
– Route-refresh update group is deleted after all route refresh requests have been dealt with
– A new peer is treated like a peer with Route-Refresh
For Your Reference
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Route-Refresh Update Group: When is a Route Refresh Request Sent?
43
A route refresh request is sent, when:
– a user types clear ip bgp [AF] {*|peer} in
– a user types clear ip bgp [AF] {*|peer} soft in
– adding or changing the inbound filtering on the BGP neighbor
• via route-map
– configuring allowas-in for the BGP neighbor
– configuring soft-configuration inbound on the BGP neighbor
– in MPLS VPN (for AFI/SAFI 1/128)
• a user adds a route-target import to a VRF
– in 6VPE (for AFI/SAFI 2/128)
• a user adds a route-target import to a VRF
For Your Reference
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Parallel Processing of Route-refresh (and New Peers)
44
Original update group handles new transient updates while refresh update group handles re-announcements
IOS
– Serving all peers for which route-refresh is needed, at once
– Tracking the refreshing peers
– Maintains a copy of neighbor instance that needs to re-announce the table
– Allows original update group and its members to advertise transient updates without getting blocked
– Tracked with flags
– SPE flags to indicate refresh
– E flag : refresh end marker. Peer is scheduled or participating in a refresh.
– S flag : refresh has started. If not present: waiting for its refresh to start either because another refresh for the same group is in progress or because the net prepend is not yet complete
– P flag : refresh is paused waiting for other refresh members that started later to catch up
IOS-XR
– Refresh sub-goup
Version 0 Version X
Refresh Group Re-announcements Transient Updates
Table Versions of Prefixes
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Parallel Processing of Route-Refresh
45
RR#show bgp ipv4 unicast update-group 1 summary
Summary for Update-group 1, Address Family IPv4 Unicast
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.100.1.1 4 1 42 56007 100001 0 0 00:32:39 0
10.100.1.2 4 1 4 3001 1 0 1999 00:00:09 0 (SE)
10.100.1.3 4 1 40 10036 100001 0 0 00:33:06 0
10.100.1.4 4 1 10034 10035 100001 0 0 00:32:23 100000
RR#show bgp ipv4 unicast update-group 1
BGP version 4 update-group 1, internal, Address Family: IPv4 Unicast
BGP Update version : 1/100001, messages 1999
Topology: global, highest version: 100001, tail marker: 100001
Format state: Current working (OK, last no message space)
Refresh blocked (no message space, last no message space)
Update messages formatted 59997, replicated 89997, current 0, refresh 1999, limit 2000
Has 4 members:
10.100.1.1 10.100.1.2 10.100.1.3 10.100.1.4
RR#show bgp ipv4 unicast replication 1
Current Next
Index Members Leader MsgFmt MsgRepl Csize Version Version
1 4 10.100.1.1 59295 89295 1999/2000 1/100001
For Your Reference
Slow Peer
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Slow Peer Problem
Update groups are a given
– Cannot be disabled
One peer in update group can become persistently slow
– Slow = a peer that cannot keep up with the rate at which we are generating update messages over a prolonged period of time (in the order of minutes)
– High CPU
– Transport issues (packet loss/loaded links/TCP throughput)
A slow peer can slow down convergence towards the other update group members
– Slow in transmitting
– Filling up the replication cache
– The number of formatted updates pending transmission builds up in that update group because one peer (the slow one) does not ack the messages as fast as the other members
– Full cache: no more new messages can be formatted, until the cache is reduced, hence until some messages are send to the slow peers
47
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Slow Peer Solution
Slow peer management feature – 2 steps
– Detection • Identify the slow peer
• Slow peers are detected by tracking the queue for the peer and TCP
– Protection • Movement of the slow peer out of the update group
• Slower update group is collection of slow peers with matching policies
• Unblocks the remaining update group members resulting in better convergence
• Allows for fast and slow peers to proceed at the their own speeds
Neither is enabled by default
Detection can be enabled without protection
48
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Slow Peer Solution
Recovery
– When slow peer is no longer slow, it can move back to the original update group
• When the slow update group has converged, the recovery timer is started
• When the recovery timer has expired, the slow peer is moved back
Solution before this feature: manual movement
– Create a different outbound policy for the slow peer
• Policy must be different than any other • You do not want the slow peer to move to another already existing update group
• Use something that does not affect the actual policy • For example: change minimum advertisement interval (MRAI) of the peer (under AF)
49
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Slow Peer Detection
Detection parameters
– BGP update timestamps
– Peers TCP connection characteristics
per AF/
per VRF
per peer
per peer policy
template
bgp slow-peer detection [threshold <seconds>] default is 5 min
neighbor {<nbr-addr>/<peer-grp-name>} slow-peer detection [threshold < seconds >]
neighbor {<nbr-addr>/<peer-grp-name>} disable
slow-peer detection [threshold < seconds >]
50
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Slow Peer Protection
2 ways
– static (CLI): by the user
– automatic movement of slower peer(s) out of update group
Separate slow update group with matching policies created
Any slow members are moved to this slow update group
Automatic recovery
– Slow peers can move back to regular update group
• Multiple static slow peers with the same outbound policy will be placed into the same “slow” update group
• Static slow peer and dynamic slow peer are in the same slow peer update group
51
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Slow Peer Protection
static per neighbor/
per peer-group
automatic per AF/
per VRF
neighbor {<nbr-addr>/<peer-grp-name>} slow-peer split-update-group static
slow-peer split-update-group static static via peer policy
template
bgp slow-peer split-update-group dynamic [permanent]
neighbor {<nbr-addr>/<peer-grp-name>} slow-peer split-update-group dynamic [permanent] automatic per neighor
automatic per peer
policy template slow-peer split-update-group dynamic [permanent]
permanent = peer is not
moved back automatically to
the update group
52
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Slow Peer Mechanism Details Logging
53
Syslog message for detection (and movement) of slow peer
%BGP-5-SLOWPEER_DETECT: Neighbor IPv4 Unicast 10.100.1.1 has been detected as a
slow peer.
Syslog message for recovery of slow peer
%BGP-5-SLOWPEER_RECOVER: Slow peer IPv4 Unicast 10.100.1.1 has recovered.
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Slow Peers: Displaying
54
CLI to display slow peers
Applicable to all address families
• show bgp all summary slow
• show bgp ipv4 unicast neighbors slow
• show bgp ipv4 unicast update-group summary slow
For Your Reference
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Slow Peer Mechanism Details Identifying Slow Peer
55
RR#show bgp ipv4 unicast update-group 1 summary
Summary for Update-group 1, Address Family IPv4 Unicast
BGP router identifier 10.100.1.5, local AS number 1
BGP table version is 500001, main routing table version 500001
100000 network entries using 14400000 bytes of memory
BGP using 24373520 total bytes of memory
BGP activity 115574/15574 prefixes, 300000/200000 paths, scan interval 60 secs
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.100.1.1 4 1 1257 67368 402061 0 2000 18:56:16 0
10.100.1.2 4 1 1219 23362 402061 0 0 18:23:46 0
10.100.1.3 4 1 1257 23398 402061 0 0 18:56:42 0
10.100.1.4 4 1 10002 1891 402061 0 0 00:01:37 100000
output queue is not empty,
persistently?
convergence is achieved if all
peers are at the table version
For Your Reference
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Slow Peer Mechanism Details Identifying Slow Peer
56
RR#show bgp ipv4 unicast replication 1
Current Next
Index Members Leader MsgFmt MsgRepl Csize Version Version
1 4 10.100.1.1 78114 144314 2000/2000 402061/500001
RR#show bgp ipv4 unicast update-group 1
BGP version 4 update-group 1, internal, Address Family: IPv4 Unicast
BGP Update version : 402061/500001, messages 2000
Topology: global, highest version: 500001, tail marker: 402061 (pending)
Format state: Current blocked (no message space, last no message space)
Refresh blocked (not in list, last not in list)
Update messages formatted 78115, replicated 144318, current 2000, refresh 0, limit 2000
Minimum time between advertisement runs is 0 seconds
Has 4 members:
10.100.1.1 10.100.1.2 10.100.1.3 10.100.1.4
For Your Reference
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Slow Peer Mechanism Details Enabling Slow Peer Feature - Verifying
57
RR#show bgp ipv4 unicast neighbors 10.100.1.1
BGP neighbor is 10.100.1.1, remote AS 1, internal link
BGP version 4, remote router ID 10.100.1.1
For address family: IPv4 Unicast
Session: 10.100.1.1
BGP table version 700001, neighbor version 602001/700001
Output queue size : 2000
Index 1, Advertise bit 0
1 update-group member
Slow-peer detection is enabled, threshold value is 300
Slow-peer split-update-group dynamic is enabled,, and active
Dynamically detected slow-peer
For Your Reference
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Slow Peer Mechanism Details
RR#show bgp ipv4 unicast update-group 1
BGP version 4 update-group 1, internal, Address Family: IPv4 Unicast
BGP Update version : 1001311/1100001, messages 2000
Extended-community attribute sent to this neighbor
Topology: global, highest version: 1100001, tail marker: 1001311 (pending)
Format state: Current blocked (no message space, last no message space)
Refresh blocked (not in list, last not in list)
Update messages formatted 108548, replicated 238371, current 2000, refresh 0, limit 2000
Number of NLRIs in the update sent: max 1016, min 0
Minimum time between advertisement runs is 0 seconds
Slow-peer detection timer (expires in 240 seconds)
Has 4 members:
10.100.1.1 10.100.1.2 10.100.1.3 10.100.1.4
Slow-Peer Detection Timer
58
For Your Reference
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Slow Peer Mechanism Details Slow Peer Movement
59
%BGP-5-SLOWPEER_DETECT: Neighbor IPv4 Unicast 10.100.1.1 has been detected as a slow peer.
RR#show bgp ipv4 unicast neighbors 10.100.1.1
For address family: IPv4 Unicast
Slow-peer detection is enabled, threshold value is 300
Slow-peer split-update-group dynamic is enabled,, and active
RR#show bgp ipv4 unicast update-group 2
BGP version 4 update-group 2, internal, Address Family: IPv4 Unicast
BGP Update version : 402061/500001, messages 1000
Slow update group
Topology: global, highest version: 500001, tail marker: 500001
Update messages formatted 1000, replicated 1000, current 1000, refresh 0, limit 1000
Has 1 member: (1 dynamically detected as slow)
10.100.1.1
For Your Reference
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Slow Peer Mechanism Details Slow Peer Movement
60
RR#show bgp ipv4 unicast summary slow
BGP router identifier 10.100.1.5, local AS number 1
BGP table version is 500001, main routing table version 500001
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.100.1.1 4 1 1268 67391 402061 0 3000 19:05:34 0
For Your Reference
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Slow Peer Mechanism Details Update Group Recovers
61
RR#show bgp ipv4 unicast update-group 1 summary
BGP table version is 500001, main routing table version 500001
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.100.1.2 4 1 1233 31494 500001 0 0 18:35:48 0
10.100.1.3 4 1 1271 31529 500001 0 0 19:08:44 0
10.100.1.4 4 1 10014 10023 500001 0 0 00:13:38 100000
RR#show bgp ipv4 unicast replication 1
Current Next
Index Members Leader MsgFmt MsgRepl Csize Version Version
1 3 10.100.1.2 86237 168703 0/2000 500001/0
peer 10.100.1.1 has been removed from update group 1
allowing update group 1 remaining peers
to converge faster
For Your Reference
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Slow Peer Mechanism Details Recovery of the Slow Peer
62
the recovery timer starts when the slow-peer update group has “converged”
RR#show bgp ipv4 unicast update-group 2
BGP version 4 update-group 2, internal, Address Family: IPv4 Unicast
BGP Update version : 2300001/0, messages 0
Slow update group
Topology: global, highest version: 2300001, tail marker: 2300001
Format state: Current working (OK, last no message space)
Refresh blocked (not in list, last not in list)
Update messages formatted 10000, replicated 10000, current 0, refresh 0, limit 1000
Number of NLRIs in the update sent: max 10, min 0
Minimum time between advertisement runs is 0 seconds
Slow-peer recovery timer (expires in 4 seconds)
For Your Reference
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Slow Peer Mechanism Details Slow Peer Recovers
63
%BGP-5-SLOWPEER_RECOVER: Slow peer IPv4 Unicast 10.100.1.1 has recovered.
RR#show bgp ipv4 unicast update-group 1 summary
Summary for Update-group 1, Address Family IPv4 Unicast
BGP router identifier 10.100.1.5, local AS number 1
BGP table version is 500001, main routing table version 500001
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.100.1.1 4 1 1282 79396 500001 0 0 19:18:36 0
10.100.1.2 4 1 1244 31505 500001 0 0 18:46:06 0
10.100.1.3 4 1 1282 31541 500001 0 0 19:19:03 0
10.100.1.4 4 1 10025 10035 500001 0 0 00:23:57 100000
peer 10.100.1.1 has been put back into update group 1
For Your Reference
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Slow Peer Mechanism Details Clearing
64
CLI to clear
This is a forced clear of the slow-peer status; the peer is moved to the original update group
Needed when the permanent keyword is configured
• clear bgp AF {unicast|multicast} * slow
• clear bgp AF {unicast|multicast} <AS number> slow
• clear bgp AF {unicast|multicast} peer-group <group-name> slow
• clear bgp AF {unicast|multicast} <neighbor-address> slow
For Your Reference
RR Problems & Solutions
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
BGP Best Path
The BGP4 protocol specifies the selection and propagation of a single best path for each prefix
As defined in the BGP standard, there are no mechanisms to distribute paths other then best path between its speakers
This behavior results in number of disadvantages for new applications and services
– So a solution was needed
66
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
BGP Best Path Selection
Path selection mechanism Details
Weight This is a Cisco-defined attribute that is assigned locally to your router and does not get carried through to the router updates. If there are multiple paths
to a particular IP address (which is very common), then BGP looks for the path with the highest weight. There are several ways to set the weight parameter, such as the neighbor command, the as-path access list, or route maps.
Local Preference This is an indicator to the AS as to which path has local preference, with the highest preference being preferred. The default is 100.
Network or Aggregate This criterion prefers the path that was locally originated via a network or aggregate. The aggregation of specific routes into one route is very efficient and saves space on your network.
Shortest AS_PATH BGP uses this one only when there is a “tie” comparing weight, local preference, and locally originated vs. aggregate addresses.
Lowest origin type This deals with protocols such as Interior Gateway Protocol (IGP) being a lower preference than Exterior Gateway Protocol (EGP).
Lowest multi-exit discriminator (MED) This is also known as the external metric of a route. A lower MED value is preferred over a higher value
eBGP over iBGP Similar to “lowest origin type”, BGP AS Path prefers eBGP over iBGP
Lowest IGP metric This criterion prefers the path with the lowest IGP metric to the BGP next hop.
Multiple paths This determines if multiple paths require installation in the routing table.
External paths When both paths are external, it prefers the path that was received first (the oldest one).
Lowest router ID This prefers the route that comes from the BGP router with the lowest router ID.
Minimum cluster list If the originator or router ID is the same for multiple paths, it prefers the path with the minimum cluster list length.
Lowest neighbor address This prefers the path that comes from the lowest neighbor address
For Your Reference
http://www.cisco.com/en/US/tech/tk365/technologies_tech_note09186a0080094431.shtml
67
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Best Path Selection – Route Advertisement on RR
The BGP4 protocol specifies the selection and propagation of a single best path for each prefix
If RRs are used, only the best path will be propagated from RRs to ingress BGP speakers
The next slides explain features to distribute redundant paths to get around the “filtering” by the BGP Bestpath Calculation & Advertisement
NH: PE1, P: Z
NH: PE2, P: Z
P:Z
P: Z
Path 1: NH: PE1, best
Path 2: NH: PE2 P: Z
Path 1: NH: PE1, best
NH: PE1, P: Z
ingress PE does not learn 2nd path
68
CE1
PE1
PE2
RR PE3 CE2
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Why Having Multiple Paths?
Convergence – BGP Fast Convergence (2+ paths in local BGP DB)
– BGP PIC Edge (2+ paths ready in forwarding plane)
Multipath load balancing
– ECMP
Prevent oscillation
– The additional info on backup paths leeds to local recovery as opposed to relying on iBGP
– Stop persistent route oscillations caused by comparison of paths based on MED in topologies where route reflectors or the confederation structure hide some paths
Allow hot potato routing
– = use optimal route
– The optimal route is not always known on the border routers
69
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Diverse BGP Path Distribution Overview
VPN unique RD
BGP Best External
BGP shadow RR / session
BGP Add-Path
70
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Diverse BGP Path Distribution Unique RD for MPLS VPN
Unique RD per VRF per PE
One IPv4 prefix in one VRF becomes unique vpnv4 prefix per VPN per PE
RR advertises everything
Available since the beginning of MPLS VPN, but only for MPLS VPN
NH: PE1, P: Z/RD1
NH: PE2, P: Z/RD2
P:Z
NH: PE1, P: Z/RD1
NH: PE2, P: Z/RD2
RR1
PE1
PE2
VRF
VRF
CE1
PE3
CE2
VRF P: Z Path 1: NH: PE1 Path 2: NH: PE2
RD2
RD1
71
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Diverse BGP Path Distribution Shadow Route Reflector (aka RR Topologies)
Easy deployment – no upgrade of PE routers is required, just new iBGP session per each extra path (CLI knob on RR2)
One additional “shadow” RR per cluster
RR2 does announce the 2nd best path, which is different from the primary best path on RR1 by next hop
Additional path can be either of the following depending on the configuration: • the oldest multipath that is different than the best path
• backup path (2nd best path which is different from the primary best path by next hop)
NH: PE1, P: Z
NH: PE2, P: Z
P:Z
NH: PE2, P: Z shadow RR
router bgp 1
address-family ipv4
bgp additional-paths select backup
neighbor 10.100.1.3 advertise diverse-path backup
RR1
PE1
PE2
CE1
PE3 CE2
RR2
P: Z Path 1: NH: PE1 Path 2: NH: PE2
P: Z
Path 1: NH: PE1, best
Path 2: NH: PE2
P: Z
Path 1: NH: PE1, best
Path 2: NH: PE2, 2nd best
NH: PE1, P: Z
72
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Diverse BGP Path Distribution Shadow Route Reflector
RR(config-router-af)#bgp bestpath igp-metric ignore
shadow RR
RR and shadow RR are co-located. They ‘re on same vlan with same IGP metric towards prefix.
Note 2: primary RRs do not need diverse path code
Note 1: primary and shadow RRs do not need to turn off IGP metric check
RR and shadow RR are not co-located.
Note: primary and shadow RRs need to have
the command to turn IGP metric check off. This is in order for all RRs to calculate the same best path so that primary and shadow RRs do not advertise the same path
all links have the same IGP cost
equal distance
link flaps can change topology
P
RR1
RR2
PE1
PE2
P:Z
shadow RR
PE1
P
RR1
RR2 PE2
P:Z
P: Z
Path 1: NH: PE1, best
Path 2: NH: PE2
P: Z
Path 1: NH: PE1, best
Path 2: NH: PE2, 2nd best
P: Z
Path 1: NH: PE1, best
Path 2: NH: PE2
P: Z
Path 1: NH: PE1, 2nd best
Path 2: NH: PE2, best
RR2 advertises same path as RR1 !
solution
73
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Diverse BGP Path Distribution Shadow Route Reflector
The shadow RR needs to (1) calculate and (2) advertise the backup path
Calculation: triggered by any of the following per address family CLI
– Backup path calculation bgp additional-paths select backup
– Multipath calculation (existing CLI) maximum-paths (e)ibgp <n>
– Backup path calculation AND the installation can be triggered by the following CLI if “bgp additional-paths select backup” is NOT configured already
bgp additional-path install
Advertisement of the backup path
– Backup path advertisement advertise diverse-path backup
router bgp 1
address-family vpnv4
bgp additional-paths select backup
neighbor 10.100.1.3 activate
neighbor 10.100.1.3 send-community both
neighbor 10.100.1.3 route-reflector-client
neighbor 10.100.1.3 advertise diverse-path backup
shadow RR
example config
For Your Reference
74
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Diverse BGP Path Distribution Shadow Route Reflector - Advertised-Routes
on the shadow RR
RR2#show bgp ipv4 unicast neighbors 10.100.1.3 advertised-routes
BGP table version is 10, local router ID is 10.100.1.10
Status codes: s suppressed, d damped, h history, * valid, > best, i -
internal,
r RIB-failure, S Stale, m multipath, b backup-path, f RT-
Filter,
x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
*bia10.2.5.0/24 10.2.2.12 0 100 0 65000 i
*bia10.2.6.0/24 10.2.4.14 0 100 0 65001 i
*>i 10.100.1.12/32 10.2.2.12 0 100 0 65000 i
*>i 10.100.1.14/32 10.2.4.14 0 100 0 65001 i
RR2#show bgp ipv4 unicast 10.2.5.0/24
BGP routing table entry for 10.2.5.0/24, version 7
Paths: (3 available, best #3, table default)
Advertised to update-groups:
1 2 3
Refresh Epoch 1
65000, (Received from a RR-client)
10.2.2.12 (metric 3) from 10.100.1.2 (10.100.1.2)
Origin IGP, metric 0, localpref 100, valid, internal, backup/repair
rx pathid: 0, tx pathid: 0
Refresh Epoch 1
65000, (Received from a RR-client)
10.2.1.11 (metric 3) from 10.100.1.1 (10.100.1.1)
Origin IGP, metric 0, localpref 100, valid, internal, best
rx pathid: 0, tx pathid: 0x0
For Your Reference
* valid
no “>” not the best path
B backup-path
I internal
A additional-path
this path is advertised
75
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Diverse BGP Path Distribution Shadow Session
Easy deployment – only RR needs diverse path code and new iBGP session per each extra path (CLI knob in RR1)
Shadow iBGP session does announce the 2nd best path
– 2nd session between a pair of routers is no issue (use different loopback interfaces)
NH: PE1, P: Z
NH: PE2, P: Z
CE1
P:Z
NH: PE1, P: Z
NH: PE2, P: Z
P: Z Path 1: NH: PE1 Path 2: NH: PE2
Note: second session from RR to RR-client (PE3) has diverse-
path command in order to advertise 2nd best path
PE1
PE2
CE1 CE2 RR PE3
76
P: Z
Path 1: NH: PE1, best
Path 2: NH: PE2, 2nd best
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
BGP PIC (Prefix Independent Convergence) Edge
• Convergence in flat FIB is prefix dependent • More prefixes -> more convergence time
• Classical convergence (flat FIB) • Routing protocols react - update RIB -
update CEF table (for affected prefixes)
• Time is proportional to # of prefixes
Problem
Solution
Result
• The idea of PIC: In both SW and HW:
• Pre-install a backup path in RIB
• Pre-install a backup path in FIB
• Pre-install a backup path in LFIB
• Improved convergence
• Reduce packet loss
• Have the same convergence time for all
BGP prefixes (PIC)
77
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
MPLS VPN Dual-homed CE - no PIC Edge
Steps in convergence on ingress PE
1. Egress PE goes down
2. IGP notifies ingress PE in sub-second
3. Ingress PE recomputes BGP bestpath
4. Ingress PE installs new BGP bestpath in RIB
5. Ingress PE installs new BGP bestpath in FIB
6. Ingress PE reprograms hardware
NH: PE1, P: Z
NH: PE2, P: Z
P:Z
P: Z Path 1: NH: PE1, best Path 2: NH: PE2
skip this step with PIC Edge (BGP FRR)
because there is a precomputed
backup/repair path
this scales to the number of prefixes
CE1
PE1
PE2
PE3
CE2
78
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
MPLS VPN
Steps in convergence on ingress PE
1. Egress PE goes down
2. IGP notifies ingress PE in sub-second
3. Switch to repair path with new NH
4. Ingress PE reprograms hardware
Dual-homed CE - PIC Edge
P: Z Path 1: NH: PE1, best Path 2: NH: PE2, backup/repair
router bgp 1
address-family vpnv4
bgp additional-paths install NH: PE1, P: Z
NH: PE2, P: Z
P:Z CE1
PE1
PE2
PE3
CE2
79
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
BGP Additional-Path
Installation of the backup path
– Backup path installation bgp additional-path install
Depending on whether RR is control plane or forwarding plane, user may want to install the backup path by configuring this CLI
– Multipaths are automatically installed per existing CLI behavior
For Your Reference
80
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
No BGP Best External With BGP Default Policy
P:Z
P: Z
Path 1: NH: CE1, localpref 100, external, best
P: Z
Path 1: NH: CE1, localpref 100, external, best
P: Z
Path 1: NH: PE1, internal, localpref 100, best
Path 2: NH: PE2, internal, localpref 100, backup/repair
NH: PE1, localpref: 100, P: Z
NH: PE2, localpref: 100, P: Z
NH: PE2, localpref: 100, P: Z NH: PE1, localpref: 100, P: Z
CE1
PE1
PE2
PE3
CE2
81
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
P:Z CE1
PE1
PE2
PE3
CE2
No BGP Best External Changed Policy
• If default policy is changed, one egress PE could have iBGP path to other egress PE as best path and not its own external BGP path
P: Z
Path 1: NH: CE1, localpref 100, external, best
Path 2: NH: PE1, localpref: 200, internal, best
P: Z
Path 1: NH: CE1, external, best
P: Z
Path 1: NH: PE1, internal, localpref 200, best
NH: PE1, localpref: 200, P: Z local preference 200
NH: PE2, localpref: 100, P: Z
• Even with full mesh in iBGP, policy can prevent egress PE from learning all paths
NH: PE1, localpref: 200, P: Z
no backup/repair
path
82
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
P:Z CE1
PE1
PE2
PE3
CE2
BGP Best External Changed Policy
With Best External, the backup PE (PE2) still propagates its own best external path to the RRs or iBGP peers
PE1 and PE3 have 2 paths
P: Z
Path 1: NH: CE1, external, best
Path 1: NH: PE1, localpref: 200, internal, best
P: Z
Path 1: NH: CE1, external, best
Path 2: NH: PE2, localpref 100, internal, backup/repair
NH: PE1, localpref: 200, P: Z
NH: PE2, localpref: 100, P: Z
P: Z
Path 1: NH: PE1, internal, localpref 200, best
Path 2: NH: PE2, localpref 100, internal, backup/repair
local preference 200
NH: PE2, localpref: 100, P: Z
router bgp 1
address-family vpnv4
bgp additional-paths install
bgp additional-paths select best-external
neighbor x.x.x.x advertise best-external
NH: PE1, localpref: 200, P: Z
backup/repair, advertise-best-external
83
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
CE2
P:Z
ADD Path
Add-Path will signal diverse paths from 2 to X paths
RRs and edge BGP speakers need to have Add-Path support
Add-Path capability (send or receive or both) means that the NRLI encoding is extended to carry the Path Identifier
NH: PE1, P: Z
NH: PE2, P: Z
P: Z
Path 1: NH: PE1, best
Path 2: NH: PE2, best2 P: Z
Path 1: NH: PE1, best
Path 2: NH: PE2, backup/repair
router bgp 1
address-family ipv4
bgp additional-paths select best 2
bgp additional-paths send
neighbor PE3 advertise additional-paths best 2
NH: PE1, P: Z
NH: PE2, P: Z
router bgp 1
address-family ipv4
bgp additional-paths receive
bgp additional-paths install
PE routers need to run newer code in
order to understand second path
CE1
PE1
PE2
PE3 CE2 RR
84
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
BGP Add-path Different Options
IETF draft draft-ietf-idr-add-paths-guidelines defines 4 flavors of Add-x-Path
2 options are implemented by Cisco
• RR will do best path computation for up to n paths and send n paths to the border routers
• This is the only mandatory selection mode • Pros
• less storage used for paths • less BGP info exchanged
• Cons • more best path computation
• Usecase: Primary + n-1 backup scenario (n is limited to 3, to preserve CPU power) = fast convergence
• RR will do the first best path computation and then send all paths to the border routers
• Pros • all paths are available on border routers
• Cons • all paths stored • more BGP info is exchanged
• Usecase: ECMP, hot potato routing
add-n-path add-all-path
bgp additional-paths select best<N> bgp additional-paths select all
85
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Hot-Potato Routing No RRs
NH: PE2, P: Z
NH: PE1, P: Z NH: PE3, P: Z
eBGP: P: Z
eBGP: P: Z
eBGP: P: Z P: Z
Path 1: NH: PE1
Path 2: NH: PE2
Path 3: NH: PE3, best
Hot-potato routing = packets are passed on (to next AS) as soon as received
Shortest path though own AS must be used
In transit AS: same prefix could be announced many times from many eBGP peers
PE1
PE2
PE3
PE4
86
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Hot-Potato Routing RRs are Present
Introducing RRs break hot potato routing
Solutions: Unique RD for MPLS VPN or Add Path
P: Z
Path 1: NH: PE1, best
NH: PE2, P: Z
NH: PE3, P: Z
eBGP: P: Z
eBGP: P: Z
eBGP: P: Z
PE1
PE2
PE4
PE3
NH: PE1, P: Z
RR
87
P: Z
Path 1: NH: PE1, best
Path 2: NH: PE2
Path 3: NH: PE3
NH: PE1, P: Z
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Hot-Potato Routing in Large Transit SP No Add Path
Often large transit ISPs do have full mesh iBGP between regional RRs and hub/spoke between local BR and RR
Full mesh and global hot potato routing
RR
RR
RR RR
RR
RR
PE
PE
PE
PE
PE
PE
PE
PE
PE
88
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Hot-Potato Routing in Large Transit SP With Add Path
With introduction of add-all-path we now could have centralized RR and do hot-potato routing
Initially, add-all-path could be deployed between centralized and regional RR’s
Also possible: remove the need for regional RR if all PE routers support add-path
RR
RR
RR RR
RR
RR
PE
PE
PE
PE
PE
PE
PE
PE
PE
RR
89
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Don’t Forget
If you want primary and backup path installed in the RIB of the ingress PE (ECMP), you must have BGP multipath enabled
– But multipath does not solve the issue of RR only sending best path
With full mesh iBGP (and default policy), you can avoid all of the above, possible only needing/wanting BGP multipath on ingress PE
With different RD per PE for one VRF, you can avoid all of the above, but only in MPLS VPN networks
Inter-AS – ASBRs also perform Bestpath calculation and advertise only the bestpath across the AS-border /
into the local AS
– Consequently, the same requirements apply to Inter-AS as Intra-AS Unique RD, Add-Path, Best-external and/or RR-topologies
90
Deployment
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Path MTU Discovery (PMTUD)
• MSS (Max Segment Size) – Limit on the largest segment that can traverse a TCP session
– Anything larger must be fragmented & re-assembled at the TCP layer
– MSS is 536 bytes by default for client BGP without PMTUD
– Enable PMTU for BGP with
Older command “ip tcp path-mtu-discovery”
Newer command “bgp transport path-mtu-discovery” (PMTUD now on by default)
• 536 bytes is inefficient for Ethernet (MTU of 1500) or POS (MTU of 4470) networks
– TCP is forced to break large segments into 536 byte chunks
– Adds overheads
– Slows BGP convergence and reduces scalability
Another helpful command
show ip bgp neighbors | include max data
92
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Session/Timers
93
Timers = keepalive and holdtime
– Default is ok
– Smallest is 3/9 for keepalive/holdtime
– Scaling <> small timers
Use BFD
– Built for speed
– When failure occurs, BFD notifies BFD client (in 10s of msecs)
Do not use Fast Session Deactivation (FSD) – Tracks the route to the BGP peer
– A temporary loss of IGP route, will kill off the iBGP sessions
– Very dangerous for iBGP peers IGP may not have a route to a peer for a split second
FSD would tear down the BGP session
– It is off by default neighbor x.x.x.x fall-over
– Next Hop Tracking (NHT), enabed by default, does the job fine
BFD Control Packets
BFD
BGP
EIGRP
OSPF
IS-IS
BFD Clients BFD Clients
BFD
BGP
EIGRP
OSPF
IS-IS
Multisession
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Multisession
95
BGP Multisession = multiple BGP sessions (and hence TCP sessions) between one pair of BGP speakers
– Even if there is only one BGP neighbor statement defined between the BGP speakers in the configuration)
Introduced with Multi Topology Routing (MTR)
– One session per topology
Now: possibility to have one session per AF/group of AFs
– Good for incremental deployment of AFs
• Avoids a BGP reset
• But multisession needs to be enabled beforehand
– Good for troubleshooting
– Good for issues when BGP session resets • For example “malformed update”
– Not so good for scalability
R2 R1
single session
carrying all AFs
R2 R1
multisession
1 topo per session
R2 R1
multisession
1 AF per session
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Multisession
96
capability
multisession for
MTR
multisession
without MTR
BGP: 10.100.1.2 passive rcvd OPEN w/ optional parameter type 2 (Capability)
len 3
BGP: 10.100.1.2 passive OPEN has CAPABILITY code: 131, length 1
BGP: 10.100.1.2 passive OPEN has MULTISESSION capability, without grouping
R2#show bgp ipv4 unicast neighbors
BGP neighbor is 10.100.1.1, remote AS 1, internal link
…
BGP multisession with 3 sessions (3 established), first up for 00:05:43
Neighbor sessions:
3 active, is multisession capable
Session: 10.100.1.1 session 1
Topology IPv4 Unicast
Session: 10.100.1.1 session 2
Topology IPv4 Unicast voice
Session: 10.100.1.1 session 3
Topology IPv4 Unicast video
R2#show ip bgp neighbors 10.100.1.1 | include session|address family
BGP multisession with 3 sessions (3 established), first up for 00:02:29
Neighbor sessions:
3 active, is multisession capable
Session: 10.100.1.1 session 1
Session: 10.100.1.1 session 2
Session: 10.100.1.1 session 3
Route refresh: advertised and received(new) on session 1, 2, 3
Multisession Capability: advertised and received
For address family: IPv4 Unicast
Session: 10.100.1.1 session 1
session 1 member
For address family: IPv6 Unicast
Session: 10.100.1.1 session 2
session 2 member
For address family: VPNv4 Unicast
Session: 10.100.1.1 session 3
session 3 member
1 session per
topology
1 session per
address family
For Your Reference
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Multisession Conclusion
97
Increases # of TCP sessions
Not really needed
Current default behavior = multisession is off
– Can be turned on by “neighbor x.x.x.x transport multi-session”
Makes sense to have IPv4 and IPv6 on seperate TCP sessions
– IPv6 over IPv4 (or IPv4 over IPv6) can be done, but next hop mediation is needed
MPLS VPN
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
RR-groups
99
Use one RR (set of RRs) for a subset of prefixes
– By carving up range of RTs
Only for vpnv4/6
– RR only stores and advertises the specific range of prefixes
Less storage on RR, but more RRs needed + more peerings
RR1
RR2
RR1
RR2
rr-group 1
rr-group 2 PE1 PE2
vpnv4/6
vpnv4/6
vpnv4/6
vpnv4/6
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
RR-groups Configuration Example
100
Dividing of RTs done by simple ext community list 1-99 or ext community list with regular expression 100-500
PEs still send all vpnv4/6 prefixes to RR, but RR filters them
RR1
RR2
RR1
RR2
rr-group 1
rr-group 2 PE1 PE2
vpnv4/6
vpnv4/6
vpnv4/6
vpnv4/6
address-family vpnv4
bgp rr-group 100
address-family vpnv6
bgp rr-group 100
ip extcommunity-list 100 permit RT:1:(1|3|5)....
address-family vpnv4
bgp rr-group 100
address-family vpnv6
bgp rr-group 100
ip extcommunity-list 100 permit RT:1:(2|4|6)....
Dividing RT = more work
PEs are not involved, only RRs
BGP(4): 10.100.1.1 rcvd UPDATE w/ attr: nexthop 10.100.1.1, origin ?, localpref 100, metric 0, extended community
RT:1:10001
BGP(4): 10.100.1.1 rcvd 1:10001:100.1.1.2/32, label 22 -- DENIED due to: extended community not supported;
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Route Target Constraint (RTC)
Current behavior:
– RR sends all vpnv4/6 routes to PE
– PE routers drops vpnv4/6 for which there is no importing VRF
RTC behavior: RR sends only “wanted” vpnv4/6 routes to PE
– “wanted”: PE has VRF importing the Route Targets for the specific routes
– RFC 4684
– New AF “rtfilter”
Received RT filters from neighbors are translated into oubound filtering policies for vpnv4/6 prefixes
The RT Filtering information is obtained from the VPN RT import list
Result: RR do not send vpnv4/6 unnecessarily prefixes to the PE routers
101
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Route Target Constraint (RTC)
RR PE1 PE2
PE1 installs Default RT
filter for RR
PE sends all its vpnv4/6
prefixes to RR
capability 1/132 (RTFilter)
for vpnv4 & vpnv6
CE1
CE3
CE2
CE4
origin AS
Route
Target 1:1
origin AS
Route
Target 1:2
RR installs RT filter RT:1:1 & RT 1:2
for PE1 (implicitly denying all else)
MP_REACH_NLRI
AF RTFilter
RR sends only RED and green (not
blue) vpnv4/6 prefixes to PE1
BGP capability exchange
OPEN message
AF RTFilter exchange
AF vpnv4/6 prefixes exchange
origin AS
Route
Target 0:0
102
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Route Target Constraint (RTC)
103
Results
– Eliminates the waste of processing power on the PE and the waste of bandwidth
– Number of vpnv4 formatted message is reduced by 75%
– BGP Convergence time is reduced by 20 - 50%
– The more sparse the VPNs (few common VPNs on PEs), the more performance gain
Note: PE and RR need the support for RTC
– Incremental deployment is possible (per PE)
– Behavior towards non-RT Constraint peers is not changed
Notes
– RTC clients of RR and non-RTC clients will end up in different update groups on RR
– RTC clients of RR with different set of importing RTs will be in the same update group on the RR In IOS-XR, different filter group under same subgroup
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Update Groups and RTC
104
Now two levels of grouping of neighbors
We have selective replication possible per update group
– IOS • Update group
• Replication group (not visible) ― One or more per update group
– IOS-XR • Update group
• Filter group ― One or more per update group
Two levels needed for RT Constraint
– Within update group (same outbound policy), still different subset of neighbors possible because of different wanted RTs
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
VRF-Based Dampening
105
BGP route dampening is now configurable per-VRF instead of for whole VPN table
Allows service provider to configure dampening parameters on an individual customer basis
Gives operators more flexible control of unstable customer routes in service provider network
router bgp <asn>
address-family ipv4 [unicast | multicast | vrf vrf-name]
bgp dampening <parameters>
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Full Internet in a VRF? Why? Because design dictates it
Unique RD, so that RR can advertise 2 paths?
PRO
• Remove Internet routing table from P routers
• Security: move Internet into VPN, out of global
• Added flexibility
• More flexible DDOS mitigation
CON
• Increased memory and bandwidth consumption
Platform must support enough MPLS labels
– Label allocation is per-prefix by default
– Perhaps per-ce or per-vrf label allocation is wanted here
– Per-CE : one MPLS label per next-hop (CE router)
– Per-VRF : one MPLS label per VRF (all CE routers in the VRF)
– Con: IP lookup needed after label lookup
– Now also per-CE and per-VRF label allocation for 6PE
106
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Full Internet in a VRF?
Two Internet gateways for redundancy
RRs are present: unique RDs needed
Considerations
107
RD 1:1
RD 1:2
PE1
PE2
PE3
PE4
RR
Memory Utilization
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
High Memory Utilization
BGP stores
– Prefixes many
– Paths could be many per prefix
– Attributes could be many per path
Prefixes, Paths, Attributes
AS 100 AS 10 AS 200
AS 20
{200}
{200}
{10 200}
{20 200}
{20 200}
109
R1#show bgp ipv4 unicast 10.1.1.0/24
Paths: (3 available, best #3, table default)
10 200
10.1.5.3 from 10.1.5.3 (10.100.1.3)
Origin IGP, localpref 100, valid, external
Community: internet 200:1 200:2 200:3
20 200
10.100.1.2 (metric 11) from 10.100.1.2 (10.100.1.2)
Origin IGP, metric 0, localpref 100, valid, internal
Community: 100:1 100:2 no-export
20 200
10.1.4.4 from 10.1.4.4 (10.100.1.4)
Origin IGP, localpref 100, valid, external, best
Community: internet 20:20
R1#show bgp ipv4 unicast paths
Address Hash Refcount Metric Path
0x6C8EE00 2229 1 0 20 200 i
0x6C8ED60 2246 1 0 10 200 i
0x6C8EEA0 2264 1 0 20 200 i
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
High Memory Utilization Solutions
110
Prefixes
– Aggregation
– Filter
– Partial routing tables instead of full routing tables
Paths
– Reduce # peerings
– Use RRs instead of iBGP full mesh
Attributes
– Reduce # attributes
• Filter (extended) communties
• Typically inbound on eBGP peering
• Limit your own communities
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
High Memory Utilization
• Multiple views exist in IGPs, as well
–But not on the same scale
–Neighbor adjacencies in IGPs are generally on a lower scale – In the hundreds, not the thousands
–Neighbor adjacencies in IGPs normally pick up different routes, rather than the same route multiple times
Views and Routes
111
For Your Reference
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
High Memory Utilization
The more unique attribute sets you’re receiving, the more unique attribute sets you need to store
– You might have the same number of routes and views over time, but memory utilization can increase
Attributes
10.1.1.0/24
10.1.2.0/24
10.1.3.0/24
10.1.4.0/24
AS Path 1
AS Path 2
Community Set 1
Community Set 2
AS Path 3
Community Set 3
Community Set 4 BGP implementations build their memory structures around
minimizing storage
Attributes are stored once
– Rather than once per route
– Each route references an attribute set, rather than storing the attribute set
This is similar to the way BGP updates are formed
112
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
High Memory Utilization Soft Reconfiguration Inbound vs Route Refresh
113
Recommendation
– Use soft reconfiguration inbound only on eBGP peering when you need to know what the peer previously advertised and you filtered out
AS 10 AS 20 AS 10 AS 20
route refresh soft reconfiguration inbound
Filtered prefixes are dropped: much more memory used
Support only on router itself
Changed filter: re-apply policy to table with filtered prefixes
Filtered prefixes are dropped
Support needed on peer, but this a very old feature
Changed filter: router sends out route refresh request to peer to get the full table from peer again
inbound filter inbound filter
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Aggregation Reducing Number of Prefixes
114
Aggregate is to summarize many prefixes into fewer larger prefixes
– i.e. smaller mask
Natural for Service Providers, when using PA addressing
Many times not done because traffic engineering or load balancing is needed
Added bonus for aggregate
– Aggregate is stable if only a few components of the aggegate flap
– Even better when dampening is in place
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Aggregation Aggregate Command
115
Summary-only needed
– Otherwise more specifics are sent too
By default: no AS-SET, no community info, more specifics are advertised
AS 10 AS 20 192.168.1.0/24
192.168.2.0/24
192.168.3.0/24
...
192.168.255.0/24
192.168.0.0/16
router bgp 1
address-family ipv4
aggregate-address 192.168.0.0 255.255.0.0 summary-only
OS Enhancements
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
BGP Scaling Enhancements Overview
BGP PE-CE Scale Enhancements
– Modified internal data structures and optimized internal algorithms for VRF based update generation
– Result = faster convergence / greater VRF and PE-CE session scaling
BGP PE Scale Enhancements
– Modified internal data structures for VRFs
– Result = considerable memory savings / greater prefix scalability
BGP Generic Scale Enhancements
– Parceling of BGP processes
– Created new BGP task IOS process: “BGP Task”
– Result = optimized update generation / faster convergence
BGP Keepalive Enhancements
– Priority queues for reading/writing Keepalive/Update messages
– Results = avoid neighbor flaps / ability to support small keepalive values in a scaled setup
117
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
BGP PE Scale - PE - CE Optimization
118
• Slow convergence when the number of PE-
CE sessions was scaled on a PE router
• Selective evaluation of VPN prefixes for
processing/update generation
• Modified internal data structures and optimized
internal algorithms for VRF based update generation
• Update Generation convergence now bounded by
number of vpn nets and its related peers
• For each CE update group only consider prefixes in
CE’s VRF, not whole VPN table
• Enables VRF and peer session scaling
• Convergence typically improved by 300-400%
• Initial convergence for ASR with 4K VRFs, 8K
peers, and 2.3M routes is under 10 mins
Problem Solution
Results
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
BGP PE Scale - VRF Based Advertised Bits
119
• Increased memory consumption when the
number of VRFs was scaled on a PE router
• Smart reuse of advertise bit space for VRFs
• Prefixes in a VRF used to have advertise bit
for every CE update group on the router
• Bits only needed for CEs in the same VRF
• Nets advertise bits made independent of number of
VPN VRFs configured
• Results in considerable memory savings per net
• For 1000 VRFs, net memory savings of 120+ bytes per net
• For 500 VRFs, net memory savings of 60+ bytes per net
• Greater prefix scalability
Problem Solution
Results
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
BGP General Scale - Update Generation Enhancements
120
Update Generation is the most time sensitive, critical task for BGP
– Need to optimize to improve end-to-end convergence
Update Generation Parceling and a new process
– Parceling uses certain percentage of process quantum
– By parceling we mean having a task execute its work in small portions requiring limited CPU time (fraction of IOS process time)
– Long lived tasks split up their work across multiple parcels
– Less blocking
– Runs as a separate BGP process
– Created new BGP task IOS process: “BGP Task”
– Currently, only the new update generation is task based (4 tasks per BGP table)
Peer based update message queues
Inline freeing of transmitted update messages
Optimizing prefix based checkpointing
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
BGP General Scale - Keepalive Enhancements
121
Persistent network issue
– Delayed processing of BGP Keepalives typically results in session flaps and network outage for
peers configured with aggressive keepalive values
– Unnecessary CPU and transient memory usage
Separate Keepalive Process to handle servicing of keepalive requests
Priority Queues for reading/writing Keepalive/Update messages
Optimizing Keepalive timeout cases
Benefits
– Ability to support aggressive keepalive values in a scaled router setup under stressed conditions
– Fixed unwanted session flaps and network outages
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Multi-Instance BGP
122
A new IOS-XR BGP architecture to support multiple BGP instances
Each BGP instance is a separate process running on the same or a different RP/DRP node
The BGP instances do not share any prefix table between them
They can peer with each other
Multiple ASNs are possible *
Solves the 32-bit OS virtual memory limit
Different BGP routers: isolate services/AFs on common infrastructure
Achieve higher prefix scale (especially on a RR) by having different instances carrying different BGP tables
Achieve higher session scale by distributing the overall peering sessions between multiple instances
BGP vpnv4
PE1 10.0.0.1
10.0.0.2
BGP IPv4
BGP vpnv4
RR1 (active)
BGP IPv4
RR2 (backup)
BGP
20.0.0.1
30.0.0.1
20.0.0.2
30.0.0.2
* restrictions apply
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Performance: Multi-thread Strategy
CPU clock: speed doesn’t have to increase anymore
CPU core: number of core per CPU are increasing: 2,4,8,16,64
IOS XE:
– All BGP processes runs within a single threat
– Optimized for single-core CPU
IOS XR: – Each component runs as a separate process (i.e. BGP, OSPF, ISIS)
– Most processes are multi-threaded
– Each thread is scheduled for the same time-slice using pre-emptive scheduling
– BGP has 16+ threads each of which performs a specific set of tasks
– Processes are optimized for multi-core CPUs
123
For Your Reference
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Maximize your Cisco Live experience with your
free Cisco Live 365 account. Download session
PDFs, view sessions on-demand and participate in
live activities throughout the year. Click the Enter
Cisco Live 365 button in your Cisco Live portal to
log in.
Complete Your Online Session Evaluation
Give us your feedback and you could win fabulous prizes. Winners announced daily.
Receive 20 Cisco Daily Challenge points for each session evaluation you complete.
Complete your session evaluation online now through either the mobile app or internet kiosk stations.
124
© 2013 Cisco and/or its affiliates. All rights reserved. BRKRST-3321 Cisco Public
Fast Convergence and Keeping Scalability
Keep BGP timers, use BFD
NHT enabled by default
Quick alternate paths
– BGP ADD-PATH
– Diverse session
– PIC
– Different RDs
MRAI
– Defaults are fine
Dampening
– One flap is propagated fast, not the subsequent ones