.
.
BGP
Border Gateway Protocol (an introduction)
Karst Koymans
Informatics Institute, University of Amsterdam
(version 4.8, 2015/03/11 12:41:44)
Monday, March 9, 2015
.
General ideas behind BGP
  Background
  Providers, Customers and Peers
  External and Internal BGP
  BGP information bases
The BGP protocol
  BGP attributes
  BGP messages
Traffic Engineering
  Outbound Traffic Engineering
  Inbound Traffic Engineering
IBGP scaling
.
BGP version 4
▶ Border Gateway Protocol version 4 (BGP4)
▶ Specified in RFC 4271
▶ The inter-AS routing protocol
▶ “Monopolises” the Internet
▶ Based on path vector routing
  ▶ which is in between distance vector and link state routing
▶ Uses (often non-coordinated) routing policies
  ▶ which can be problematic for convergence
.
Autonomous system (AS)
Definition (AS: Autonomous System)
▶ A connected group of networks and routers
  ▶ representing some assigned set of IP prefixes
  ▶ having a single, consistent routing policy
    ▶ both internally and externally
.
.
Autonomous system illustration
.
Providers and Customers
[Figure: Internet, Provider and Customer, connected by links carrying IP traffic]
.
Peers
[Figure: Provider 1, Provider 2 and Provider 3 with peering links between neighbors; Customer 1, Customer 2 and Customer 3 attached below them; IP traffic flows are shown and one path is marked "No packets"]
.
Providers, Customers and Peers
[Figure: example topology with nodes G1, G2, R1, P1, P2 and C1 to C4, showing IP traffic flows between them]
.
.
The AS abstraction
.
Providers, Customers and Peers routing preferences
▶ The order of preference for a route is
  ▶ Customers have highest preference
  ▶ Peers have the next highest preference
  ▶ Providers have the lowest preference
▶ Transit relationships are enforced by export filtering
  ▶ Do not advertise provider or peer routes to other providers or peers
  ▶ Do advertise all routes to customers
  ▶ Do advertise customer routes to providers and peers
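The preference order and export rules above can be written down as a small sketch. The names `Relationship`, `PREFERENCE`, `may_export` and `most_preferred` are hypothetical, and the numeric preference values are arbitrary; only their ordering reflects the slide.

```python
from enum import Enum


class Relationship(Enum):
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


# Higher number = more preferred. The values are illustrative; only the order matters.
PREFERENCE = {
    Relationship.CUSTOMER: 3,
    Relationship.PEER: 2,
    Relationship.PROVIDER: 1,
}


def may_export(learned_from: Relationship, export_to: Relationship) -> bool:
    """Export filter: customers receive all routes, everyone else only customer routes."""
    if export_to is Relationship.CUSTOMER:
        return True
    return learned_from is Relationship.CUSTOMER


def most_preferred(sources):
    """Return the neighbor type whose route would be preferred."""
    return max(sources, key=PREFERENCE.__getitem__)
```

With these rules an AS never provides transit between two of its providers or peers, because a route learned from one of them is only ever exported to customers.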
.
Providers, Customers and Peers: Export filtering
.
External and Internal BGP (1)
▶ EBGP (External BGP)
  ▶ Used for BGP neighbors between different ASs
    ▶ Exchanging prefixes
    ▶ Implementing policies
▶ IBGP (Internal BGP)
  ▶ Used for BGP neighbors within one and the same AS
    ▶ Distributing Internet prefixes across the backbone in order to create a consistent view among all entry/exit points
    ▶ Inserting locally originated prefixes, for instance for customers that do not speak BGP
.
.
External and Internal BGP (2)
▶ Routes imported from one IBGP peer are not distributed to another IBGP peer
  ▶ This prevents possible routing loops
  ▶ Loop detection is based on duplicates in AS paths
    ▶ EBGP detects this between different ASs
    ▶ IBGP cannot detect this inside one and the same AS
  ▶ Requires IBGP peers to be configured as a full mesh
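To see why the full-mesh requirement becomes a scaling concern, here is a minimal sketch that enumerates the sessions and encodes the no-redistribution rule. The function names and the "ibgp"/"ebgp" string labels are hypothetical.

```python
from itertools import combinations


def full_mesh_sessions(routers):
    """One IBGP session per unordered pair of routers."""
    return list(combinations(sorted(routers), 2))


def may_readvertise(learned_via: str, send_via: str) -> bool:
    """A route learned over IBGP is not passed on to another IBGP peer."""
    return not (learned_via == "ibgp" and send_via == "ibgp")
```

For n routers this gives n(n-1)/2 sessions, so 10 routers already need 45 sessions.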
.
Routing Information Bases (RIBs)
▶ Adj-RIB-In (one per peer)
  ▶ Routes after input filtering
  ▶ Every AS needs an input policy
▶ Loc-RIB (only one globally)
  ▶ Routes after best path selection
  ▶ Path selection is a fixed and specified algorithm
▶ Adj-RIB-Out (one per peer)
  ▶ Routes after output filtering
  ▶ Every AS needs an output policy
.
BGP route processing
▶ Receive BGP update
▶ Apply import policy, filter and tweak attributes
▶ Possibly install route in Adj-RIB-In
▶ Apply best route selection algorithm
▶ Possibly install route in Loc-RIB
▶ Influence IP forwarding table
▶ Apply export policy, filter and tweak attributes
▶ Possibly install route in Adj-RIB-Out
▶ Transmit BGP update
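The processing steps above can be sketched as a toy speaker with the three information bases. Everything here (`Route`, `Speaker`, the policy callables) is hypothetical, best route selection is reduced to local preference and AS path length, and withdrawals and the forwarding table are left out.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple
    local_pref: int = 100
    learned_from: str = ""


# A policy returns a (possibly modified) route, or None to filter it out.
Policy = Callable[[Route], Optional[Route]]


class Speaker:
    def __init__(self, local_as, peers, import_policy: Policy, export_policy: Policy):
        self.local_as = local_as
        self.import_policy = import_policy
        self.export_policy = export_policy
        self.adj_rib_in = {peer: {} for peer in peers}   # one per peer
        self.loc_rib = {}                                # only one
        self.adj_rib_out = {peer: {} for peer in peers}  # one per peer

    def receive_update(self, peer, route):
        # Import policy, then Adj-RIB-In.
        accepted = self.import_policy(replace(route, learned_from=peer))
        if accepted is None:
            return
        self.adj_rib_in[peer][accepted.prefix] = accepted
        # Simplified best route selection, then Loc-RIB.
        candidates = [rib[accepted.prefix] for rib in self.adj_rib_in.values()
                      if accepted.prefix in rib]
        best = max(candidates, key=lambda r: (r.local_pref, -len(r.as_path)))
        self.loc_rib[best.prefix] = best
        # Export policy, then Adj-RIB-Out (treating every peer as an EBGP neighbor).
        for out_peer, rib in self.adj_rib_out.items():
            exported = None if out_peer == best.learned_from else self.export_policy(best)
            if exported is None:
                rib.pop(best.prefix, None)
            else:
                rib[best.prefix] = replace(
                    exported, as_path=(self.local_as,) + exported.as_path)
```

A real implementation would then transmit UPDATE messages for whatever changed in each Adj-RIB-Out.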
.
BGP protocol
▶ Uses TCP port 179
  ▶ Usually with a directly connected neighbor on layer 2
▶ Exchanges Network Layer Reachability Information (NLRI)
  ▶ Prefixes that can or can no longer be reached through the router
  ▶ Accompanied by BGP attributes used by the best route selection algorithm
.
.
Some important BGP attributes
▶ In order of path selection importance
  ▶ LOCAL PREF (Local Preference)
  ▶ AS PATH
  ▶ ORIGIN (Historical)
  ▶ MULTI EXIT DISC (MED; Multi-exit discriminator)
▶ And unrelated to path selection
  ▶ NEXT HOP
    ▶ Must be reachable (directly or via IGP), except in the case of multi-hop BGP
.
Next Hop in EBGP and IBGP
.
BGP attribute types
▶ Well-known mandatory
  ▶ ORIGIN, AS PATH, NEXT HOP
▶ Well-known discretionary
  ▶ LOCAL PREF, ATOMIC AGGREGATE
▶ Optional transitive
  ▶ COMMUNITIES, AGGREGATOR
▶ Optional non-transitive
  ▶ MULTI EXIT DISC
.
LOCAL PREF (Local Preference)
▶ Advertised within a single AS (via IBGP)
▶ Used to implement local policies
▶ Can depend on any locally available information
  ▶ This might be learned outside of BGP
▶ Default value is 100
▶ Highest value wins
.
.
AS PATH
▶ Sequence of ASs
  ▶ An AS can also be generalized to a set of ASs
▶ Used for loop detection
▶ The sequence length defines the metric (distance)
▶ Shortest path wins
▶ Prepend your own AS in EBGP updates
  ▶ Possibly multiple times, enabling traffic engineering
▶ Leave unchanged in IBGP updates
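A minimal sketch of these AS PATH operations follows. It models a path as a Python list whose elements are either an AS number or a `frozenset` of AS numbers (for the set case); the function names are hypothetical.

```python
def contains_as(as_path, asn):
    """Loop detection: is this AS already somewhere in the path?"""
    for element in as_path:
        members = element if isinstance(element, (set, frozenset)) else {element}
        if asn in members:
            return True
    return False


def path_length(as_path):
    """Each element counts once, so an AS set adds 1 regardless of its size."""
    return len(as_path)


def advertise_ebgp(as_path, local_as, prepend_count=1):
    """Prepend the local AS, possibly several times, before sending over EBGP."""
    if prepend_count < 1:
        raise ValueError("the local AS must be prepended at least once")
    return [local_as] * prepend_count + list(as_path)


def advertise_ibgp(as_path):
    """The path is left unchanged when sending over IBGP."""
    return list(as_path)
```

A receiver that finds its own AS via `contains_as` would discard the route, and a sender that uses `prepend_count=3` makes its announcement look two hops longer than it really is.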
.
AS PATH example
.
AS PATH length can be deceptive
.
Traffic often follows AS PATH
.
.
Sometimes traffic does not follow AS PATH
.
ORIGIN
▶ The ORIGIN attribute tells where the route (NLRI) originated
  ▶ Interior to the originating AS: ORIGIN = 0
  ▶ Via the EGP protocol (historic): ORIGIN = 1
  ▶ Via some other means: ORIGIN = 2
▶ A lower ORIGIN wins
.
MULTI EXIT DISC (Multi-Exit Discriminator or MED)
▶ The MED (or metric, formerly INTER AS METRIC) is meant to be advertised between neighboring ASs (via EBGP)
  ▶ Some implementations carry the MED onward via IBGP
  ▶ Hot potato versus cold potato
▶ The MED is non-transitive (is not transferred into a third AS)
▶ A lower MED wins
▶ The default MED is 0 (lowest possible value)
  ▶ Some implementations choose the highest possible value
.
Best route selection
Definition (Route selection preference)
1. (Weight; Cisco specific)
2. Highest Local Preference
3. Shortest AS Path
4. (Lowest Origin; hardly used; historic)
5. Lowest MED
6. Prefer EBGP over IBGP
7. Lowest IGP cost to BGP egress
8. Lowest Router ID
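The preference list above maps naturally onto a sort key, where earlier tuple positions win over later ones. This is a sketch under simplifying assumptions: the `Candidate` type and its defaults are hypothetical, a higher weight is assumed to win, and MEDs are compared across all candidates even though implementations commonly compare them only between routes from the same neighboring AS.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass
class Candidate:
    as_path: tuple
    local_pref: int = 100
    origin: int = 0
    med: int = 0
    is_ebgp: bool = True
    igp_cost: int = 0
    router_id: str = "0.0.0.0"
    weight: int = 0


def preference_key(c: Candidate):
    """Smaller key = more preferred; 'highest wins' fields are negated."""
    return (
        -c.weight,                      # 1. highest weight
        -c.local_pref,                  # 2. highest local preference
        len(c.as_path),                 # 3. shortest AS path
        c.origin,                       # 4. lowest origin
        c.med,                          # 5. lowest MED
        0 if c.is_ebgp else 1,          # 6. EBGP over IBGP
        c.igp_cost,                     # 7. lowest IGP cost to egress
        int(IPv4Address(c.router_id)),  # 8. lowest router ID
    )


def best_route(candidates):
    return min(candidates, key=preference_key)
```

Because tuples compare position by position, a route with a higher local preference wins even when its AS path is much longer.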
.
.
BGP message header
[Figure: BGP header layout on a 32-bit ruler (bits 0-15, 16-23, 24-31). Fields in order: Marker, Length, Type]
We use the term message and not packet, because BGP “packets” are in fact part of one single TCP stream.
.
BGP header fields
Marker   128 bits of 1 (compatibility)
Length   Total length (min 19, max 4096); no padding [1], including header
Type     1: OPEN
         2: UPDATE
         3: NOTIFICATION
         4: KEEPALIVE
         5: Route-REFRESH
[1] No superfluous bytes are allowed inside the TCP stream
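Since messages sit back to back in one TCP stream, a receiver has to cut them apart using the Length field. The sketch below packs and splits messages with the header fields listed above; the helper names `build_message` and `split_stream` are hypothetical.

```python
import struct

MARKER = b"\xff" * 16              # 128 bits of 1
HEADER = struct.Struct("!16sHB")   # Marker, Length, Type: 19 bytes in total
MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION",
                 4: "KEEPALIVE", 5: "Route-REFRESH"}


def build_message(msg_type: int, body: bytes = b"") -> bytes:
    """Prefix a message body with the BGP header; Length includes the header."""
    length = HEADER.size + len(body)
    if length > 4096:
        raise ValueError("message exceeds the maximum length of 4096")
    return HEADER.pack(MARKER, length, msg_type) + body


def split_stream(buffer: bytes):
    """Cut complete messages off the front of a byte stream.

    Returns (messages, remainder), where remainder is an incomplete tail
    that needs more data from the TCP connection.
    """
    messages = []
    while len(buffer) >= HEADER.size:
        marker, length, msg_type = HEADER.unpack_from(buffer)
        if marker != MARKER or not 19 <= length <= 4096:
            raise ValueError("malformed BGP header")
        if len(buffer) < length:
            break
        name = MESSAGE_TYPES.get(msg_type, "unknown")
        messages.append((name, buffer[HEADER.size:length]))
        buffer = buffer[length:]
    return messages, buffer
```

Calling `build_message(4)` yields a 19-byte message, which is the minimum length given in the table.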
.
BGP OPEN message
[Figure: OPEN message layout on a 32-bit ruler (bits 0-7, 8-15, 16-31). Fields in order: Version, My Autonomous System, Hold Time, BGP Identifier, Opt Parm Len, Optional Parameters (variable)]
.
OPEN message fields
Version               4
My Autonomous System  Sender’s AS
Hold Time             Liveness detection
BGP Identifier        Sender’s identifying IP address
Opt Parm Length       Length of parameter field
Optional Parameters   TLV-encoded options
One interesting parameter is the Capabilities Optional Parameter, which defines (among others) the Route Refresh Capability.
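As an illustration of the OPEN fields, the following sketch packs and unpacks the message body (without the common header). The field widths follow RFC 4271; the helper names are hypothetical, and the Route Refresh bytes assume optional parameter type 2 for Capabilities (RFC 5492) and capability code 2 with length 0 (RFC 2918).

```python
import struct
from ipaddress import IPv4Address

# Version (1), My Autonomous System (2), Hold Time (2),
# BGP Identifier (4), Opt Parm Len (1)
OPEN_FIXED = struct.Struct("!BHH4sB")

# Assumed encoding: parameter type 2, parameter length 2,
# capability code 2 (Route Refresh), capability length 0.
ROUTE_REFRESH_CAPABILITY = bytes([2, 2, 2, 0])


def build_open_body(my_as: int, hold_time: int, bgp_id: str,
                    opt_params: bytes = b"") -> bytes:
    return OPEN_FIXED.pack(4, my_as, hold_time,
                           IPv4Address(bgp_id).packed,
                           len(opt_params)) + opt_params


def parse_open_body(body: bytes) -> dict:
    version, my_as, hold_time, ident, opt_len = OPEN_FIXED.unpack_from(body)
    opt_params = body[OPEN_FIXED.size:OPEN_FIXED.size + opt_len]
    if len(opt_params) != opt_len:
        raise ValueError("optional parameters are truncated")
    return {
        "version": version,
        "my_as": my_as,
        "hold_time": hold_time,
        "bgp_id": str(IPv4Address(ident)),
        "opt_params": opt_params,
    }
```

The 2-byte My Autonomous System field is why this sketch only handles AS numbers up to 65535.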
.
.
BGP KEEPALIVE message
This page intentionally left blank. (A KEEPALIVE message consists of the BGP header only.)
http://www.this-page-intentionally-left-blank.org/
.
KEEPALIVE message fields
:)
.
BGP NOTIFICATION message
[Figure: NOTIFICATION message layout on a 32-bit ruler (bits 0-7, 8-15, 16-31). Fields in order: Error code, Error subcode, Data (variable)]
.
NOTIFICATION message fields
Error code     1: Message Header Error
               2: OPEN Error
               3: UPDATE Error
               4: Hold Timer Expired
               . . .
Error subcode  Depends on error code
Data           Depends on error code and subcode
.
.
BGP Route-REFRESH message
[Figure: Route-REFRESH message layout on a 32-bit ruler: AFI (bits 0-15), Reserved (bits 16-23), SAFI (bits 24-31)]
.
Route-REFRESH message fields
AFI       Address Family Identifier
Reserved  0
SAFI      Subsequent Address Family Identifier
.
BGP UPDATE message
[Figure: UPDATE message layout on a 32-bit ruler (bits 0-15, 16-31). Fields in order: Unfeasible Routes Length, Withdrawn Routes (variable length), Total Path Attribute Length, Path Attributes (variable length), Network Layer Reachability Information (variable length)]
.
UPDATE message fields
Unfeasible Routes Length                Length of Withdrawn Routes
Withdrawn Routes                        List of prefixes [2]
Total Path Attribute Length             Length of Path Attributes
Path Attributes                         TLV-encoded attributes
Network Layer Reachability Information  List of NLRI prefixes
[2] A prefix is specified by its length and just enough bytes of the network IP address to cover this length
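The prefix encoding in the footnote can be sketched for IPv4 as follows: one length byte, followed by the minimum number of address bytes that cover that many bits. The function names are hypothetical.

```python
from ipaddress import IPv4Network


def encode_prefix(prefix: str) -> bytes:
    """Encode a prefix as its length plus just enough address bytes."""
    net = IPv4Network(prefix)
    nbytes = (net.prefixlen + 7) // 8
    return bytes([net.prefixlen]) + net.network_address.packed[:nbytes]


def decode_prefixes(data: bytes) -> list:
    """Decode a concatenated list of encoded prefixes."""
    prefixes = []
    i = 0
    while i < len(data):
        plen = data[i]
        if plen > 32:
            raise ValueError("IPv4 prefix length cannot exceed 32")
        nbytes = (plen + 7) // 8
        chunk = data[i + 1:i + 1 + nbytes]
        if len(chunk) != nbytes:
            raise ValueError("prefix is truncated")
        padded = chunk + bytes(4 - nbytes)
        # strict=False masks any stray bits beyond the prefix length.
        prefixes.append(str(IPv4Network((padded, plen), strict=False)))
        i += 1 + nbytes
    return prefixes
```

A /8 therefore costs two bytes on the wire, a /24 four bytes, and the default route 0.0.0.0/0 a single zero byte.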
.
.
Tweaking your policies
▶ Outbound traffic
  ▶ Influenced by inbound routes and filters
  ▶ Tweak attributes to influence best route selection
  ▶ You are in control yourself
▶ Inbound traffic
  ▶ Influenced by outbound routes and filters
  ▶ Tweak attributes trying to influence your peers’ best route selection
  ▶ You are dependent on your peers’ policies
.
Outbound Traffic Engineering
▶ Outbound TE works by manipulating incoming routes
  ▶ Changing local preference
  ▶ Extending inbound AS paths
  ▶ Manipulating the metric (MED), for instance by using inbound communities
▶ It is relatively simple
  ▶ Based on your own policy
  ▶ You are in control yourself
.
Choice between provider, peer or customer
.
Manipulating local preference
Prefer customer over peer over provider
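One way to picture this policy is an import step that stamps a local preference on each route according to the neighbor it came from. The values 200, 150 and 100 and the dictionary-based route representation are hypothetical; only the ordering customer > peer > provider comes from the slide.

```python
# Hypothetical values; only the ordering customer > peer > provider matters.
LOCAL_PREF_BY_RELATIONSHIP = {"customer": 200, "peer": 150, "provider": 100}


def import_route(route: dict, relationship: str) -> dict:
    """Return a copy of the route with local preference set by neighbor type."""
    tagged = dict(route)
    tagged["local_pref"] = LOCAL_PREF_BY_RELATIONSHIP[relationship]
    return tagged


def choose_exit(routes):
    """Highest local preference wins, before AS path length is even considered."""
    return max(routes, key=lambda r: r["local_pref"])
```

With this policy a three-hop path through a customer is chosen over a one-hop path through a provider, because local preference is evaluated before AS path length.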
.
.
Multihomed setup
.
Singlehomed primary and backup links
.
Inbound Traffic Engineering
▶ Inbound TE works by manipulating outgoing routes
  ▶ Extending outbound AS PATHs is a traditional hack
  ▶ Manipulating the metric (MED) is the official way
  ▶ Setting outbound communities is a more modern approach
    ▶ Agreements with your neighbors are necessary (common policy)
▶ Inbound is more complex than outbound
  ▶ Inbound depends (also) on neighbor’s policy
  ▶ You are not in control by yourself
▶ Announcing more specific routes
  ▶ Method of last resort, but often a bad idea
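Announcing more specific routes works because forwarding uses longest prefix match, which is applied before any BGP attribute comparison. A minimal sketch, with a hypothetical routing table that maps prefixes to link names:

```python
from ipaddress import ip_address, ip_network


def longest_prefix_match(destination: str, routes: dict):
    """Return the value of the most specific prefix that covers the destination."""
    dest = ip_address(destination)
    best_net, best_value = None, None
    for prefix, value in routes.items():
        net = ip_network(prefix)
        if dest in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_value = net, value
    return best_value
```

If an AS announces 203.0.113.0/24 over link A and additionally 203.0.113.128/25 over link B, traffic for the upper half of the block arrives via link B regardless of AS PATH length or MED. The cost is an extra entry in every routing table that accepts it, which is one reason the slide calls this often a bad idea.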
.
Advertising a longer AS PATH
.
.
Your provider might overrule your effort
.
But you can make an agreement by using a community
.
Hot potato routing
.
Burnt by the hot potato
.
.
Cold potato routing by honoring MEDs
.
Communities
▶ An optional transitive attribute
▶ A community can be used to communicate preferred treatment of a route
▶ Communities can be used both inbound and outbound
▶ Some communities have well-known semantics
  ▶ NO EXPORT: don’t export beyond current AS (or confederation)
  ▶ NO ADVERTISE: don’t export at all
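A community in the sense of RFC 1997 is a 32-bit value, conventionally written as ASN:value. The sketch below shows that encoding and the two well-known communities named above, using their RFC 1997 values; the function names are hypothetical and confederations are ignored for simplicity.

```python
# Well-known community values from RFC 1997.
NO_EXPORT = 0xFFFFFF01
NO_ADVERTISE = 0xFFFFFF02


def encode_community(asn: int, value: int) -> int:
    """Pack the conventional ASN:value notation into 32 bits."""
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError("both halves must fit in 16 bits")
    return (asn << 16) | value


def decode_community(community: int) -> tuple:
    return (community >> 16, community & 0xFFFF)


def may_advertise(communities, to_ebgp_peer: bool) -> bool:
    """Honor the well-known communities (confederations are not modeled)."""
    if NO_ADVERTISE in communities:
        return False
    if NO_EXPORT in communities and to_ebgp_peer:
        return False
    return True
```

The meaning of any other community, such as 64500:80, is purely a matter of agreement between the ASs involved.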
.
Use of communities
▶ Inbound from your upstream
  ▶ Learn where your upstream imported this route
  ▶ You can base policy decisions on that
▶ Outbound to your upstream
  ▶ Request specific upstream treatment
    ▶ Setting of local preference
    ▶ Announcing or not announcing to specific ASs
    ▶ AS PATH prepending for certain peerings
  ▶ Your upstream promises to implement the requested policy
.
Route reflectors
▶ Specified in RFC 4456
▶ A route reflector is a kind of “super” IBGP peer
▶ A route reflector has clients with which it peers via IBGP and for which it reflects (transitively) routes
▶ A route reflector is part of a full mesh of other route reflectors and non-clients
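The following sketch illustrates the reflection rule and the resulting reduction in sessions. It assumes that every client peers with every reflector and that a route is not sent back to the peer it was learned from; both are simplifications, and all names are hypothetical.

```python
def reflect_to(learned_from: str, clients, non_clients, sender):
    """Return the IBGP peers to which a reflector passes a route on.

    learned_from is "client" or "non_client". Routes from clients go to
    everyone; routes from non-clients go to clients only.
    """
    if learned_from == "client":
        targets = set(clients) | set(non_clients)
    elif learned_from == "non_client":
        targets = set(clients)
    else:
        raise ValueError("learned_from must be 'client' or 'non_client'")
    targets.discard(sender)  # assumption: never echo a route to its sender
    return targets


def sessions_full_mesh(routers: int) -> int:
    return routers * (routers - 1) // 2


def sessions_with_reflectors(reflectors: int, clients: int) -> int:
    """Reflectors are fully meshed; each client peers with each reflector."""
    return reflectors * (reflectors - 1) // 2 + reflectors * clients
```

For 100 routers a full mesh needs 4950 sessions, whereas 2 reflectors with 98 clients need 197 under these assumptions.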
.
.
Full mesh IBGP
.
Route reflector mesh
.
Confederations
▶ Specified in RFC 5065
▶ Use multiple private ASs inside your main AS
▶ Talk to the outside world with your main AS
  ▶ This hides the private ASs
▶ Talk to the inside world as if using EBGP and IBGP
  ▶ Using the different private ASs
  ▶ This needs special AS PATH segment types
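To illustrate how the private ASs stay hidden, the sketch below models an AS PATH as a list of (segment type, AS list) pairs, using the segment type codes from RFC 4271 and RFC 5065. The function name and the data model are hypothetical.

```python
# AS PATH segment type codes (RFC 4271 and RFC 5065).
AS_SET = 1
AS_SEQUENCE = 2
AS_CONFED_SEQUENCE = 3
AS_CONFED_SET = 4


def export_outside_confederation(segments, confederation_as: int):
    """Strip confederation segments and prepend the main (public) AS."""
    public = [(seg_type, list(ases)) for seg_type, ases in segments
              if seg_type not in (AS_CONFED_SEQUENCE, AS_CONFED_SET)]
    if public and public[0][0] == AS_SEQUENCE:
        public[0] = (AS_SEQUENCE, [confederation_as] + public[0][1])
    else:
        public.insert(0, (AS_SEQUENCE, [confederation_as]))
    return public
```

A path that crossed private ASs 65001 and 65002 inside confederation 64496 leaves the confederation showing only 64496 in front of the ASs that were already in the path.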
.
Confederation with SubAS’s