Upload
dominic-payne
View
222
Download
2
Tags:
Embed Size (px)
Citation preview
IP Routing IP Routing
IMAMinneapolis
January, 2004
Timothy G. Griffin Intel Research
http://www.cambridge.intel-research.net/~tgriffin
Common View of the Telco Network
Brick
Common View of the IP Network (Layer 3)
What This Course Is About
Forwarding vs. Routing
• Forwarding: Use of local information (tables, data structures) to determine treatment of data traffic – traffic classification– treatment bases on classification– can drop traffic– or decide where to send it next, and how to send it
• Routing: Use of global information to populate forwarding data structures in multiple network nodes– goal is normally to optimize something given state of the network– difficult to fully automated --- often requires intricate
configuration– is partially automated with dynamic routing protocols
What You Should Take Away
• Heterogeneity– Many technologies– Many networks– Many “routing policies”
• IP Routing is evolving– New protocols and extensions– New router knobs– New ways of using existing technologies
Anthropology of Routing
Protocol Standards
Vendors
Operator Forums
Academic Research
IEEE, IETF, …
NANOG, RIPE, …
Books, Training, …
Cisco, Juniper …
Sun, Microsoft, …
EE
CS
Maths
Could add regulation, markets, ….
8
Routing can happen at any level
Physical
Network
DataLink
Transport
Application
Session
Presentation
Physical
Network
DataLink
Transport
Application
Session
Presentation
data sentdata received
9
IP is a Network Layer Protocol
Physical 1
Network
DataLink 1
Transport
Application
Session
Presentation
Network
Physical 1
DataLink 1
Physical 2
DataLink 2
Router
Physical 2
Network
DataLink 2
Transport
Application
Session
Presentation
Medium 1 Medium 2
Separate physical networks glued together into one logical network
10
0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |Version| IHL | Service Type | Total Length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Identification |Flags| Fragment Offset | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Time to Live | Protocol | Header Checksum | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Source Address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Destination Address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Options | Padding | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
All Hail the IP Datagram!
HEADER
DATA
1981, RFC 791
... up to 65,515 octets of data ...
::|+|+|
::|+|+|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
shaded fields little-used today
11
IP Hour Glass
IP
Networking Technologies
Networking Applications
Frame ATM
DWDMSONET
Webfile transfer
Ethernet
FDDI
Multimedia
X.25
Remote Access Voice
VPN
Minimalist network layer
TCP
e-stuff
Routers Talking to Routers
Routing info
Routing info
• Routing computation is distributed among routers within a routing domain
• Computation of best next hop based on routing information is the most CPU/memory intensive task on a router
• Routing messages are usually not routed, but exchanged via layer 2 between physically adjacent routers (internal BGP and multi-hop external BGP are exceptions)
Architecture of Dynamic Routing
AS 1
AS 2
BGP
EGP = Exterior Gateway Protocol
IGP = Interior Gateway Protocol
Metric based: OSPF, IS-IS, RIP, EIGRP (cisco)
Policy based: BGP
The Routing Domain of BGP is the entire Internet
OSPF
EIGRP
• Topology information is flooded within the routing domain
• Best end-to-end paths are computed locally at each router.
• Best end-to-end paths determine next-hops.
• Based on minimizing some notion of distance
• Works only if policy is shared and uniform
• Examples: OSPF, IS-IS
• Each router knows little about network topology
• Only best next-hops are chosen by each router for each destination network.
• Best end-to-end paths result from composition of all next-hop choices
• Does not require any notion of distance
• Does not require uniform policies at all routers
• Examples: RIP, BGP
Link State Vectoring
Technology of Distributed Routing
The Gang of Four
Link State Vectoring
EGP
IGP
BGP
RIPIS-IS
OSPF
Autonomous Routing Domains (ARDs)
A collection of physical networks glued togetherusing IP, that have a unified administrativerouting policy.
• Campus networks• Corporate networks• ISP Internal networks• …
Autonomous System (AS) Numbers
16 bit values.
64512 through 65535 are “private”
• Genuity: 1 • MIT: 3• JANET: 786• UC San Diego: 7377• AT&T: 7018, 6341, 5074, … • UUNET: 701, 702, 284, 12199, …• Sprint: 1239, 1240, 6211, 6242, …• …
ASNs represent units of routing policy
Currently over 16,000 in use.
Autonomous Routing Domains Don’t Always Need BGP or an ASN
Qwest
Yale University
Nail up default routes 0.0.0.0/0pointing to Qwest
Nail up routes 130.132.0.0/16pointing to Yale
130.132.0.0/16
Static routing is the most common way of connecting anautonomous routing domain to the Internet. This helps explain why BGP is a mystery to many …
ASNs Can Be “Shared” (RFC 2270)
AS 701UUNet
ASN 7046 is assigned to UUNet. It is used byCustomers single homed to UUNet, but needing BGP for some reason (load balancing, etc..) [RFC 2270]
AS 7046Crestar Bank
AS 7046 NJIT
AS 7046HoodCollege
128.235.0.0/16
ARD != AS
• Most ARDs have no ASN (statically routed at Internet edge)
• Some unrelated ARDs share the same ASN (RFC 2270)
• Some ARDs are implemented with multiple ASNs (example: Worldcom, AT&T, …)
ASes are an implementation detail of Interdomain routing
How Many ASNs?
Thanks to Geoff Huston. http://bgp.potaroo.net on October 24, 2003
15,981
How many prefixes?
Thanks to Geoff Huston. http://bgp.potaroo.net on October 24, 2003
154,894
Note: numbersactually dependspoint of view…
29%
23%
Address space covered
A Bit of OGI’s AS Neighborhood
AS 2914 Verio
AS 11964 OGI128.223.0.0/16
AS 7018 AT&T
AS 1239 Sprint
AS 6366Portland State U
AS 11995Oregon Health Sciences U
AS 3356 Level 3
Sources: ARIN, Route Views, RIPE
AS 3356 Level 3
AS 14262Portland Regional Education Network
AS 7774 U of Alaska
AS 3807 U of Montana
AS 101U of Washington
(Winter '02)
(Win
ter '0
2)
(Summer '03)
UW-Superior
UW-StoutUW-River Falls
Fox Valley TC
UW-Oshkosh
UW-Milwaukee
UW-ParksideUW-Whitewater
UW-Madison
UW-Platteville
UW-La Crosse
UW-Eau Claire
UW-Stevens Point
UW-Green Bay
Marshfield
Rhinelander
Rice Lake
Clintonville
StilesJct.
Portage
Dodgeville
La Crosse
Genuity
OC-3 (155Mbps)
DS-3 (45Mbps)
T1 (1.5Mbps)
OC-12 (622Mbps)
(Summer '02)
Qwestand Other
Provider(s)
(Summ
er '02)
Internet 2& Qwest
Peering - Public and Private Commodity Internet Transit Internet2 Merit and Other State Networks National Education Network Regional Research Peers
Wausau
Gigabit Ethernet
(Summer '02)
(Sum
mer '03)
Chicago - 1
Chicago - 2(Winter '02)
Chicago
wiscnet.net
GO BUCKY!
Partial View of cs.wisc.edu Neighborhood
AS 2381WiscNet
AS 209Qwest
AS 59 UW Academic Computing
128.105.0.0/16
AS 3549Global Crossing
AS 1Genuity
AS 3136 UW Madison
AS 7050 UW Milwaukee
129.89.0.0/16 130.47.0.0/16
26
Policy : Transit vs. Nontransit
AS 701
AS144
AS 701
A nontransit AS allows only traffic originating from AS or traffic with destination within AS
IP traffic
UUnet
Bell Labs
AT&T CBB
A transit AS allows traffic with neither source nor destination within AS to flow across the network
Customers and Providers
Customer pays provider for access to the Internet
provider
customer
IP trafficprovider customer
The “Peering” Relationship
peer peer
customerprovider
Peers provide transit between their respective customers
Peers do not provide transit between peers
Peers (often) do not exchange $$$trafficallowed
traffic NOTallowed
Peering Provides Shortcuts
Peering also allows connectivity betweenthe customers of “Tier 1” providers.
peer peer
customerprovider
Peering Wars
• Reduces upstream transit costs
• Can increase end-to-end performance
• May be the only way to connect your customers to some part of the Internet (“Tier 1”)
• You would rather have customers
• Peers are usually your competition
• Peering relationships may require periodic renegotiation
Peering struggles are by far the most contentious issues in the ISP world!
Peering agreements are often confidential.
Peer Don’t Peer
31
Policy-Based vs. Distance-Based Routing?
ISP1
ISP2
ISP3
Cust1
Cust2Cust3
Host 1
Host 2
Minimizing “hop count” can violate commercial relationships thatconstrain inter-domain routing.
YES
NO
32
Why not minimize “AS hop count”?
Regional ISP1
Regional ISP2
Regional ISP3
Cust1Cust3 Cust2
National ISP1
National ISP2
YES
NO
33
BGP-4• BGP = Border Gateway Protocol
• Is a Policy-Based routing protocol
• Is the de facto EGP of today’s global Internet
• Relatively simple protocol, but configuration is complex and the
entire world can see, and be impacted by, your mistakes.
• 1989 : BGP-1 [RFC 1105]– Replacement for EGP (1984, RFC 904)
• 1990 : BGP-2 [RFC 1163]
• 1991 : BGP-3 [RFC 1267]
• 1995 : BGP-4 [RFC 1771] – Support for Classless Interdomain Routing (CIDR)
34
Four Types of BGP Messages
• Open : Establish a peering session.
• Keep Alive : Handshake at regular intervals.
• Notification : Shuts down a peering session.
• Update : Announcing new routes or withdrawing previously announced routes.
announcement = prefix + attributes values
BGP Attributes
Value Code Reference----- --------------------------------- --------- 1 ORIGIN [RFC1771] 2 AS_PATH [RFC1771] 3 NEXT_HOP [RFC1771] 4 MULTI_EXIT_DISC [RFC1771] 5 LOCAL_PREF [RFC1771] 6 ATOMIC_AGGREGATE [RFC1771] 7 AGGREGATOR [RFC1771] 8 COMMUNITY [RFC1997] 9 ORIGINATOR_ID [RFC2796] 10 CLUSTER_LIST [RFC2796] 11 DPA [Chen] 12 ADVERTISER [RFC1863] 13 RCID_PATH / CLUSTER_ID [RFC1863] 14 MP_REACH_NLRI [RFC2283] 15 MP_UNREACH_NLRI [RFC2283] 16 EXTENDED COMMUNITIES [Rosen] ... 255 reserved for development
From IANA: http://www.iana.org/assignments/bgp-parameters
Mostimportantattributes
Not all attributesneed to be present inevery announcement
Attributes are Used to Select Best Routes
192.0.2.0/24pick me!
192.0.2.0/24pick me!
192.0.2.0/24pick me!
192.0.2.0/24pick me!
Given multipleroutes to the sameprefix, a BGP speakermust pick at mostone best route
(Note: it could reject them all!)
37
BGP Route Processing
Best Route Selection
Apply Import Policies
Best Route Table
Apply Export Policies
Install forwardingEntries for bestRoutes.
ReceiveBGPUpdates
BestRoutes
TransmitBGP Updates
Apply Policy =filter routes & tweak attributes
Based onAttributeValues
IP Forwarding Table
Apply Policy =filter routes & tweak attributes
Open ended programming.Constrained only by vendor configuration language
Route Selection Summary
Highest Local Preference
Shortest ASPATH
Lowest MED
i-BGP < e-BGP
Lowest IGP cost to BGP egress
Lowest router ID
traffic engineering
Enforce relationships
Throw up hands andbreak ties
39
ASPATH Attribute
AS7018135.207.0.0/16AS Path = 6341
AS 1239Sprint
AS 1755Ebone
AT&T
AS 3549Global Crossing
135.207.0.0/16AS Path = 7018 6341
135.207.0.0/16AS Path = 3549 7018 6341
AS 6341
135.207.0.0/16
AT&T Research
Prefix Originated
AS 12654RIPE NCCRIS project
AS 1129Global Access
135.207.0.0/16AS Path = 7018 6341
135.207.0.0/16AS Path = 1239 7018 6341
135.207.0.0/16AS Path = 1755 1239 7018 6341
135.207.0.0/16AS Path = 1129 1755 1239 7018 6341
In fairness: could you do this “right” and still scale?
Exporting internalstate would dramatically increase global instability and amount of routingstate
Shorter Doesn’t Always Mean Shorter
AS 4
AS 3
AS 2
AS 1
Mr. BGP says that path 4 1 is better than path 3 2 1
Duh!
BGP Routing Tables
• Use “whois” queries to associate an ASN with “owner” (for example, http://www.arin.net/whois/arinwhois.html)
• 7018 = AT&T Worldnet, 701 =Uunet, 3561 = Cable & Wireless, …
show ip bgpBGP table version is 111849680, local router ID is 203.62.248.4Status codes: s suppressed, d damped, h history, * valid, > best, i - internalOrigin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
. . .*>i192.35.25.0 134.159.0.1 50 0 16779 1 701 703 i*>i192.35.29.0 166.49.251.25 50 0 5727 7018 14541 i*>i192.35.35.0 134.159.0.1 50 0 16779 1 701 1744 i*>i192.35.37.0 134.159.0.1 50 0 16779 1 3561 i*>i192.35.39.0 134.159.0.3 50 0 16779 1 701 80 i*>i192.35.44.0 166.49.251.25 50 0 5727 7018 1785 i*>i192.35.48.0 203.62.248.34 55 0 16779 209 7843 225 225 225 225 225 i*>i192.35.49.0 203.62.248.34 55 0 16779 209 7843 225 225 225 225 225 i*>i192.35.50.0 203.62.248.34 55 0 16779 3549 714 714 714 i*>i192.35.51.0/25 203.62.248.34 55 0 16779 3549 14744 14744 14744 14744 14744 14744 14744 14744 i. . .
Thanks to Geoff Huston. http://www.telstra.net/ops on July 6, 2001
AS Graphs Depend on Point of View
peer peer
customerprovider
54
2
1 3
6
54
2
6
1 3
54 6
1 3
54
2
6
1 32
AS Graphs Can Be Fun
The subgraph showing all ASes that have more than 100 neighbors in fullgraph of 11,158 nodes. July 6, 2001. Point of view: AT&T route-server
AS Graphs Do Not Show “Topology”!
The AS graphmay look like this. Reality may be closer to this…
BGP was designed to throw away information!
45
So Many Choices
Which route shouldFrank pick to 13.13.0.0./16?
AS 1
AS 2
AS 4
AS 3
13.13.0.0/16
Frank’s Internet Barn
peer peer
customerprovider
46
LOCAL PREFERENCE
AS 1AS 2
AS 4
AS 3
13.13.0.0/16
local pref = 80
local pref = 100
local pref = 90
Higher Localpreference valuesare more preferred
Local preference used ONLY in iBGP
47
Implementing Backup Links with Local Preference (Outbound Traffic)
Forces outbound traffic to take primary link, unless link is down.
AS 1
primary link backup link
Set Local Pref = 100for all routes from AS 1 AS 65000
Set Local Pref = 50for all routes from AS 1
We’ll talk about inbound traffic soon …
48
Multihomed Backups (Outbound Traffic)
Forces outbound traffic to take primary link, unless link is down.
AS 1
primary link backup link
Set Local Pref = 100for all routes from AS 1
AS 2
Set Local Pref = 50for all routes from AS 3
AS 3provider provider
49
Shedding Inbound Traffic with ASPATH Prepending
Prepending will (usually) force inbound traffic from AS 1to take primary linkAS 1
192.0.2.0/24ASPATH = 2 2 2
customerAS 2
provider
192.0.2.0/24
backupprimary
192.0.2.0/24ASPATH = 2
Yes, this is a Glorious Hack …
50
… But Padding Does Not Always Work
AS 1
192.0.2.0/24ASPATH = 2 2 2 2 2 2 2 2 2 2 2 2 2 2
customerAS 2
provider
192.0.2.0/24
192.0.2.0/24ASPATH = 2
AS 3provider
AS 3 will sendtraffic on “backup”link because it prefers customer routes and localpreference is considered before ASPATH length!
Padding in this way is oftenused as a form of loadbalancing
backupprimary
51
COMMUNITY Attribute to the Rescue!
AS 1
customerAS 2
provider
192.0.2.0/24
192.0.2.0/24ASPATH = 2
AS 3provider
backupprimary
192.0.2.0/24ASPATH = 2 COMMUNITY = 3:70
Customer import policy at AS 3:If 3:90 in COMMUNITY then set local preference to 90If 3:80 in COMMUNITY then set local preference to 80If 3:70 in COMMUNITY then set local preference to 70
AS 3: normal customer local pref is 100,peer local pref is 90
Don’t celebrate just yet…
customer
peering
provider/customer
Provider B (Tier 1)Provider A (Tier 1)
Provider C (Tier 2)
Now, customer wants a backup link to C….
provider/customer
Customer installs a “backup link” …
customer
Provider B (Tier 1)Provider A (Tier 1)
Provider C (Tier 2)
customer sends “lower my preference” Community value
primarybackup
Disaster Strikes!
customer
Provider B (Tier 1)Provider A (Tier 1)
Provider C (Tier 2)primary
backup
customer is happy that backup was installed …
The primary link is repaired, and something odd occurs…
customer
Provider B (Tier 1)Provider A (Tier 1)
Provider C (Tier 2)primary
backup
YIKES --- routing DOES NOT return to normal!!!
WAIT! It Gets Better…
A
P
B
BB
C
B
D
P = primary B = backup
OOOOOPS!
A
P
B
BB
C
B
DSuppose A, B, C all break ties in the same direction(clockwise or counter-clockwise)
No solution =Protocol Divergence
What the heck is going on?
• There is no guarantee that a BGP configuration has a unique routing solution. – When multiple solutions exist, the (unpredictable) order
of updates will determine which one is wins.
• There is no guarantee that a BGP configuration has any solution!– And checking configurations NP-Complete [GW1999]
• Complex policies (weights, communities setting preferences, and so on) increase chances of routing anomalies.– … yet this is the current trend!
Are we too complacent?
If the provider/customer digraph is acyclic and every AS obeys the commandments
• Thou shall prefer customer routes over all others
• Thou shall use provider routes only as a last resort
• Thou shall not provide transit between peers or providers
then the BGP configuration is robust. [see Gao-Griffin-Rexford, INFOCOM 2001]
Worrisome trends…
• Some Autonomous Routing Domains (ARDs) are implemented with multiple ASNs (example: MCI, InterNap, AT&T)– Such “sibling” ASes are not confined to “customer/provider,
peer/peer” relationships– ASNs are becoming just an implementation detail.
• Some ASes participate in different roles in different parts of the world (Sprint, for example). – I don’t think we understand this.
• We all know abut MED…– But MED oscillation is not a feature interaction problem (MEDs and
Route Reflection), but rather a manifestation of BGP’s general principle --- the more complex the policies, the more likely that bad things happen. MED just makes it easy to write very complex policies…
• Communities are being used for clever interdomain signaling. – Nobody has read “Inherently Safe Backup Routing with BGP” Gao,
Griffin, Rexford. INFOCOM 2001– “te” communities and extended communities…
Let’s look at “te” communities…
See A survey of the utilization of the BGP community attributeBruno Quoitin and Olivier Bonaventure http://www.infonet.fundp.ac.be/doc/tr/Infonet-TR-2002-02.html
n = 0 do not announce prefix 1 <= n <= 3 prepend n times to announcement
13129:101n - do not announce/prepend to Sprint (AS1239) 13129:102n - do not announce/prepend to Cogent (AS16631) 13129:103n - do not announce/prepend to Abovenet (AS6461) 13129:111n - do not announce/prepend to DE-CIX 13129:112n - do not announce/prepend to INXS 13129:113n - do not announce/prepend to SFINX 13129:114n - do not announce/prepend to LINX 13129:115n - do not announce/prepend to AMS-IX 13129:116n - do not announce/prepend to IX-HH 13129:117n - do not announce/prepend to NYIIX 13129:191n - do not announce/prepend to DTAG (AS3320) 13129:192n - do not announce/prepend to DFN (AS680) 13129:1990 - do not announce to the RIPE RIS project (AS12654)
AS13129, Global Access Telecommunications, Inc (Frankfurt)
Acceptedon inbound routes
Some AS286 Communities
remarks: +----------------------------------------------------------remarks: | COMMUNITIES - ROUTE ORIGIN - NOT SETTABLE BY CUSTOMERSremarks: |remarks: | 286:286 European customer routesremarks: | 286:999 US customer routes (received from Qwest)remarks: | 286:888 European or US peer routesremarks: |remarks: | 286:3000 + countrycode Country where route is receivedremarks: | countrycode E.164 international dial prefixremarks: |remarks: | EXAMPLESremarks: |remarks: | 286:286 286:3031 Customer in Amsterdamremarks: | 286:286 286:3032 Customer in Brusselsremarks: | 286:888 286:3044 Peer in Londonremarks: |remarks: +----------------------------------------------------------
From KPN Eurorings Backbone:
Comment: Aren’t we happy that RPSL has “remarks”!
Need “Semantics of Interdomain Routing”
• Distinct from mechanism of finding routings (protocols). – Don’t start with “new algorithms”!!!
• BGP policy languages/usage have evolved organically --- lack of design.– Too closely tied to mechanism of BGP– RPSL doesn’t even begin to address the issues…
• What do we want to be true? • What do we mean by “autonomy”• How much “expressive power” is really
required?
References
• [VGE1996, VGE2000] Persistent Route Oscillations in Inter-Domain Routing. Kannan Varadhan, Ramesh Govindan, and Deborah Estrin. Computer Networks, Jan. 2000. (Also USC Tech Report, Feb. 1996)
• [GW1999] An Analysis of BGP Convergence Properties. Timothy G. Griffin, Gordon Wilfong. SIGCOMM 1999
• [GSW1999] Policy Disputes in Path Vector Protocols. Timothy G. Griffin, F. Bruce Shepherd, Gordon Wilfong. ICNP 1999
• [GW2001] A Safe Path Vector Protocol. Timothy G. Griffin, Gordon Wilfong. INFOCOM 2001
• [GR2000] Stable Internet Routing without Global Coordination. Lixin Gao, Jennifer Rexford. SIGMETRICS 2000
• [GGR2001] Inherently safe backup routing with BGP. Lixin Gao, Timothy G. Griffin, Jennifer Rexford. INFOCOM 2001
– [GW2002a] On the Correctness of IBGP Configurations. Griffin and Wilfong.SIGCOMM 2002.
– [GW2002b] An Analysis of the MED oscillation Problem. Griffin and Wilfong. ICNP 2002.
Pointers
• Interdomain routing links– http://www.cambridge.intel-research.net/~tgriffin/interdomain/
• These slides– http://www.cambridge.intel-research.net/~tgriffin/talks_tutorials