Network Layer 7-1
TELE3118: Network Technologies, 2010 session 1
Week 7: Network Layer, Routing Protocols
Some slides have been taken from: Computer Networking: A Top Down Approach Featuring the Internet, 3rd edition. Jim Kurose, Keith Ross. Addison-Wesley, July 2004. All material copyright 1997-2004 J.F. Kurose and K.W. Ross, All Rights Reserved.
Network Layer 7-2
IP routing

destination    mask  local  next-hop
0.0.0.0        0            192.168.1.1
10.0.0.0       8            172.20.4.1
200.23.16.0    20           199.31.18.4
200.23.18.0    23           172.20.4.1
10.20.0.0      24           199.31.18.4
192.168.1.0    24    L      192.168.1.18
172.20.4.0     24    L      172.20.4.253
199.31.18.0    24    L      199.31.18.52

LAN interfaces: 172.20.4.253/24, 192.168.1.18/24, 199.31.18.52/24

How is the routing table constructed?
  • Static (manual)
  • Dynamic (routing protocol)
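A lookup in this table picks the matching entry with the longest mask. The following is a minimal Python sketch of that rule, assuming the table above; the names `ROUTES` and `lookup` are illustrative and not part of the slides, and for local ("L") entries the sketch simply returns the local interface address.

```python
import ipaddress

# Routing table from the slide: (destination, mask length, next hop).
# For local entries the "next hop" is the router's own interface address.
ROUTES = [
    ("0.0.0.0", 0, "192.168.1.1"),
    ("10.0.0.0", 8, "172.20.4.1"),
    ("200.23.16.0", 20, "199.31.18.4"),
    ("200.23.18.0", 23, "172.20.4.1"),
    ("10.20.0.0", 24, "199.31.18.4"),
    ("192.168.1.0", 24, "192.168.1.18"),
    ("172.20.4.0", 24, "172.20.4.253"),
    ("199.31.18.0", 24, "199.31.18.52"),
]


def lookup(dst):
    """Return the next hop of the longest-prefix match for address dst."""
    addr = ipaddress.ip_address(dst)
    best = None  # (mask length, next hop) of the best match so far
    for dest, plen, next_hop in ROUTES:
        net = ipaddress.ip_network(f"{dest}/{plen}")
        if addr in net and (best is None or plen > best[0]):
            best = (plen, next_hop)
    return best[1] if best else None
```

For example, 200.23.19.5 matches the default route, 200.23.16.0/20 and 200.23.18.0/23; the /23 entry wins, so the sketch returns 172.20.4.1.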
Network Layer 7-3
The Internet Network layer

Note on terminology:
  • “routing” vs. “forwarding”
  • “routing table” vs. “forwarding table”

[Figure: the network layer sits between the transport layer (TCP, UDP) and the link and physical layers. It contains the following components.]
Routing protocols
  • path selection
  • RIP, OSPF, BGP
IP protocol
  • addressing conventions
  • datagram format
  • packet handling conventions
ICMP protocol
  • error reporting
  • router “signaling”
Forwarding table
Network Layer 7-4
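“routing” and “forwarding” tables

[Figure: the routing algorithm fills the local forwarding table; a router with output links 1, 2 and 3 looks up the value in the arriving packet’s header (0111).]

Local forwarding table:
header value  output link
0100          3
0101          2
0111          2
1001          1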
Network Layer 7-5
Routing: abstract model

Graph abstraction for routing algorithms:
  • graph nodes are routers
  • graph edges are physical links
  • link cost: delay, $ cost, or congestion level

Goal: the routing protocol determines a “good” path (sequence of routers) through the network from source to destination.

“Good” path:
  • typically means minimum-cost path
  • other definitions possible

[Figure: example graph with routers A, B, C, D, E, F and weighted links]
Network Layer 7-6
Routing algorithm classification

Distance-vector algorithm
  • Local information: router knows physically connected neighbors and link costs to neighbors
  • 2 components:
      • Neighbor routing-table exchange
      • Bellman-Ford (also called Ford-Fulkerson) computation
  • E.g.: RIP

Link-state algorithm
  • Global information: router knows complete topology and link cost info of entire network
  • 2 components:
      • Reliable flooding
      • Dijkstra shortest-path tree (SPT) computation
  • E.g.: OSPF, IS-IS
Network Layer 7-7
Distance vector - RIP

Each node maintains a table of triples: (Destination, Cost, Next-hop).

[Figure: example network with nodes A, B, C, D, E, F, G]

Table at B:
Destination  Cost  Next-hop
A            1     A
C            1     C
D            2     C
E            2     A
F            2     A
G            3     A
Network Layer 7-8
RIP: overview

Iterative, asynchronous, distributed
Directly connected neighbors exchange updates
  • periodically (on the order of several seconds)
  • whenever the table changes (called a triggered update)
Each update is a vector of distances: (Destination, Cost)
Update the local table on receiving a “better” route, i.e. one that
  • has a smaller cost, or
  • came from the current next-hop
Refresh existing routes; delete them if they time out
Network Layer 7-9
RIP: example
[Figure: example network with nodes A, B, C, D, E, F, G]

Initial table at A:
Destination  Cost  Next-hop
B            1     B
C            1     C
D            ∞     -
E            1     E
F            1     F
G            ∞     -

After receiving update from C:
Destination  Cost  Next-hop
B            1     B
C            1     C
D            2     C
E            1     E
F            1     F
G            ∞     -

After receiving update from F:
Destination  Cost  Next-hop
B            1     B
C            1     C
D            2     C
E            1     E
F            1     F
G            2     F
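The table changes in this example follow from a simple merge rule. Below is a minimal Python sketch of it, not RIP itself: the function name `dv_update`, the table layout, and the two advertisements (C and F announcing only their directly connected neighbors at cost 1) are assumptions chosen so that the result matches the tables on this slide.

```python
INF = 16  # RIP treats a cost of 16 as infinity


def dv_update(table, me, neighbor, advert, link_cost=1):
    """Merge a neighbor's distance vector into `table`.

    table:  {destination: (cost, next_hop)}
    advert: {destination: cost} as advertised by `neighbor`
    Returns True if any entry changed (which would trigger an update).
    """
    changed = False
    for dest, adv_cost in advert.items():
        if dest == me:
            continue  # never install a route to ourselves
        new_cost = min(adv_cost + link_cost, INF)
        old_cost, old_nh = table.get(dest, (INF, None))
        better = new_cost < old_cost
        # A changed cost from the current next hop must be believed,
        # even when it is worse than what we had.
        from_next_hop = old_nh == neighbor and new_cost != old_cost
        if better or from_next_hop:
            table[dest] = (new_cost, neighbor if new_cost < INF else None)
            changed = True
    return changed


# Initial table at A, as on the slide.
table_a = {"B": (1, "B"), "C": (1, "C"), "D": (INF, None),
           "E": (1, "E"), "F": (1, "F"), "G": (INF, None)}

dv_update(table_a, "A", "C", {"A": 1, "B": 1, "D": 1})  # assumed advert from C
after_c = dict(table_a)
dv_update(table_a, "A", "F", {"A": 1, "G": 1})          # assumed advert from F
```

After the first call A reaches D at cost 2 via C; after the second it reaches G at cost 2 via F, as in the tables.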
Network Layer 7-10
RIP: recovering from link failure
[Figure: example network with nodes A, B, C, D, E, F, G]

At F:
Dest  Cost  Nh
A     1     A
B     2     A
C     2     A
D     ∞     -
E     2     A
G     ∞     -

At A:
Dest  Cost  Nh
B     1     B
C     1     C
D     2     C
E     1     E
F     1     F
G     ∞     -

A receives update from C:
Dest  Cost  Nh
B     1     B
C     1     C
D     2     C
E     1     E
F     1     F
G     3     C

F receives update from A:
Dest  Cost  Nh
A     1     A
B     2     A
C     2     A
D     3     A
E     2     A
G     4     A
Network Layer 7-11
RIP: link cost decreases
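[Figure: triangle of nodes X, Y, Z; the X-Y link cost drops from 4 to 1, the Y-Z link costs 1, the X-Z link costs 12]

Routes (Destination Cost Next-hop) at successive steps:

At Y:   X 4 X    X 1 X    X 1 X
        Z 1 Z    Z 1 Z    Z 1 Z

At Z:   X 5 Y    X 5 Y    X 2 Y
        Y 1 Y    Y 1 Y    Y 1 Y

Good news travels fast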
Network Layer 7-12
RIP: link cost increases
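[Figure: triangle of nodes X, Y, Z; the X-Y link cost rises from 4 to 14, the Y-Z link costs 1, the X-Z link costs 12]

Routes (Destination Cost Next-hop) at successive steps:

At Y:   X 4 X    X 6 Z    X 6 Z    X 8 Z
        Z 1 Z    Z 1 Z    Z 1 Z    Z 1 Z

At Z:   X 5 Y    X 5 Y    X 7 Y    X 7 Y
        Y 1 Y    Y 1 Y    Y 1 Y    Y 1 Y

and so on

Bad news travels slow: “count to infinity” problem, loops!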
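The slow climb can be replayed in a few lines of Python. This is a sketch under stated assumptions: the link costs (X-Y rising from 4 to 14, Y-Z = 1, X-Z = 12) are read from the figure, Y and Z recompute strictly in turn, and no split horizon is used. The function name `count_to_infinity` is illustrative.

```python
def count_to_infinity(c_xy=14, c_yz=1, c_xz=12, y_cost=4, z_cost=5):
    """Replay Y's and Z's cost to X after the X-Y link cost rises.

    Y and Z take turns applying the Bellman-Ford rule
    cost = min(direct link, link to neighbor + neighbor's advertised cost).
    Returns the list of (Y's cost, Z's cost) pairs after each round.
    """
    trace = []
    while True:
        new_y = min(c_xy, c_yz + z_cost)  # Y hears Z's current cost
        new_z = min(c_xz, c_yz + new_y)   # Z then hears Y's new cost
        if (new_y, new_z) == (y_cost, z_cost):
            return trace                  # nothing changed: converged
        y_cost, z_cost = new_y, new_z
        trace.append((y_cost, z_cost))


trace = count_to_infinity()
```

With these costs the two routers take five rounds, (6, 7), (8, 9), (10, 11), (12, 12), (13, 12), before Z finally switches to its direct link to X; during that time packets for X bounce between Y and Z.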
Network Layer 7-13
Breaking the loop …
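[Figure: triangle of nodes X, Y, Z; the X-Y link cost rises from 4 to 14, the Y-Z link costs 1, the X-Z link costs 12]

If next-hop to D is R:
  • Split Horizon: do not include D in update to R
  • Split Horizon with Poison Reverse: include D, but with metric = ∞

Routes (Destination Cost Next-hop) at successive steps:

At Y:   X 4 X    X 14 X   X 14 X   X 13 Z
        Z 1 Z    Z 1 Z    Z 1 Z    Z 1 Z

At Z:   X 5 Y    X 5 Y    X 12 X   X 12 X
        Y 1 Y    Y 1 Y    Y 1 Y    Y 1 Y

Does this solve the “count to infinity” problem?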
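For the three-node example on this slide, the effect of poison reverse can be sketched in Python. As before this is only an illustration under assumptions: link costs X-Y = 14 (after the change), Y-Z = 1, X-Z = 12 are read from the figure, Y and Z recompute in turn, and the name `poison_reverse_trace` is hypothetical.

```python
INF = float("inf")


def poison_reverse_trace(c_xy=14, c_yz=1, c_xz=12):
    """Replay Y's and Z's route to X with split horizon + poison reverse.

    Each route is a (cost, next_hop) pair. A router advertises infinity
    for X to the neighbor it currently uses as next hop towards X.
    """
    y = (4, "X")   # before the change: Y reaches X directly
    z = (5, "Y")   # before the change: Z reaches X via Y
    trace = []
    while True:
        z_adv = INF if z[1] == "Y" else z[0]         # Z poisons its route to Y
        new_y = min((c_xy, "X"), (c_yz + z_adv, "Z"))
        y_adv = INF if new_y[1] == "Z" else new_y[0]  # Y poisons its route to Z
        new_z = min((c_xz, "X"), (c_yz + y_adv, "Y"))
        if (new_y, new_z) == (y, z):
            return trace
        y, z = new_y, new_z
        trace.append((y, z))


trace = poison_reverse_trace()
```

Here the routes settle in two rounds (Y: 14 via X, then 13 via Z; Z: 12 via X), matching the tables above. The sketch covers only this two-router loop; it says nothing about loops that involve three or more routers.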
Network Layer 7-14
… is not always easy
[Figure: example network with nodes A, B, C, D, E, F, G]

At A:
Dest  Cost  Nh
B     1     B
C     1     C
D     2     C
E     ∞     -
F     1     F
G     2     F

B receives update from C:
Dest  Cost  Nh
A     1     A
C     1     C
D     2     C
E     3     C
F     2     A
G     3     A

A receives update from B:
Dest  Cost  Nh
B     1     B
C     1     C
D     2     C
E     4     B
F     1     F
G     2     F

C receives update from A:
Dest  Cost  Nh
A     1     A
B     1     B
D     1     D
E     5     A
F     2     A
G     2     D
Network Layer 7-15
RIPv2 (RFC 2453) details
Included in BSD-UNIX distribution in 1982
Distance metric: # of hops (∞ = 16): why?
Distance vectors only exchanged among neighbors
Up to 25 destinations per RIP update message
Update interval is 30 sec:
  • If too large, convergence is slow
  • If too small, too much traffic
Triggered update whenever the routing table changes
Split horizon mandatory, poison reverse optional
Network Layer 7-16
RIPv2 details (contd.)
Updates sent every 30 (+/- 5) seconds
Route not refreshed for 180 sec is timed out
  • Still included in update messages
  • Timed-out route is deleted (garbage-collected) after 120 sec
Triggered update timer set for 1-5 sec
  • Includes only changed routes
  • Suppressed if regular update due

RIPv2 message format (32-bit rows; bit offsets 0, 8, 16, 31):
  Command (bits 0-7) | Version (bits 8-15) | Must be zero (bits 16-31)
  Family of net 1 (bits 0-15) | Must be zero (bits 16-31)
  Address of net 1
  Subnet mask of net 1
  Next hop of net 1
  Distance to net 1
  Family of net 2 (bits 0-15) | Must be zero (bits 16-31)
  Address of net 2
  Subnet mask of net 2
  Next hop of net 2
  Distance to net 2
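The layout above maps directly onto fixed-width binary fields. The following Python sketch packs a RIPv2 response with the standard `struct` module; the function name `build_rip_response` and the sample route are illustrative assumptions. The 16-bit field after the address family is set to zero, as shown on the slide.

```python
import socket
import struct

RIP_RESPONSE = 2       # command code of a response (update) message
RIP_VERSION = 2
AF_INET_FAMILY = 2     # address family identifier for IP
MAX_ENTRIES = 25       # at most 25 destinations per update message


def build_rip_response(routes):
    """Pack routes into one RIPv2 response message.

    routes: list of (network, mask, next_hop, metric), with the three
    addresses given as dotted-quad strings and metric as an int (1..16).
    """
    if len(routes) > MAX_ENTRIES:
        raise ValueError("a RIP message carries at most 25 entries")
    # Header: command (8 bits), version (8 bits), must-be-zero (16 bits).
    msg = struct.pack("!BBH", RIP_RESPONSE, RIP_VERSION, 0)
    for net, mask, next_hop, metric in routes:
        # Entry: family, zero field, address, subnet mask, next hop, distance.
        msg += struct.pack("!HH4s4s4sI", AF_INET_FAMILY, 0,
                           socket.inet_aton(net), socket.inet_aton(mask),
                           socket.inet_aton(next_hop), metric)
    return msg


# One hypothetical entry: 10.20.0.0/24 at distance 1.
message = build_rip_response([("10.20.0.0", "255.255.255.0", "0.0.0.0", 1)])
```

Each entry is 20 bytes after a 4-byte header, so 25 entries give a 504-byte payload.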
Network Layer 7-17
RIP: where does it run?

RIP runs as an application-level process (routed, pronounced “route-d”)
Updates sent as UDP messages (port 520)
Multicast IP address 224.0.0.9 (with TTL=1)

[Figure: two routers, each with the stack physical, link, network (IP) with forwarding table, transport (UDP), and the routed process on top; the routed processes exchange updates and fill the forwarding tables]
Network Layer 7-18
Link State - OSPF
Strategy: each node learns the complete topology
  • send information about directly connected links (not the entire routing table) to the entire network (not just neighbors)
Link State Advertisement (LSA) includes
  • Nodes (routers) and links (networks)
  • Sequence number and age
Reliable flooding
  • Store most recent LSA for each node
  • Send LSA to all neighbors except the one that sent it
  • Generate LSA periodically (with higher sequence number)
  • Age out each stored LSA
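The flooding rule (keep the newest LSA per originator, forward to everyone except the sender) can be sketched as follows. This is a simplified illustration, not OSPF's actual procedure: acknowledgements, retransmission and aging are left out, and the class and method names are assumptions.

```python
class LinkStateDB:
    """Per-router store of the most recent LSA from each originator."""

    def __init__(self, neighbors):
        self.neighbors = set(neighbors)
        self.lsdb = {}  # originator -> (sequence number, links)

    def receive(self, origin, seq, links, from_neighbor):
        """Process an LSA; return the neighbors it must be forwarded to.

        An LSA that is not newer than the stored copy is dropped, which
        is what stops the flood from circulating forever.
        """
        stored = self.lsdb.get(origin)
        if stored is not None and stored[0] >= seq:
            return set()
        self.lsdb[origin] = (seq, links)
        return self.neighbors - {from_neighbor}


db = LinkStateDB(["A", "B", "C"])
first = db.receive("X", 1, {"Y": 1}, from_neighbor="A")   # new: forward to B, C
repeat = db.receive("X", 1, {"Y": 1}, from_neighbor="B")  # duplicate: drop
newer = db.receive("X", 2, {"Y": 3}, from_neighbor="B")   # newer: forward to A, C
```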
Network Layer 7-19
A Link-State Routing Algorithm
Dijkstra’s algorithm
  • Given: all nodes know full topology and link costs
  • Objective: compute least-cost paths from self to all other nodes (this gives the routing table)
  • Iterative: after k iterations, know least-cost path to k destinations
  • Distributed: each node computes shortest-path tree from itself

Notation:
  • c(x,y): link cost from node x to y; = ∞ if not direct neighbors
  • D(v): current value of cost of path from source to dest. v
  • p(v): predecessor node along path from source to v
  • N': set of nodes whose least-cost path is definitively known
Network Layer 7-20
Dijkstra’s Algorithm

Initialization:
  N' = {u}
  for all nodes v
    if v adjacent to u
      then D(v) = c(u,v)
    else D(v) = ∞

Loop
  find w not in N' such that D(w) is a minimum
  add w to N'
  update D(v) for all v adjacent to w and not in N':
    D(v) = min( D(v), D(w) + c(w,v) )
  /* new cost to v is either old cost to v or known
     shortest path cost to w plus cost from w to v */
until all nodes in N'
Network Layer 7-21
Dijkstra’s algorithm: example
Step  N'      D(v),p(v)  D(w),p(w)  D(x),p(x)  D(y),p(y)  D(z),p(z)
0     u       2,u        5,u        1,u        ∞          ∞
1     ux      2,u        4,x                   2,x        ∞
2     uxy     2,u        3,y                              4,y
3     uxyv               3,y                              4,y
4     uxyvw                                               4,y
5     uxyvwz

[Figure: example graph with nodes u, v, w, x, y, z and weighted links]
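The table above can be reproduced with a short Python sketch of Dijkstra's algorithm. Two things here are assumptions: the edge costs in `EDGES` are taken from the textbook graph that the table is consistent with (the figure itself did not survive extraction), and the sketch uses a binary heap rather than the linear scan of the pseudocode.

```python
import heapq

# Assumed undirected link costs for the example graph.
EDGES = [("u", "v", 2), ("u", "x", 1), ("u", "w", 5), ("v", "x", 2),
         ("v", "w", 3), ("x", "w", 3), ("x", "y", 1), ("w", "y", 1),
         ("w", "z", 5), ("y", "z", 2)]


def build_graph(edges):
    """Turn an edge list into an adjacency map {node: {neighbor: cost}}."""
    graph = {}
    for a, b, cost in edges:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def dijkstra(graph, source):
    """Return (D, p): least cost and predecessor for every reachable node."""
    dist = {source: 0}
    pred = {}
    done = set()            # the set N' of the pseudocode
    heap = [(0, source)]
    while heap:
        d, w = heapq.heappop(heap)   # w not in N' with minimum D(w)
        if w in done:
            continue                 # stale heap entry
        done.add(w)
        for v, cost in graph[w].items():
            if v not in done and d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost   # D(v) = min(D(v), D(w) + c(w,v))
                pred[v] = w
                heapq.heappush(heap, (dist[v], v))
    return dist, pred


dist, pred = dijkstra(build_graph(EDGES), "u")
```

The final values agree with the last entry in each column of the table: D(v)=2 via u, D(x)=1 via u, D(y)=2 via x, D(w)=3 via y, D(z)=4 via y.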
Network Layer 7-22
Dijkstra’s algorithm, discussion

Algorithm complexity: n nodes
  • each iteration: need to check all nodes, w, not in N'
  • n(n+1)/2 comparisons: O(n^2)
  • more efficient implementations possible: O(n log n)

Link metric
  • Static: link latency, link capacity, …
  • Dynamic: based on load?
      • e.g.: link cost = amount of carried traffic
      • oscillations!

[Figure: four snapshots of a network with nodes A, B, C, D whose link costs (values 0, e, 1, 1+e, 2+e) follow the carried traffic: initially, and after each of three routing recomputations]
Network Layer 7-23
OSPF details

RFC 2328 (244 pages long!)
Neighbor up/down detected using “hello” packets
LSA reliable flooding over entire AS
  • LSA includes sequence number and age
  • LSA integrity using checksum (excludes age)
OSPF messages directly over IP (no UDP or TCP)
Hierarchical OSPF: allows scaling to larger networks
5 types of LSAs:
  1. Router LSA: set of nodes
  2. Network LSA: set of links
  3. Summary LSA: inter-area networks
  4. Summary LSA: area-border-routers
  5. External LSA: external to AS
Network Layer 7-24
Hierarchical OSPF
Network Layer 7-25
Hierarchical OSPF
Two-level hierarchy: local area, backbone.
  • Link-state advertisements only within an area
  • Each node has detailed area topology; it only knows the direction (shortest path) to nets in other areas.
Area border routers: “summarize” distances to nets in own area, advertise to other area border routers.
Backbone routers: run OSPF routing limited to backbone.
Boundary routers: connect to other ASs.
Network Layer 7-26
OSPF “advanced” features (not in RIP)

Authentication: prevents malicious intrusion
Hierarchy: allows larger domains
Load balancing: equal-cost multi-path (ECMP)
Extensions to support:
  • Multicast: MOSPF
  • Traffic engineering: OSPF-TE
Network Layer 7-27
Comparison of LS and DV algorithms
Messaging
  • DV: entire routing table, but only exchanged between neighbors
  • LS: small messages, but flooded in whole network
Speed of convergence
  • DV: multiple iterations, each requires recompute and transmit; count-to-infinity problem
  • LS: flood and recalculate, one shot, faster
Robustness: both LS and DV can be wrecked by one bad router.
  • In 1997 a bad router in a small ISP advertised a false cost and became flooded with traffic, disconnecting ISPs from most U.S. backbone providers for ~3 hours
Bottom line:
  • No clear winner in terms of complexity, robustness, etc.
  • LS often favored due to faster convergence
Network Layer 7-28
Hierarchical Routing
Our routing study thus far is an idealization:
  • all routers identical
  • network “flat”
… not true in practice

Scale: with 200 million destinations:
  • can’t store all destinations in routing tables!
  • routing table exchange would swamp links!

Administrative autonomy
  • internet = network of networks
  • each network admin may want to control routing in its own network
Network Layer 7-29
Hierarchical Routing in the Internet

Internet is organized as Autonomous Systems (AS)
  • Each AS is an independent administrative domain (e.g. ISP)
Intra-AS routing protocol
  • All routers in an AS run the same intra-AS routing protocol
  • Routers in different ASs can run different intra-AS routing protocols
Inter-AS routing protocol
  • Between routers in different ASs
Gateway routers: run both intra-AS and inter-AS routing protocols
Network Layer 7-30
Intra-AS and Inter-AS routing
Gateways:
  • perform inter-AS routing amongst themselves
  • perform intra-AS routing with other routers in their AS

[Figure: three ASs A, B and C with gateway routers A.a, A.c, B.a and C.b; gateway A.c runs inter-AS and intra-AS routing in its network layer, above the link layer and physical layer]
Network Layer 7-31
IGP vs. EGP
[Figure 4.5.2-new2: BGP use for inter-domain routing. AS1 (RIP intra-AS routing), AS2 (OSPF intra-AS routing) and AS3 (OSPF intra-AS routing) contain routers R1 to R5 and are interconnected by BGP.]

Intra-AS routing protocol
  • also called Interior Gateway Protocol (IGP)
  • Administrator can choose any: RIP, OSPF, IS-IS, …
Inter-AS routing protocol
  • also called Exterior Gateway Protocol (EGP)
  • Unique: Border Gateway Protocol (BGP)
Network Layer 7-32
Internet inter-AS routing: BGP
BGP (Border Gateway Protocol): the de facto standard
BGP provides each AS a means to:
  1. Obtain subnet reachability information from neighboring ASs.
  2. Propagate the reachability information to all routers internal to the AS.
  3. Determine “good” routes to subnets based on reachability information and policy.
Allows a subnet to advertise its existence to the rest of the Internet: “I am here”
Network Layer 7-33
BGP basics

Pairs of routers (BGP peers) exchange routing info over semi-permanent TCP connections: BGP sessions
Note that BGP sessions do not correspond to physical links.
When AS2 advertises a prefix to AS1, AS2 is promising it will forward any datagrams destined to that prefix towards the prefix.
  • AS2 can aggregate prefixes in its advertisement

[Figure: AS1 (routers 1a, 1b, 1c, 1d), AS2 (2a, 2b, 2c) and AS3 (3a, 3b, 3c), with eBGP sessions between ASs and iBGP sessions inside each AS]
Network Layer 7-34
Path attributes & BGP routes
When advertising a prefix, the advert includes BGP attributes.
  • prefix + attributes = “route”
Path vector protocol:
  • similar to distance vector protocol
  • each border gateway broadcasts to neighbors (peers) the entire path (i.e., sequence of ASs) to the destination
      • E.g., gateway X may send its path to dest. Z: Path (X,Z) = X,Y1,Y2,Y3,…,Z
When a gateway router receives a route advert, it uses import policy to accept/decline.
Network Layer 7-35
BGP operation

Point-to-point peering
  • BGP peers explicitly configured
  • Lack of trust: no auto-discovery!
BGP session runs over TCP
  • Reliable
  • Can detect neighbor/link down
4 types of messages:
  • OPEN: opens TCP connection to peer and authenticates sender
  • UPDATE: advertises new path (or withdraws old)
  • KEEPALIVE: keeps connection alive in absence of UPDATEs; also ACKs OPEN request
  • NOTIFICATION: reports errors in previous msg; also used to close connection
Network Layer 7-36
BGP operation (contd.)
BGP peers exchange route prefixes
  • AS-path
  • Route attributes
  • No cost included!
Route prefixes received from a peer are filtered and selected (based on AS-path and route attributes) for installation in the RIB
Route prefixes from the RIB are sent to a peer after filtering and selection
All the complexity is in the use of policies for filtering and selection
Network Layer 7-37
BGP attribute: AS-path

Prevents looping!
  • Prefix 138.39.0.0/16, AS1
  • AS1 → AS2: AS-path = AS1
  • AS2 → AS3: AS-path = AS2-AS1
  • AS3 → AS1: AS-path = AS3-AS2-AS1
  • AS1 detects the loop, and can reject the route
Partition healing: rare case where AS1 may accept a “loop” route

[Figure: (a) AS 1, AS 2 and AS 3 with prefix 138.39.0.0/16; (b) the same ASs, illustrating the partition-healing case]
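The loop check amounts to looking for one's own AS number in the received AS-path. A minimal Python sketch, with hypothetical helper names `advertise` and `accept` and plain integers standing in for AS numbers:

```python
def advertise(as_path, local_as):
    """AS-path sent to an external peer: own AS number prepended."""
    return [local_as] + list(as_path)


def accept(as_path, local_as):
    """Import check: reject a route whose AS-path already contains us."""
    return local_as not in as_path


# Prefix 138.39.0.0/16 originates in AS1 and travels AS1 -> AS2 -> AS3 -> AS1.
path_to_as2 = advertise([], 1)            # [1]
path_to_as3 = advertise(path_to_as2, 2)   # [2, 1]
path_to_as1 = advertise(path_to_as3, 3)   # [3, 2, 1]
loop_detected = not accept(path_to_as1, 1)
```

The partition-healing case mentioned above would need an explicit exception to this check, which the sketch does not model.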
Network Layer 7-38
BGP attribute: Multi-Exit-Discriminator

Used when two ASs connect to each other in more than one place
Used by an AS to advertise its degree of preference for each link to reach a particular prefix
Example:
  • AS1 and AS2 have 2 BGP sessions: one on each link
  • AS2 advertises prefixes of AS3 to AS1 on both links
      • MED advertised on link A is better than MED advertised on link B

[Figure: AS 1 and AS 2 connected by Link A and Link B; AS 3 and AS 4 also shown]
Network Layer 7-39
MED (contd.)

ISP-1 and ISP-2 connect in New York and San Francisco
ISP-1 has customer-1 in San Francisco
ISP-2 has customer-2 in New York
What happens if:
  • Case A: Both ISPs set and accept MED?
  • Case B: Both ISP-1 and ISP-2 ignore MED?
  • Case C: ISP-1 accepts MED but ISP-2 ignores MED?

[Figure (Case A): ISP 1 and ISP 2 with customers Cust 1 and Cust 2]
Network Layer 7-40
BGP attribute: Local-Pref

Most commonly used attribute
Determines local (i.e. within AS) preference of use of a received route
E.g.: say AS3 provides better service than AS2 to AS4
  • AS4 can configure local-pref of routes from AS3 to be higher (better) than those heard from AS2
  • AS1 advertises prefix 138.39.0.0/16 to AS2 and AS3
  • AS4 receives the prefix from both, but chooses the AS3-AS1 path since it has better local-pref

[Figure: AS 1 (prefix 138.39.0.0/16) connected to AS 2 and AS 3, both of which connect to AS 4]
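AS4's choice in this example can be sketched as a sort on route attributes. The ordering used here (highest local-pref, then shortest AS-path, then lowest MED) is a deliberately simplified assumption; the real BGP decision process has more steps. The `Route` class, its default values, and `best_route` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Route:
    prefix: str
    as_path: tuple          # sequence of AS numbers, nearest AS first
    local_pref: int = 100   # assumed default; higher is better
    med: int = 0            # lower is better


def best_route(routes):
    """Pick one route: highest local-pref, then shortest AS-path, then lowest MED."""
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path), r.med))


# AS4 hears 138.39.0.0/16 from AS2 and AS3 and prefers AS3 by configuration.
via_as2 = Route("138.39.0.0/16", as_path=(2, 1), local_pref=100)
via_as3 = Route("138.39.0.0/16", as_path=(3, 1), local_pref=200)
chosen = best_route([via_as2, via_as3])
```

Because local-pref is compared first, a longer AS-path through AS3 would still win as long as its local-pref is higher.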
Network Layer 7-41
BGP policies

Can be complex, yet are key to flexibility and control of inter-AS routing
Examples:
  • Avoid using competitor’s network
      • avoid routes with AS-n in AS-Path
  • Avoid transit service, i.e. do not carry any traffic that does not have source or destination within AS
      • Do not advertise any non-local routes to peers
  • Let another ISP carry most cross-country load
      • Use of MED was shown earlier
More examples in subscriber-ISP connection next
Network Layer 7-42
Subscriber connection: singly-homed

Easy case! Possible options:
  • Static configuration: easiest
      • Customer has default route via R2
      • ISP configures static route to customer’s prefix
  • Include customer in ISP’s IGP (too risky!)
  • Run a small IGP (say RIP) on R1-R2 link, leak that into BGP
  • Run a single BGP session
      • customer will still likely use a default route or a small set of filtered routes and not absorb the entire Internet routing table

[Figure: customer router R1 and ISP router R2 joined by a BGP session; labels AS1, AS2 and prefix 138.39.2.0/23]
Network Layer 7-43
Multi-homed subscriber

Multiple customer links to one or more ISPs
Why?
  • Reliability (redundancy)
  • Performance (load-sharing)
Challenging
  • Static routing often doesn’t suffice (why?)
  • Want to minimize routing prefixes injected into customer network
  • BGP configuration requires thought and planning, taking into account both traffic directions (to and from the customer)

[Figure: customer connected to ISP-1 and ISP-2]
Network Layer 7-44
Multi-homing to a single provider
Example 1: same router in ISP, different routers in customer
  • ISP to customer traffic: customer sets MED
  • Customer to ISP traffic: 2 default routes!
Example 2: different routers in ISP, same router in customer
  • ISP to customer traffic: as before
  • Customer to ISP traffic: customer may have to get BGP prefixes from ISP

[Figures: for each example, ISP and customer joined by two links through routers R1, R2 and R3; prefixes 138.39/16 and 204.70/16 shown]
Network Layer 7-45
Multi-homing to multiple providers
Options for customer address space:
  • Exclusively from ISP1 (or from ISP2)
      • E.g.: customer uses 138.39.1/24 and advertises this prefix to ISP2
      • ISP3 gets prefixes 138.39/16 from ISP1 and 138.39.1/24 from ISP2
      • ISP3 traffic to customer will go via ISP2 (longest prefix match)
      • Aggregation is pushing traffic away?!
  • From both ISP1 and ISP2
      • E.g.: customer uses 138.39.1/24 and 204.70.1/24
      • Good load-sharing if traffic to these prefixes is about the same
  • Independently from address registry
      • Can manipulate load-sharing better, but bad for aggregation!
Bottom line: it all depends on the traffic patterns!

[Figure: customer connected to ISP1 (138.39/16) and ISP2 (204.70/16); ISP3 connected to both ISPs]
Network Layer 7-46
Interaction among routing protocols

Every routing protocol is computing its own routes: how does it all fit?
Question: do they interact with each other? Yes!
Question: which route is inserted in the forwarding tables?
  • If conflict, a priority mechanism is used
Question: how does an IGP fill its routing table?
  • Direct routes: directly connected interfaces
  • Static routes: user configured
Question: how does BGP fill its routing table?
  • Learns AS-local networks from IGP
Network Layer 7-47
E-BGP vs. I-BGP

Question: How do BGP routes get propagated within an AS?
  • E.g.: how does B.b learn about routes from AS-A and AS-B?
  • Inject BGP routes into IGP? Bad idea: IGPs don’t scale
  • Preferred way of distributing externally learnt prefixes within an AS:
      • Internal-BGP (I-BGP): full-mesh within AS
  • Our earlier discussion was on BGP peering between different ASs
      • Technically correct to call it External-BGP (E-BGP)

[Figure: three ASs A, B and C with gateway routers A.a, A.c, B.a and C.b]
Network Layer 7-48
Configuring routing

In your organization you have to install a new PC in a server-farm. The PC is multi-homed on two LANs. What static routes do you need to configure on the PC for shortest-path routing to all destinations? Assume:
  • The PC is not routing between LANs
  • The PC is not running any routing protocols
  • Pick any IP addresses for the router interfaces consistent with the LAN subnets

[Figure: ISP, routers R1 and R2, the server farm and the new PC, attached to LANs 193.1.1.0/28, 193.1.1.16/28, 193.1.1.32/28 and 202.1.1/24]
Network Layer 7-49
Configuring routing (contd.)

Now suppose your organization gets a second link to the ISP via a new router R3. Your PC now has 3 LAN interfaces, and your organization has two links to the Internet. Can you suggest ways of load-balancing traffic to/from your organization?

[Figure: ISP, routers R1, R2 and R3, the server farm and the new PC, attached to LANs 193.1.1.0/28, 193.1.1.16/28, 193.1.1.32/28, 202.1.1/24 and 202.1.2/24]
Network Layer 7-50
Summary

Hierarchical routing: intra-AS versus inter-AS
Policy:
  • Inter-AS: admin wants control over how its traffic is routed and who routes through its net.
  • Intra-AS: single admin, so no policy decisions needed
Scale:
  • hierarchical routing reduces table size and update traffic
Performance:
  • Intra-AS: can focus on performance
  • Inter-AS: policy dominates over performance
Network Layer 7-51
Summary (contd.)

Principles of BGP operation
  • Path-vector
  • Configuration driven
  • Route attributes (AS-Path, MED, Local-Pref, …)
  • Policies dictate everything!
How does a customer connect to ISP?
  • Examples of single and multi-homing
Interaction between routing protocols
  • How does it all fit?
Design examples
Finished with IP routing - whew!