1 CSE 524: Lecture 7 Network layer (Part 2). 2 Administrative Homework #2 due Reminder –Research paper topic

1

CSE 524: Lecture 7

Network layer (Part 2)

2

Administrative

• Homework #2 due

• Reminder– Research paper topic

3

Network layer (part II)

• Last class– Network layer functions

• Addressing, security, fragmentation, delivery semantics, quality of service

• In the middle of talking about routing….

• This class and beyond– Finish network layer functions (routing)– Specific implementations

• IP (addressing, security, fragmentation, delivery semantics, quality of service, routing)

• Network layer devices and implementations

4

NL: DBF (count-to-infinity example)

Link cost changes:• good news travels fast • bad news travels slow - “count to infinity” problem!• see book for explanation of this example

X Z14

50

Y60

algorithmcontinues

on!

5

NL: DBF: (count-to-infinity example)

A

25

1

1

B

C

B

C 21

dest cost

A

C 11

dest cost

A

B 12

dest cost

X

6


A

25 1

B

C

B

C 21

dest cost

A

C 1~

dest cost

A

B 12

dest cost

C Sends Routes to B

7


A

25 1

B

C

B

C 21

dest cost

A

C 13

dest cost

A

B 12

dest cost

B Updates Distance to A

8


A

25 1

B

C

B

C 21

dest cost

A

C 13

dest cost

A

B 14

dest cost

B Sends Routes to C

9


A

25 1

B

C

B

C 21

dest cost

A

C 15

dest cost

A

B 14

dest cost

C Sends Routes to B

10

NL: How are loops caused?

• Observation 1:– B’s metric increases

• Observation 2:– C picks B as next hop to A– But, the implicit path from C to A includes itself!

11

NL: Solutions to looping

• Split horizon– Do not advertise route to X to an adjacent neighbor if your

route to X goes through that neighbor– If C routes through B to get to A, C does not advertise (C=>A)

route to B.

• Poisoned reverse– Advertise an infinite distance route to X to an adjacent

neighbor if your route to X goes through that neighbor– If C routes through B to get to A, C advertises to B that its

distance to A is infinity

• Works for two node loops– Does not work for loops with more nodes

12

NL: Split-horizon with poisoned reverse

If Z routes through Y to get to X :• Z tells Y its (Z’s) distance to X is infinite (so Y won’t

route to X via Z)• will this completely solve count to infinity problem?

X Z14

50

Y60

algorithmterminates

13


1

11

1

A

• Example Where Split Horizon Fails

• When link breaks, C marks D as unreachable and reports that to A and B

• Suppose A learns it first– A now thinks best path to D is

through B– A reports D unreachable to B and

a route of cost=3 to C

• C thinks D is reachable through A at cost 4 and reports that to B

• B reports a cost 5 to A who reports new cost to C

• etc...

X

B

C

D

14


• Path Holddown– If metric increases, delay propagating information– In our example, A and B delay changing and advertising new

routes– A and B both set route to D to infinity after single step– Delay too large

• Adversely affects convergence

– Delay too small• Count-to-infinity more probable

• Route poisoning– Advertise infinite cost when cost from next hop starts to

increase

15


• Path vector – Select loop-free paths– Each route advertisement carries entire path– If a router sees itself in path, it rejects the route– BGP does it this way– Space proportional to diameter

16


• Do solutions completely eliminate loops?– No! Transient loops are still possible– Why? Because implicit path information may be stale– See this in BGP convergence

• Only way to fix this– Ensure that you have up-to-date information by

explicitly querying

17

NL: Link State vs. Distance Vector

• Network bandwidth– DV

• send everything you know to your neighbors

• large messages, transfers only to neighbors

– LS• send info about your neighbors to everyone

• small messages, transfers O(nE)

18


• Convergence speed:– LS

• faster – don’t need to process LSPs before forwarding• single SPT calculation

– DV• fast with triggered updates• count-to-infinity problem

• Space requirements:– LS

• maintains entire topology

– DV • maintains only neighbor state• path vector maintains routes proportional to network diameter

19


• Robustness:– LS can broadcast incorrect/corrupted LSP

• Can be made robust since sources are aware of alternate paths

– DV can advertise incorrect paths to all destinations• Incorrect calculation can spread to entire network

20

NL: DUAL

• Distributed Update Algorithm – Garcia-Luna-Aceves 1989– Goal: Avoid transient loops in DV and LS algorithms– 2 ideas

• A path shorter than current path cannot contain a loop• Based on diffusing computation (Dijkstra-Scholten 1980)

– Wait until computation completes before changing routes in response to a new update

– Similar to path-holddown

– 3 kinds of messages• Update, query, reply

– 2 states for routers• Active (queries outstanding), passive

21

NL: DUALOn update if (lower cost) adoptelse if (higher cost) { if (from next hop) { if (any path exists < old length from next hop) switch path else freeze route send query to all neighbors except next hop go into active wait for reply from all neighbors update route return to passive } send reply to all querying neighbors}

22

NL: Routing Hierarchies

• Flat routing doesn’t scale– Each node/router cannot be expected to store routes to every

destination (or destination network)

– 50 million destinations

– Route table exchange would swamp network

• Key observation– Need less information with increasing distance to destination

• Two radically different approaches for routing– The area hierarchy

– The landmark hierarchy (discuss in routing alternatives)

23

NL: Areas

• Divide network into areas– Areas can have nested sub-areas– No path between two sub-areas of an area can exit that area– Within area, each node has routes to every other node – Outside area

• Each node has routes for other top-level areas only• Inter-area packets are routed to nearest appropriate border router• Can result in sub-optimal paths

• Hierarchically address nodes in a network– Sequentially number top-level areas– Sub-areas of area are labeled relative to that area– Nodes are numbered relative to the smallest containing area

24

NL: Landmark Hierarchy• Details about things nearby and less information about things far

away• Not defined by arbitrary boundaries

– Thus, not well suited to the real world that does have administrative boundaries

• Example: My apartment• MtHood.Portland.USBancorpTower.Oba.KearneyPlaza• From Beaverton

– Go towards Mt. Hood– See USBancorpTower before running into Mt.Hood – See Oba before running into USBancorpTower– Reach Oba and route to Kearney Plaza 2 blocks away

• From The Dalles– Go towards Mt. Hood, reach it– Go towards Portland, see USBancorpTower– Go towards and reach USBancorpTower– Go towards and reach Oba, route to Kearney Plaza 2 blocks away

25

NL: A Landmark

1

2

3

4

56

7

89

10

11

Router 1 is a landmarkof radius 2

26

NL: Landmark Overview

• Landmark routers have “height” which determines how far away they can be seen (visibility)– Routers within radius n can see a landmark router LMn

• See means that those routers have LMn’s address and know next hop to reach it. – Router x as an entry for router y if x is within radius of y

• Distance vector style routing with simple metric

• Routing table: Landmark (LM2(d)), Level(2), Next hop

27

NL: LM Hierarchy Definition

• Each LM i associated with level (i) and radius (ri)

• Every node is an LM0 landmark

• Recursion: some LMi are also LMi+1

– Every LMi sees at least one LMi+1

• Terminating state when all level j LMs are seen by entire network

28

NL: LM Self-configuration

• Bottom-up hierarchy construction algorithm– Goal to bound number of children– Every router is L0 landmark– All Li landmarks run election to self-promote one or more L i+1

landmarks

• LM level maps to radius (part of configuration), e.g.:– LM level 0: radius 2– LM level 1: radius 4– LM level 2: radius 8

• Dynamic algorithm to adapt to topology changes – Efficient hierarchy

29

NL: LM Addresses

• LM(2).LM(1).LM(0) (C.B.A)

• If destination is more than two hops away, will not have complete routing information, refer to LM(1) portion of address, if not known then refer to LM(2)

LM2C

LM1B

R2

R1

LM0A aka C.B.A

R0

30

NL: LM Routing

• LM does not imply hierarchical forwarding

• It is NOT a source route

• En route to LM1,packet may encounter router that is within LM0 radius of destination address (like longest match)

• Paths may be asymmetric

31

•Source wants to reach LM0[a], whose address is c.b.a:

•Source can see LM2[c], so sends packet towards c

•Entering LM1[b] area, first router diverts packet to b

•Entering LM0[a] area, packet delivered to a

•Not shortest path•Packet may not reach landmarks

NL: Landmark Routing: Basic Idea

LM2[c]

LM1[b]r0[a]

LM0[a]

r2[c]

r1[b]

Network Node

Path

Landmark Radius

32

NL: Landmark Routing: Example

d.d.a

d.d.b

d.d.c

d.d.e

d.d.d

d.d.f

d.i.kd.i.g

d.d.j

d.i.i

d.i.w

d.i.ud.d.kd.d.l

d.n.hd.n.x

d.n.n

d.n.o

d.n.p

d.n.q

d.n.t

d.n.s

d.n.r

d.i.v

33

NL: Routing Table for Router g

Landmark Level Next hop

LM2[d]

LM0[e]

LM1[i]

LM0[k]

LM0[f]

2

1

0

0

0

f

k

f

k

f

Router g

Router t

r0 = 2, r1 = 4, r2 = 8 hops • How to go from d.i.g to d.n.t? g-f-e-d-u-t

• How does path length compare to shortest path? g-k-I-u-t

d.d.a

d.d.b

d.d.c

d.d.e

d.d.d

d.d.f

d.i.kd.i.g

d.d.j

d.i.i

d.i.w

d.i.ud.d.kd.d.l

d.n.hd.n.x

d.n.n

d.n.o

d.n.p

d.n.q

d.n.t

d.n.s

d.n.r

34

NL: Demux to upper layer

• Sends payload to the next layer in protocol stack (usually transport layer)

35

NL: Error detection

• Protection of data and/or header at the network layer– Provide extra protection on top of data-link layer and

below transport layer– End-to-end principle

• Is this necessary?

36

NL: Network layer functions summary

• Each networking layer provides varying degrees of functionality including….– security, delivery semantics, quality of service, demux to upper

layer, error detection, fragmentation, addressing, routing

• Next: IP network layer– IP error detection– IP demux to upper layer– IP security– IP fragmentation– IP delivery semantics– IP quality-of-service– IP addressing– IP routing

37

NL: The Internet Network layer (IP)

routingtable

Host, router network layer functions:

Routing protocols•path selection•RIP, OSPF, BGP

IP protocol•addressing conventions•datagram format•packet handling conventions

ICMP protocol•error reporting•router “signaling”

Transport layer: TCP, UDP

Link layer

physical layer

Networklayer

38

NL: How is IP Design Standardized?

• IETF– Voluntary organization

– Meeting every 4 months

– Working groups and email discussions

• “We reject kings, presidents, and voting; we believe in rough consensus and running code” (Dave Clark 1992)– Need 2 independent, interoperable implementations for

standard

• IRTF– End2End

– Reliable Multicast, etc..

39

NL: IP datagram format (RFC 791)

ver length

32 bits

data (variable length,typically a TCP

or UDP segment)

16-bit identifier

header checksum

time tolive

32 bit source IP address

IP protocol versionnumber (currently 4)

header length (bytes)

max numberremaining hops

(decremented at each router)

forfragmentation/reassembly

total datagramlength (bytes)

upper layer protocolto deliver payload to

head.len

type ofservice

“type” of data flgs

fragment offset

upper layer

32 bit destination IP address

Options (if any) + padding E.g. timestamp,record routetaken, pecifylist of routers to visit.

40

NL: IP header• Version

– Currently at 4, next version 6

• Header length– Length of header (20 bytes plus options)

• Type of Service– Typically ignored– Values

• 3 bits of precedence• 1 bit of delay requirements• 1 bit of throughput requirements• 1 bit of reliability requirements

– Replaced by DiffServ and ECN

• Length– Length of IP fragment (payload)

41

NL: IP header (cont)

• Identification – To match up with other fragments

• Flags– Don’t fragment flag– More fragments flag

• Fragment offset– Where this fragment lies in entire IP datagram– Measured in 8 octet units (11 bit field)

42

NL: IP header (cont)

• Time to live– Ensure packets exit the network

• Protocol– Demultiplexing to higher layer protocols

• Header checksum– Ensures some degree of header integrity– Relatively weak – 16 bit

• Source IP, Destination IP (32 bit addresses)• Options

– E.g. Source routing, record route, etc.– Performance issues

• Poorly supported

43

NL: IP error detection

• IP checksum– IP has a header checksum, leaves data integrity to

TCP/UDP– Catch errors within router or bridge that are not

detected by link layer– Incrementally updated as routers change fields– http://www.rfc-editor.org/rfc/rfc1141.txt

44

NL: IP demux to upper layer

• http://www.rfc-editor.org/rfc/rfc1700.txt– Protocol type field

• 1 = ICMP• 2 = IGMP• 3 = GGP• 4 = IP in IP• 6 = TCP• 8 = EGP• 9 = IGP • 17 = UDP• 29 = ISO-TP4 • 80 = ISO-IP• 88 = IGRP• 89 = OSPFIGP• 94 = IPIP http://www.rfc-editor.org/rfc/rfc2003.txt

http://www.erg.abdn.ac.uk/users/gorry/course/inet-pages/icmp.html

http://www.erg.abdn.ac.uk/users/gorry/course/inet-pages/tcp.html

http://www.erg.abdn.ac.uk/users/gorry/course/inet-pages/udp.html

45

NL: IP demux to upper layer

• ICMP: Internet Control Message Protocol– Essentially a network-layer protocol

for passing control messages– used by hosts, routers, gateways to

communicate network-level information

• error reporting: unreachable host, network, port, protocol

• echo request/reply (used by ping)– network-layer “above” IP:

• ICMP msgs carried in IP datagrams

– ICMP message: type, code plus first 8 bytes of IP datagram causing error

• http://www.rfc-editor.org/rfc/rfc792.txt

Type Code description0 0 echo reply (ping)3 0 dest. network unreachable3 1 dest host unreachable3 2 dest protocol unreachable3 3 dest port unreachable3 6 dest network unknown3 7 dest host unknown4 0 source quench (congestion control - not used)8 0 echo request (ping)9 0 route advertisement10 0 router discovery11 0 TTL expired12 0 bad IP header

46

NL: IP security

• IP originally had no provisions for security

• IPsec– Retrofit IP network layer with encryption and

authentication– http://www.rfc-editor.org/rfc/rfc2411.txt– If time permits, we may cover this at the end of the

course….or someone should do a research paper on this.

http://www.rfc-editor.org/rfc/rfc2411.txt

47

NL: IP fragmentation (and reassembly)

• network links have MTU (max.transfer size) - largest possible link-level frame.– different link types, different

MTUs

• IP packets can be 64KB– large IP datagram divided

(“fragmented”) within net– IP header on each fragment– Intermediate router may

fragment further as needed– one datagram becomes several

datagrams– “reassembled” only at final

destination– IP header bits used to identify,

order related fragments

fragmentation: in: one large datagramout: 3 smaller datagrams

reassembly

48

NL: IP fragmentation

ID=x

offset=0

fragflag=0

length=4000

ID=x

offset=0

fragflag=1

length=1500

ID=x

offset=1480

fragflag=1

length=1500

ID=x

offset=2960

fragflag=0

length=1040

One large datagram becomesseveral smaller datagrams

49

NL: IP fragmentation

• Path MTU Discovery in IP – http://www.rfc-editor.org/rfc/rfc1191.txt– Hosts dynamically discover minimum MTU of path – Algorithm:

• Initialize MTU to MTU for first hop• Send datagrams with Don’t Fragment bit set• If ICMP “pkt too big” msg, decrease MTU

– What happens if path changes?• Periodically (>5mins, or >1min after previous increase), increase MTU

– Some routers will return proper MTU– MTU values cached in routing table– C. Shannon, D. Moore, k claffy, “Characteristics of Fragmented

IP Traffic on Internet Links” ACM SIGCOMM Internet Measurement Workshop 2001.

50

NL: IP delivery semantics

• The waist of the hourglass– Unreliable datagram service– Out-of-order delivery possible– Compare to ATM and phone network…

• Unicast mostly– IP broadcast not forwarded– IP multicast supported, but not widely used

• If there is time, we will talk about IP multicast….

51

NL: IP quality of service

• IP originally had “type-of-service” (TOS) field to eventually support quality– Not used, ignored by most routers

• Then came int-serv (integrated services) and RSVP signalling– Per-flow quality of service through end-to-end

support• Setup and match flows on connection ID• Per-flow signaling• Per-flow network resource allocation (*FQ, *RR

scheduling algorithms)

52


• RSVP– http://www.rfc-editor.org/rfc/rfc2205.txt– Provides end-to-end signaling to network elements– General purpose protocol for signaling information– Not used now on a per-flow basis to support int-serv, but being

reused for diff-serv.

• int-serv– Defines service model (guaranteed, controlled-load)

• http://www.rfc-editor.org/rfc/rfc2210.txt• http://www.rfc-editor.org/rfc/rfc2211.txt• http://www.rfc-editor.org/rfc/rfc2212.txt

– Dozens of scheduling algorithms to support these services• WFQ, W2FQ, STFQ, Virtual Clock, DRR, etc.• If this class was being given 5 years ago….

53


• Why did RSVP, int-serv fail?– Complexity

• Scheduling• Routing• Per-flow signaling overhead

– Lack of scalability• Per-flow state• Route pinning

– Economics• Providers with no incentive to deploy• SLA, end-to-end billing issues

– QoS a weak-link property• Requires every device on an end-to-end basis to support flow

54


• Now it’s diff-serv…– Use the “type-of-service” bits as a priority marking– http://www.rfc-editor.org/rfc/rfc2474.txt– http://www.rfc-editor.org/rfc/rfc2475.txt– http://www.rfc-editor.org/rfc/rfc2597.txt– http://www.rfc-editor.org/rfc/rfc2598.txt– Core network relatively stateless– AF

• Assured forwarding (drop precedence)

– EF• Expedited forwarding (strict priority handling)

– If there is time, we may cover IP quality of service more completely at the end of the class….

55

NL: IP Addressing

• IP address: fixed-length, 32-bit identifier for host, router interface

• interface: connection between host, router and physical link– router’s typically have

multiple interfaces

– host may have multiple interfaces

– IP addresses associated with interface, not host, router

223.1.1.1

223.1.1.2

223.1.1.3

223.1.1.4 223.1.2.9

223.1.2.2

223.1.2.1

223.1.3.2223.1.3.1

223.1.3.27

223.1.1.1 = 11011111 00000001 00000001 00000001

223 1 11

56

NL: IP Addressing

• IP address: – network part (high order

bits)– host part (low order bits)

• What’s a network ? (from IP address perspective)– device interfaces with

same network part of IP address

– can physically reach each other without intervening router

223.1.1.1

223.1.1.2

223.1.1.3

223.1.1.4 223.1.2.9

223.1.2.2

223.1.2.1

223.1.3.2223.1.3.1

223.1.3.27

network consisting of 3 IP networks(for IP addresses starting with 223, first 24 bits are network address)

LAN

57

NL: IP Addressing

How to find the networks?

• Detach each interface from router, host

• create “islands of isolated networks

223.1.1.1

223.1.1.3

223.1.1.4

223.1.2.2223.1.2.1

223.1.2.6

223.1.3.2223.1.3.1

223.1.3.27

223.1.1.2

223.1.7.0

223.1.7.1223.1.8.0223.1.8.1

223.1.9.1

223.1.9.2

Interconnected system consisting

of six networks

58

NL: Initial IP Addressing (1981)

• Classful structure (1981)• Total IP address size: 4 billion

– Class A: 128 networks, 16M hosts

– Class B: 16K networks, 64K hosts

– Class C: 2M networks, 256 hosts

High Order Bits0 10 110

Format7 bits of net, 24 bits of host14 bits of net, 16 bits of host21 bits of net, 8 bits of host

ClassABC

59

NL: IP address classes

Network ID Host ID

8 16

Class A32

0

Class B 10

Class C 110

Multicast AddressesClass D 1110

Reserved for experimentsClass E 1111

24

Network ID

Network ID

Host ID

Host ID

1.0.0.0 to 127.255.255.255

128.0.0.0 to 191.255.255.255

192.0.0.0 to 223.255.255.255

224.0.0.0 to 239.255.255.255

60

NL: Special IP Addresses• 127.0.0.1: local host (a.k.a. the loopback address)• Private addresses

– http://www.rfc-editor.org/rfc/rfc1918.txt– Class A: 10.0.0.0 - 10.255.255.255 (10/8 prefix)– Class B: 172.16.0.0 - 172.31.255.255 (172.16/12 prefix)– Class C: 192.168.0.0 - 192.168.255.255 (192.168/16 prefix)

• 255.255.255.255– IP broadcast to local hardware that must not be forwarded – http://www.rfc-editor.org/rfc/rfc919.txt– Same as network broadcast if no subnetting

• IP of network broadcast=NetworkID+(all 1’s for HostID)

• 0.0.0.0– IP address of unassigned host (BOOTP, ARP, DHCP)– Default route advertisement

61

NL: IP Addressing Problem #1 (1984)

• Subnet addressing– http://www.rfc-editor.org/rfc/rfc917.txt– Address problem of inefficient use of address space– For class B & C networks

• Class A (rarely given out, not many of them given out by IANA)• Class B = 64k hosts

– Very few LANs have close to 64K hosts– Electrical/LAN limitations, performance or administrative reasons – e.g., class B net allocated enough addresses for 64K hosts, even if only 2K

hosts in that network

• Class C = 256 hosts

– Need simple way to get multiple “networks”• Use bridging, multiple IP networks or split up single network address

ranges (subnet)• Reduce the total number of addresses that are assigned, but not used

62

NL: Subnetting

• Variable length subnet masks – Subnet a class B address space into several chunks

Network Host

Network HostSubnet

1111.. 00000000..1111 Mask

63

NL: Subnetting Example

• Assume an organization was assigned address 150.100

• Assume < 100 hosts per subnet

• How many host bits do we need?– Seven

• What is the network mask?– 11111111 11111111 11111111 10000000– 255.255.255.128

64

NL: IP Address Problem #2 (1991)

• Address space depletion– In danger of running out of classes A and B– Class A

• Very few in number

• IANA frugal in giving them out

– Class B• Subnetting only applied to new allocations

• sparsely populated

• people refuse to give it back

– Class C too small for most domains

65

NL: Some Problems

• Solution– Assign multiple consecutive class C address blocks– Supernetting


– Later known as Classless Inter-Domain Routing (CIDR)



66

NL: IP addressing: CIDR

• Original classful addressing– Use class structure (A, B, C) to determine network ID for

route lookup

• CIDR: Classless InterDomain Routing– Do not use classes to determine network ID– network portion of address of arbitrary length– address format: a.b.c.d/x, where x is # bits in network

portion of address

11001000 00010111 00010000 00000000

networkpart

hostpart

200.23.16.0/23

67

NL: CIDR

• Assign any range of addresses to network– Use common part of address as network number– e.g., addresses 192.4.16.* to 192.4.31.* have the

first 20 bits in common. Thus, we use this as the network number

– netmask is /20, /xx is valid for almost any xx– 192.4.16.0/20

• Enables more efficient usage of address space (and router tables)

• More on how this impacts routing later….

68

NL: IP addressing: How are they allocated?

• Hosts (host portion): – From organization via static configuration or DHCP

• Network (network portion) – Organization gets of ISP’s assigned address space– ISP gets assigned address space from ICANN

ISP's block 11001000 00010111 00010000 00000000 200.23.16.0/20

Organization 0 11001000 00010111 00010000 00000000 200.23.16.0/23

Organization 1 11001000 00010111 00010010 00000000 200.23.18.0/23

Organization 2 11001000 00010111 00010100 00000000 200.23.20.0/23 ... ….. …. ….

Organization 7 11001000 00010111 00011110 00000000 200.23.30.0/23

69

NL: IP addressing

Q: How does an ISP get block of addresses?

A: ICANN: Internet Corporation for Assigned

Names and Numbers

– allocates addresses

– manages DNS

– assigns domain names, resolves disputes

Documents

1 CSE 524: Lecture 7 Network layer (Part 2). 2 Administrative Homework #2 due Reminder –Research paper topic