Internetworking
Interconnection of 2 or more networks forming an internetwork, or internet.
– LANs, MANs, and WANs.
Different networks mean different protocols.
– TCP/IP, IBM’s SNA, DEC’s DECnet, ATM, Novell and AppleTalk (for LANs).
– Also, satellite and cellular networks.
Example Internet
[Figure: example internet. 802.3, 802.4, and 802.5 LANs, an X.25 WAN, and an SNA WAN interconnected by boxes labeled B and R; the connections shown are LAN-LAN, LAN-WAN, and LAN-WAN-LAN.]
Gateway: device connecting 2 or more different networks.
Gateways
Repeaters: operate at physical layer (bits); amplify/regenerate signal.
Bridges: store-and-forward frames; data link layer devices.
Routers: operate at network layer.
Transport gateways: connect networks at the transport layer.
Application gateways: connect 2 parts of an application at application layer.
How do networks differ?
Service offered: connection-oriented versus connection-less.
Protocols: IP, IPX, AppleTalk, DECnet.
Addressing: flat (802) versus hierarchical (IP).
Maximum packet size.
Quality of service.
Error control: reliable, ordered, unordered delivery.
Flow control: sliding window versus rate-based.
Congestion control: leaky bucket, choke packets.
Security: privacy rules, encryption.
Parameters: different timeouts.
Types of Internetworks
Connection-oriented concatenation of VC subnets.
– VC between source and router closest to destination network.
– Router builds VC to gateway to other subnet.
– Gateway keeps state about that VC.
– Builds VC to router in the next subnet, etc.
Every packet traverses same path.
– Ordered delivery.
– Routers convert between packet formats.
Connectionless Internetworking
Datagram model.
– Different packets may take different routes.
– Separate routing decision for each packet.
– No ordered delivery guarantees.
Datagram versus VC Internets
VC:
– Pluses: resources reserved in advance, ordered delivery, short headers.
– Minuses: vulnerability to failures, less adaptive, hard if involving datagram subnet.
Datagram:
– Pluses: more robust and adaptive, can be used over datagram subnets (many LANs, mobile networks).
– Minuses: longer headers, unordered delivery.
Tunneling
Interconnecting through a “foreign” subnet.
[Figure: Ethernet 1 and Ethernet 2 joined by gateways (G) through a tunnel across a WAN. On each Ethernet the IP packet travels in an Ethernet frame; across the WAN the IP packet sits inside the payload field of a WAN packet.]
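The encapsulation step in tunneling can be sketched in a few lines. This is a minimal illustration, not a real protocol stack: the `IPPacket` and `WANPacket` classes, the gateway names `G1` and `G2`, and the addresses are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class IPPacket:
    src: str
    dst: str
    payload: bytes


@dataclass
class WANPacket:
    # Hypothetical WAN packet, addressed from one gateway to the other.
    entry_gateway: str
    exit_gateway: str
    payload: IPPacket  # the whole IP packet rides in the payload field


def tunnel_in(packet: IPPacket, entry: str, exit_: str) -> WANPacket:
    """Entry gateway: wrap the untouched IP packet in a WAN packet."""
    return WANPacket(entry, exit_, packet)


def tunnel_out(wan_packet: WANPacket) -> IPPacket:
    """Exit gateway: unwrap and hand the original IP packet to the far LAN."""
    return wan_packet.payload


original = IPPacket("10.0.1.5", "10.0.2.9", b"hello")
delivered = tunnel_out(tunnel_in(original, "G1", "G2"))
```

The WAN never looks inside the payload, so the hosts on the two Ethernets see an ordinary IP exchange.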
Internetwork Routing 1
2-level hierarchy:
– Routing within each network: interior gateway protocol.
– Routing between networks: exterior gateway protocol.
Within each network, different routing algorithms can be used.
Each network is autonomously managed and independent of others: autonomous system (AS).
Internetwork Routing 2
Typically, packet starts in its LAN.
Gateway receives it (broadcast on LAN to “unknown” destination).
Gateway sends packet to gateway on the destination network using its routing table.
If it can use the packet’s native protocol, sends packet directly. Otherwise, tunnels it.
Fragmentation 1
Network-specific maximum packet size.
– Width of TDM slot.
– OS buffer limitations.
– Protocol (number of bits in packet length field).
Maximum payloads range from 48 bytes (ATM cells) to 64 Kbytes (IP packets).
Fragmentation 2
What happens when a large packet wants to travel through a network with a smaller maximum packet size? Fragmentation.
Gateways break packets into fragments; each sent as separate packet.
Gateways on the other side have to reassemble fragments into the original packet.
2 kinds of fragmentation: transparent and non-transparent.
Transparent Fragmentation
Small-packet network transparent to other subsequent networks.
Fragments of a packet addressed to the same exit gateway, where packet is reassembled.
– OK for concatenated VC internetworking.
Subsequent networks are not aware fragmentation occurred.
ATM networks (through special hardware) provide transparent fragmentation: segmentation.
Problems with Transparent Fragmentation
Exit gateway must know when it has received all the pieces.
– Fragment counter or “end of packet” bit.
Some performance penalty from requiring all fragments to go through the same gateway.
May have to repeatedly fragment and reassemble through series of small-packet networks.
Non-Transparent Fragmentation
Only reassemble at destination host.
– Each fragment becomes a separate packet.
– Thus routed independently.
Problems:
– Hosts must reassemble.
– Every fragment must carry header until it reaches destination host.
Keeping Track of Fragments 1
Fragments must be numbered so that original data stream can be reconstructed.
Tree-structured numbering scheme:
– Packet 0 generates fragments 0.0, 0.1, 0.2, …
– If these fragments need to be fragmented later on, then 0.0.0, 0.0.1, …, 0.1.0, 0.1.1, …
– But, too much overhead in terms of number of fields needed.
– Also, if fragments are lost, retransmissions can take alternate routes and get fragmented differently.
Keeping Track of Fragments 2
Another way is to define elementary fragment size that can pass through every network.
When packet fragmented, all pieces equal to elementary fragment size, except last one (may be smaller).
Packet may contain several fragments.
Keeping Track of Fragments 3
Header contains packet number, number of first fragment in the packet, and last-fragment bit.
Header fields, left to right: packet number | number of first fragment | last-fragment bit. Each data letter is 1 byte.
(a) Original packet with 10 data bytes:
    27 | 0 | 1 | A B C D E F G H I J
(b) Fragments after passing through network with maximum packet size = 8 bytes:
    27 | 0 | 0 | A B C D E F G H
    27 | 8 | 1 | I J
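The numbering scheme from the 10-byte example above can be sketched as follows. It is an illustration only, assuming a 1-byte elementary fragment and a limit on data bytes per packet (header size is ignored); the tuple layout and the names `fragment` and `reassemble` are invented for the example.

```python
from typing import List, Tuple

# (packet number, number of first elementary fragment, last-fragment bit, data)
Fragment = Tuple[int, int, int, bytes]


def fragment(packet: Fragment, max_data: int) -> List[Fragment]:
    """Split a packet so that each piece carries at most max_data bytes.

    Assumes the elementary fragment size is 1 byte, so the offset of a
    piece is simply its byte position in the original packet.
    """
    number, first, last, data = packet
    pieces = []
    for start in range(0, len(data), max_data):
        chunk = data[start:start + max_data]
        is_final_chunk = start + max_data >= len(data)
        # Only the final piece inherits the last-fragment bit of its parent.
        pieces.append((number, first + start, last if is_final_chunk else 0, chunk))
    return pieces


def reassemble(fragments: List[Fragment]) -> bytes:
    """Rebuild the data by sorting on the first-fragment number."""
    ordered = sorted(fragments, key=lambda f: f[1])
    return b"".join(f[3] for f in ordered)


original = (27, 0, 1, b"ABCDEFGHIJ")
pieces = fragment(original, max_data=8)
```

Because offsets count elementary fragments from the start of the original packet, a piece can be fragmented again on a later network and the destination can still put everything back in order.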
The Internet Network Layer
The Internet as a collection of networks or autonomous systems (ASs).
Hierarchical structure.
[Figure: US backbone and European backbone joined by transcontinental links, with regional and national networks attached.]
IP (Internet Protocol)
Glues Internet together.
Common network-layer protocol spoken by all Internet participating networks.
Best effort datagram service:
– No reliability guarantees.
– No ordering guarantees.
IP
Transport layer breaks data streams into datagrams; datagrams are transmitted over the Internet, possibly being fragmented along the way.
When all fragments of a datagram arrive at the destination, it is reassembled by the network layer and delivered to the transport layer at the destination host.
IP Versions
IPv4: IP version 4.
– Current, predominant version.
– 32-bit long addresses.
IPv6: IP version 6 (aka, IPng).
– Evolution of IPv4.
– Longer addresses (16-byte long).
IP Datagram Format
IP datagram consists of header and data (or payload).
Header:
– 20-byte fixed (mandatory) part.
– Variable length optional part.
IP Header
Each row is one 32-bit word:
    Version | Header length | Type of service | Total length
    Identification | U | D | M | Fragment offset
    TTL | Protocol | Header checksum
    Source address
    Destination address
    Options (variable length)
IP Header Fields 1
Version: which IP version datagram uses.
Header length: how long (in 32-bit words) is header; minimum=5; maximum=15 (options=40 bytes).
Type of service: precedence (priority), 3 flags (delay, throughput, reliability). In practice, routers ignore type of service.
Total length: length of total datagram, i.e., header + data (max = 64 Kbytes).
IP Header Fields 2
Identification: which datagram fragment belongs to.
U: unused bit.
D: don’t fragment.
M: more fragments.
Fragment offset: position of fragment in datagram.
TTL: datagram lifetime.
IP Header Fields 3
Protocol: number of the transport protocol that generated the datagram.
Header checksum: verifies header integrity; computed at each hop.
Source and destination address: IP addresses of source and destination.
Options: way of extending the protocol.
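The header fields listed above can be decoded from raw bytes with Python's standard `struct` module. The sketch below handles only the 20-byte fixed part and is not a full IP implementation; the sample header values (identification 27, TTL 64, the two addresses) are made up, and the function names `parse_header` and `checksum` are chosen for the example.

```python
import socket
import struct


def checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words, as used for the IP header."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def parse_header(raw: bytes) -> dict:
    """Decode the 20-byte fixed part of an IPv4 header."""
    (ver_ihl, tos, total_length, ident, flags_frag,
     ttl, protocol, hdr_checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_length_words": ver_ihl & 0x0F,
        "type_of_service": tos,
        "total_length": total_length,
        "identification": ident,
        "dont_fragment": bool(flags_frag & 0x4000),   # D bit
        "more_fragments": bool(flags_frag & 0x2000),  # M bit
        "fragment_offset": flags_frag & 0x1FFF,       # counted in 8-byte units
        "ttl": ttl,
        "protocol": protocol,
        "header_checksum": hdr_checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }


# Made-up sample: version 4, 5-word header, D bit set, TTL 64, protocol 6 (TCP).
fields = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 27, 0x4000, 64, 6, 0,
                     socket.inet_aton("128.6.240.3"), socket.inet_aton("130.50.15.6"))
sample = fields[:10] + struct.pack("!H", checksum(fields)) + fields[12:]
header = parse_header(sample)
```

Running `checksum` over a header that already contains a valid checksum yields 0, which is how a receiver can verify integrity. A router that decrements TTL has to recompute the field, which is why the checksum is computed at each hop.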
Addressing
Required for packet delivery.
– Each network may use different addressing scheme.
– Addresses must be unique.
Flat addresses: physical addresses (e.g., Ethernet address).
Hierarchical addresses: use hierarchy scheme like postal addresses (e.g., IP).
Address Types
Unicast: uniquely distinguishes a single node.
Multicast: shared by a group of nodes.
Broadcast: shared by all nodes.
IP Addresses
Every host and router on the Internet must have an IP address.
2-level hierarchy:
– Network number.
– Host number.
Notations:
– Binary: 10000000 00000110 11110000 00000011
– Dotted decimal: 128.6.240.3
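The two notations are the same 32 bits written differently, which a short conversion makes concrete. The function names below are chosen for this sketch.

```python
def binary_to_dotted(bits: str) -> str:
    """Convert four space-separated 8-bit groups to dotted decimal."""
    return ".".join(str(int(octet, 2)) for octet in bits.split())


def dotted_to_binary(address: str) -> str:
    """Convert dotted decimal back to four 8-bit groups."""
    return " ".join(format(int(part), "08b") for part in address.split("."))


dotted = binary_to_dotted("10000000 00000110 11110000 00000011")
binary = dotted_to_binary("128.6.240.3")
```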
IP Address Formats 1
4 different classes (X = network-number bit; the remaining bits of the 32-bit address are the host number):
    Class  Network part                         Nets   Hosts/net
    A      0XXXXXXX                             128    16M
    B      10XXXXXX XXXXXXXX                    16K    64K
    C      110XXXXX XXXXXXXX XXXXXXXX           2M     256
    D      1110XXXX XXXXXXXX XXXXXXXX XXXXXXXX  Multicast
IP Address Formats 2
Range of the first byte (in dotted decimal) for each class:
Class A: 1~127.
Class B: 128~191.
Class C: 192~223.
Class D: 224~239.
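Since the class is determined by the leading bits, it can be read off the first byte alone. A small sketch using the ranges listed above; the function name and the fallback label for values outside those ranges are assumptions made for the example.

```python
def address_class(address: str) -> str:
    """Return the class of a dotted-decimal address from its first byte."""
    first = int(address.split(".")[0])
    if 1 <= first <= 127:
        return "A"  # leading bit 0
    if 128 <= first <= 191:
        return "B"  # leading bits 10
    if 192 <= first <= 223:
        return "C"  # leading bits 110
    if 224 <= first <= 239:
        return "D"  # leading bits 1110, multicast
    return "other"  # outside the ranges listed on the slide


classes = [address_class(a) for a in
           ("80.0.0.8", "128.6.240.3", "192.168.1.1", "224.0.0.1")]
```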
Multi-addresses
A router usually has more than one IP address.
Multi-homed host: host with multiple network interfaces each of which has different IP address.
[Figure: networks 80.0.0.0, 129.98.0.0, and 236.240.128.0, with interface addresses 80.0.0.8, 129.98.95.1, and 236.240.128.3.]
Management and Scalability 1
Network numbers assigned by single authority: NIC (network information center).
All hosts in a network must have same network number.
What if networks grow?
Management and Scalability 2
Example: company starts with 1 class C LAN, thus can connect up to 256 hosts.
– It might grow to more than 256 hosts.
– It might get more LANs.
– For every new LAN, need new network number from NIC.
– Moving machines between LANs needs address change.
Subnetting 1
Split address space into several “internal” subnets.
– Still act like single network to outside world.
Example: Class B address (X = network bit, S = subnet bit, H = host bit).
    Class B:                  10XXXXXX XXXXXXXX HHHHHHHH HHHHHHHH   16K nets, 64K hosts/net.
    Class B with subnetting:  10XXXXXX XXXXXXXX SSSSSSHH HHHHHHHH   62 LANs, 1022 hosts each.
    1st subnet: 130.50.4.1
    2nd subnet: 130.50.8.1
Subnetting 2
Routing: hierarchical.
– (network, -) entries: hosts on distant networks.
– (this network, host) entries: local hosts.
– Routers only need to keep track of other networks and local hosts.
With subnetting:
– (network, -) entries: hosts on distant networks.
– (this network, subnet, -) entries: hosts on other subnets of this network.
– (this network, this subnet, host) entries: hosts on the local subnet.
– Adds extra hierarchical level => smaller routing tables (RTs).
Subnet Mask
Used to compute the subnet number; i.e., gets rid of the host number.
– Facilitates routing table look-up.
– IP address AND subnet mask = subnet #
Example:
    Address format:  10XXXXXX XXXXXXXX SSSSSSHH HHHHHHHH
    Subnet mask:     11111111 11111111 11111100 00000000
Ex: 130.50.15.6 AND subnet mask = 130.50.12.0, which is subnet 3.
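The AND in the example above can be checked with Python's standard `ipaddress` module. This sketch assumes the 6-bit subnet / 10-bit host split shown in the mask; the function name `subnet_of` is invented for the example.

```python
import ipaddress


def subnet_of(address: str, mask: str, subnet_bits: int = 6, host_bits: int = 10):
    """Return (masked address, subnet number) for a subnetted class B address."""
    addr = int(ipaddress.IPv4Address(address))
    netmask = int(ipaddress.IPv4Address(mask))
    masked = addr & netmask  # clears the host number
    # The subnet number is the field just above the host bits.
    subnet_number = (masked >> host_bits) & ((1 << subnet_bits) - 1)
    return str(ipaddress.IPv4Address(masked)), subnet_number


masked_address, subnet_number = subnet_of("130.50.15.6", "255.255.252.0")
```

In dotted decimal the mask 11111111 11111111 11111100 00000000 is 255.255.252.0. The third byte of the address, 15 (00001111), ANDed with 252 (11111100) gives 12 (00001100), and its top six bits 000011 are subnet 3.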
Internet Control Protocols
IP carries data.
There are other network layer protocols that carry control information.
Example: ICMP, ARP, RARP, BOOTP.
ICMP
Internet Control Message Protocol.
Report specific events.
– Generated by routers.
– Encapsulated in IP packets.
ICMP Messages
    Message type             Description
    Destination unreachable  Packet couldn’t be delivered
    Time exceeded            TTL field hit 0
    Parameter problem        Invalid header field
    Source quench            Choke packets
    Redirect                 Route problem
    Echo request             Check if destination is up
    Echo reply               Destination responds
    Timestamp request        Same as echo request + TS
    Timestamp reply          Same as echo reply + TS
Mapping IP to DLL Address
Internet applications refer to hosts by their IP addresses; once packet gets to destination LAN, node needs to figure out the destination DLL address.
One solution is to have configuration file.
– Hard to maintain/update.
Address Resolution Protocol (ARP):
– Run by every node to map IP to DLL address (RFC 826).
ARP
Advantage:
– Easy to administer, less human intervention.
– Example: 2 hosts on the same Ethernet want to communicate.
» Host 1 must figure out host 2’s Ethernet address.
» Host 1 broadcasts ARP packet on Ethernet asking for the Ethernet address of host 2.
» Host 2 receives the ARP request, and replies with its Ethernet address.
ARP Optimizations
Caching of ARP replies.
– Entries may have large TTLs.
When sending ARP request, piggyback its own IP-DLL address mapping.
Every machine broadcasts its mapping at boot time.
– No response is expected.
– Other machines cache that information.
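The caching optimizations can be sketched as a small table with expiring entries. This is an illustration only and does not send any packets: the `ArpCache` class, its method names, the 20-minute TTL, and the addresses are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ArpCache:
    """Maps IP addresses to DLL addresses, forgetting entries after a TTL."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the expiry logic can be tested
        self.entries: Dict[str, Tuple[str, float]] = {}

    def learn(self, ip: str, dll_address: str) -> None:
        """Cache a mapping from an ARP reply or a boot-time broadcast."""
        self.entries[ip] = (dll_address, self.clock() + self.ttl)

    def on_request_seen(self, sender_ip: str, sender_dll: str) -> None:
        """Requests piggyback the sender's own mapping, so cache that too."""
        self.learn(sender_ip, sender_dll)

    def lookup(self, ip: str) -> Optional[str]:
        """Return the cached DLL address, or None if absent or expired."""
        entry = self.entries.get(ip)
        if entry is None or entry[1] <= self.clock():
            self.entries.pop(ip, None)
            return None  # caller would broadcast an ARP request here
        return entry[0]


cache = ArpCache(ttl_seconds=20 * 60)
cache.on_request_seen("10.0.0.1", "aa:bb:cc:dd:ee:01")
known = cache.lookup("10.0.0.1")
unknown = cache.lookup("10.0.0.9")
```

A large TTL saves broadcasts but keeps stale mappings around longer when a host changes its interface, which is the trade-off behind choosing the TTL.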
Proxy ARP
What if host 1 wants to send data to host 3 on a different LAN?
– Router connecting the 2 LANs can be configured to respond to ARP requests for the networks it interconnects: proxy ARP.
– Another solution is for host 1 to recognize host 3 is on remote network and use default LAN address that handles all remote traffic; that could be the router’s Ethernet address.
RARP
Reverse Address Resolution Protocol.
Given LAN address, what’s the IP address?
Usually for booting diskless workstation.
– Gets the OS image from remote file server.
– Same image for all machines.
– Machine broadcasts its LAN address.
– Remote RARP server responds with machine’s IP address.
BOOTP
RARP broadcasts are not forwarded by routers.
Need RARP server on every network.
BOOTP uses UDP messages that are forwarded by routers.
– Also provides additional information such as IP address of file server holding OS image, subnet mask, etc.
Internet Routing
IGPs and EGPs
– IGPs: routing within ASs.
– EGPs: routing between ASs.
IGPs
Original Internet IGP was RIP.
– Distance vector.
– OK for small ASs but not efficient as ASs got larger.
New IGP: OSPF.
– Open Shortest Path First.
– Became standard in 1990.
– Link state algorithm.
– RIP is still running but OSPF is taking over.
OSPF 1
Design requirements:
– Open implementation.
– Support for various distance metrics: delay, hops, etc.
– Dynamic: automatically adapt to topology changes.
– QoS Routing: real-time versus other traffic using IP’s type of service field.
– Load balancing across multiple lines.
– Security and tunneling.
OSPF 2
Abstracts collection of networks, routers and lines into a directed graph where edges are assigned a cost proportional to the routing metric.
It then computes shortest path.
Hierarchical routing within ASs.
– Areas: collection of contiguous networks.
– Area 0: AS backbone; all areas connected to it.
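The shortest-path step on the weighted directed graph can be sketched with Dijkstra's algorithm, the usual choice for link state routing. The four-router topology and its costs below are invented for the example, and the code ignores areas, type of service, and everything else OSPF does around this computation.

```python
import heapq
from typing import Dict

# graph[u][v] = cost of the directed edge from u to v
Graph = Dict[str, Dict[str, float]]


def shortest_paths(graph: Graph, source: str) -> Dict[str, float]:
    """Dijkstra's algorithm: lowest total cost from source to each node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a cheaper path was already found
        for neighbor, cost in graph.get(node, {}).items():
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Hypothetical topology; each line appears as two directed edges.
graph: Graph = {
    "A": {"B": 1, "C": 5},
    "B": {"A": 1, "C": 2, "D": 4},
    "C": {"A": 5, "B": 2, "D": 1},
    "D": {"B": 4, "C": 1},
}
costs = shortest_paths(graph, "A")
```

Using a different metric (delay instead of hops, for instance) only changes the edge costs; the computation stays the same.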
OSPF 3
Type of service routing:
– Uses different graphs labeled with different metrics.
Routing updates:
– Adjacent routers exchange routing information.
– Adjacent routers are on different LANs.
– Reliable link state updates with sequence #’s.
EGPs
Routing protocol between ASs.
Take policy into account.
– An AS may not be willing to carry traffic that both originates in and is destined for foreign ASs.
– Example: phone companies are willing to carry traffic for their customers but not for others.
Routing Policy Examples
No transit traffic through certain ASs.
Traffic source restricts ASs through which its traffic crosses.
Same for destination.
BGP 1
Border Gateway Protocol.
Policies are manually configured into BGP routers.
BGP abstracts networks as a collection of BGP routers and their links.
2 BGP routers are connected if they share a common network.
BGP routers communicate reliably using TCP.
BGP 2
3 types of networks:
– Stub networks: have a single connection in the BGP graph; cannot carry transit traffic.
– Multi-connected networks: have multiple connections but refuse to carry transit traffic.
– Transit networks: agree to carry transit (3rd party) traffic possibly with some restriction; e.g., backbones.
BGP 3
BGP is a distance vector protocol (often called a path vector protocol, because of the next point).
Routing table entries keep whole path to destination + distance.
A BGP router can discard paths that contain itself: avoids loops and counting to infinity.
Routers compute distance associated to a route taking policy into account.
– If policy is violated, distance = infinity.
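The route selection rules above can be sketched as follows. This is a simplified illustration, not BGP's actual decision process: it assumes distance is the number of hops in the advertised path, and the router names and the policy that forbids transit through `X` are invented for the example.

```python
from typing import Callable, List, Optional

INFINITY = float("inf")
Path = List[str]


def route_distance(path: Path, me: str, policy_ok: Callable[[Path], bool]) -> float:
    """Score an advertised path; unusable paths get distance infinity."""
    if me in path:
        return INFINITY  # path already passes through this router: a loop
    if not policy_ok(path):
        return INFINITY  # policy violated
    return float(len(path))  # assumption: distance = hop count


def best_route(me: str, advertised: List[Path],
               policy_ok: Callable[[Path], bool]) -> Optional[Path]:
    """Pick the usable path with the smallest distance, if any."""
    scored = [(route_distance(p, me, policy_ok), p) for p in advertised]
    usable = [s for s in scored if s[0] != INFINITY]
    return min(usable, key=lambda s: s[0])[1] if usable else None


def no_transit_through_x(path: Path) -> bool:
    """Hypothetical policy: never route through AS X."""
    return "X" not in path


# Paths to destination D as advertised to router F by its neighbors.
advertised = [["E", "X", "D"], ["I", "F", "G", "C", "D"], ["B", "A", "C", "D"]]
chosen = best_route("F", advertised, no_transit_through_x)
```

Because each entry carries the whole path, the loop check is a simple membership test, which is what lets the router avoid counting to infinity.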
Internet Multicasting
IP supports multicasting using class D addresses.
– Each class D address identifies a group of hosts.
– 28 bits define over 250 million groups.
Best-effort delivery.
Group Membership
Hosts (single or multiple processes) may join and leave group.
Special, multicast routers perform multicast routing and packet forwarding.
– Hosts belonging to multicast groups periodically send messages to the closest multicast router.
– Multicast routers and hosts use IGMP (Internet Group Management Protocol) to exchange membership information.
IP Multicast Routing
Use spanning trees.
Modified distance vector protocol using unicast routing information.
– Build one spanning tree per source, per group.
– Or, one shared spanning tree per group.
– Use pruning to remove parts of the tree that don’t have any multicast group members.
– Use tunneling to cross regions that are not multicast capable.
Mobile IP 1
Support for mobile users.
– “Last hop” mobility.
Problem: IP addressing scheme.
– Class+network number+host number.
– If host moves and attaches itself to foreign network, packets destined to it will still go to its home network.
– Assigning hosts new IP address?
» Too much hassle.
Mobile IP 2
Solution:
– Home agent: runs at the home network.
– Foreign agent: runs at foreign network.
– When mobile host connects itself to foreign network, registers with foreign network’s foreign agent.
– Foreign agent assigns host care-of address, and informs home agent.
Mobile IP 3
Sending packets: mobile host uses its care-of address.
Receiving packets:
– When packet arrives at home network, router that gets it sends ARP request for that IP address.
– Home agent replies with its own Ethernet address. It gets the packet, and tunnels it to foreign agent. Foreign agent delivers packet to mobile host.
– Home agent sends care-of address to sender, so future packets are sent directly to foreign network.
Mobile IP 4
Locating foreign agents:
– Foreign agents periodically broadcast their address and the service provided (e.g., home, foreign, or both).
– Mobile host can announce its presence and wait for a response from a foreign agent.
Unregistration:
– If a host leaves without unregistering, its registration expires after some time.
Security:
– Authentication issues.
Scaling IP Addresses 1
Exponential growth of the Internet!
– 32-bit address fields are getting too small.
– Early predictions: it’d take decades to reach the 100,000-network mark.
– The 100,000th network was connected in 1996!
– Internet is rapidly running out of IP addresses!
– Waste due to hierarchical addressing.
IP Address Formats
4 different classes:
Class   First byte   Network part   Host part   Networks   Hosts/net
A       0XXXXXXX     1 byte         3 bytes     128        16M
B       10XXXXXX     2 bytes        2 bytes     16K        64K
C       110XXXXX     3 bytes        1 byte      2M         256
D       1110XXXX     Multicast (all 4 bytes form the group address)
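The class of an address follows from the leading bits of its first byte. A minimal Python sketch of that test, assuming dotted-quad input; the function name `address_class` is made up for illustration and the "reserved" case (1111XXXX) is not covered by the slide:

```python
def address_class(dotted: str) -> str:
    """Return the classful category (A-D) of a dotted-quad IPv4 address."""
    first = int(dotted.split(".")[0])
    if first & 0b10000000 == 0:            # 0XXXXXXX
        return "A"
    if first & 0b11000000 == 0b10000000:   # 10XXXXXX
        return "B"
    if first & 0b11100000 == 0b11000000:   # 110XXXXX
        return "C"
    if first & 0b11110000 == 0b11100000:   # 1110XXXX
        return "D"
    return "reserved"                      # 1111XXXX (assumption: not on the slide)
```

For instance, `address_class("194.24.0.1")` falls in the 110XXXXX range and is reported as class C.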
Scaling IP Addresses 2
Class A addresses: 16M hosts is usually too much.
Class C addresses: 254 hosts is usually too small.
Class B addresses provide room for 64K hosts.
– Organizations usually request class B addresses, but more than 50% of them have at most 50 hosts!
Scaling IP Addresses 3
Class C addresses should have had 10-bit host numbers instead of only 8-bit numbers.
– Would allow for 1022 hosts instead of just 254.
– Still plenty of networks: about 0.5M class C network numbers (versus only 16K class B networks).
But, could result in routing table explosion.
– Routers will have to know about many more networks.
CIDR 1
Classless Interdomain Routing: RFC 1519.
No longer uses class A, B, and C addresses.
Allocate remaining Class C addresses in variable-sized blocks.
– Example: if an organization needs 2000 addresses, it’s given a block of 2048 addresses, or 8 contiguous class C networks, and not a full class B address.
CIDR 2
New allocation rules for class C addresses.
World partitioned into 4 zones, each given a portion of the class C address space (192~223).
– 194.0.0.0~195.255.255.255: Europe.
– 198.0.0.0~199.255.255.255: North America.
– 200.0.0.0~201.255.255.255: Central and South America.
– 202.0.0.0~203.255.255.255: Asia and Pacific.
CIDR 3
Each region is allocated ~32M class C addresses.
Addresses 204.0.0.0~223.255.255.255 reserved for future use.
Advantages:
– Less waste.
– Routers can keep only one routing table (RT) entry per region, i.e., 32M addresses compressed into one.
CIDR 4
Once a packet gets to its destination region, more detailed routing information is needed.
One possibility is to keep 131,072 (32M/2^8) entries for all “local” networks.
– Explosion problem.
Instead, use 32-bit masks: only need to keep the start address and mask of each block.
CIDR - Example 1
Cambridge University has 2048 addresses from 194.24.0.0~194.24.7.255 and mask 255.255.248.0.
Oxford University: 4096 addresses 194.24.16.0~194.24.31.255 with mask 255.255.240.0.
U of Edinburgh: 1024 addresses 194.24.8.0~194.24.11.255 and mask 255.255.252.0.
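The three blocks can be checked mechanically: the mask fixes the block size, and a destination belongs to a block when ANDing it with the mask gives the block's start address. The sketch below uses Python's standard `ipaddress` module; the `BLOCKS` table and the helper names `block_size` and `lookup` are illustrative only, and because these three blocks do not overlap, no longest-match rule is needed here.

```python
import ipaddress

# Blocks from the example: (name, start address, mask).
BLOCKS = [
    ("Cambridge", "194.24.0.0", "255.255.248.0"),
    ("Edinburgh", "194.24.8.0", "255.255.252.0"),
    ("Oxford", "194.24.16.0", "255.255.240.0"),
]

def to_int(dotted: str) -> int:
    """Convert a dotted-quad address or mask to a 32-bit integer."""
    return int(ipaddress.IPv4Address(dotted))

def block_size(mask: str) -> int:
    """Number of addresses covered by a 32-bit mask."""
    return 2 ** 32 - to_int(mask)

def lookup(dst: str):
    """Return the block whose start address equals dst AND mask, or None."""
    d = to_int(dst)
    for name, start, mask in BLOCKS:
        if d & to_int(mask) == to_int(start):
            return name
    return None
```

With these definitions, `block_size("255.255.248.0")` gives 2048, and `lookup("194.24.17.4")` selects the Oxford block, while an address such as 194.24.12.1 matches none of the three.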
IP Evolution
CIDR bought IPv4 a few more years.
Because of its addressing limitations and to accommodate next-generation Internet applications, IP must evolve.
In 1990, IETF started work on IP next generation, or IPng.
– Several proposals were considered.
– SIPP (Simple Internet Protocol Plus) was selected and became IPv6.
IPv6 1
RFCs 1883~1887.
Features:
– Longer addresses (16 bytes versus only 4 in IPv4).
– Header simplification (only 7 fields versus 13 fields in IPv4): faster processing by routers.
– Better option support since fields that were previously required are now optional.
– Improved security and QoS support.
IPv6 Header
[Figure: IPv6 fixed header, drawn 32 bits wide]
Version | Priority | Flow label
Payload length | Next header | Hop limit
Source address (16 bytes)
Destination address (16 bytes)
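To make the layout concrete, here is a hedged Python sketch that packs the fixed header with `struct`. The field widths are an assumption taken from RFC 1883, which these slides cite (4-bit version, 4-bit priority, 24-bit flow label, 16-bit payload length, 8-bit next header, 8-bit hop limit); later revisions of IPv6 split the first word differently. The function name `pack_ipv6_header` is made up for illustration.

```python
import struct

def pack_ipv6_header(priority, flow_label, payload_length,
                     next_header, hop_limit, src, dst):
    """Pack the 40-byte fixed header (RFC 1883 field widths assumed)."""
    if len(src) != 16 or len(dst) != 16:
        raise ValueError("addresses must be 16 bytes each")
    # First 32-bit word: version = 6, then priority, then flow label.
    first_word = (6 << 28) | ((priority & 0xF) << 24) | (flow_label & 0xFFFFFF)
    # "!" selects network byte order with no padding between fields.
    return struct.pack("!IHBB16s16s", first_word, payload_length,
                       next_header, hop_limit, src, dst)
```

The result is always 40 bytes long, which is why no header-length field is needed.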
IPv6 Header Fields 1
Version = 6.
– During the transition period, routers will examine this field to decide what kind of packet it is.
Priority: handling different kinds of traffic.
– 0~7: data that can be flow controlled, e.g., data distribution services.
– 8~15: real-time traffic (e.g., audio, video).
– Within each group, lower values have lower priority than higher values (e.g., 1 for news, 4 for ftp and 6 for telnet).
IPv6 Header Fields 2
Flow label (experimental): allows source and destination to set up a pseudo-connection.
– Try to have some kind of service guarantees.
– Example: assign a flow number to a stream of packets that need reserved bandwidth.
– A flow is identified by src + dst + flow #.
Payload length: length of data.
– Different from IPv4, which specified the total length of the datagram.
IPv6 Header Fields 3
Next header: specifies what is present in the options field (extension headers).
Hop limit: equivalent to IPv4’s TTL.
Source and destination addresses:
– 16-byte addresses (fixed length).
– Address space is divided by using prefixes.
IPv6 versus IPv4
No more IHL (header length); why?
No more protocol field: replaced by the next header field.
No more fragmentation-related fields.
– All IPv6 hosts and routers must support 576-byte packets.
– Fragmentation is less likely to occur.
– Router sends an error message back to the source when a packet is too big, so the source breaks it down.
No more checksum: rely on more reliable networks and on DLL and transport checksums.
IPv6 Addressing 1
Separate prefixes for provider-based and geographic-based addresses.
– Ability to accommodate 2 ways of address assignment.
Provider-based addresses:
– Addresses allocated to ISP companies.
– Prefix 010.
– Each ISP is assigned a portion of the address space.
– First 5 bits following the prefix define the registry where the provider is registered.
– Remaining 15 bytes are allocated by each provider.
– Example: 3-byte provider number.
IPv6 Addressing 2
Geographic-based addresses:
– Prefix 100.
– Same model as current Internet.
Multicast addresses:
– Prefix 11111111.
– 4-bit flag + 4-bit scope fields + 112-bit group id.
– Flags: 1 bit defines whether the group is permanent or not.
– Scope: limits the reach of a multicast packet.
IPv6 Address Notation
8 groups of 4 hexadecimal digits separated by colons.
– Example: 8000:0000:0000:0000:0123:4567:89AB:CDEF
– Optimizations:
» Leading zeros within a group can be omitted.
» Groups of zeros can be replaced by a pair of colons: 8000::123:4567:89AB:CDEF.
» IPv4 addresses: ::192.31.20.46.
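These notation rules are implemented by Python's standard `ipaddress` module, so the example address can be shortened and expanded programmatically. A small sketch (note that the module prints hexadecimal digits in lowercase; the variable names are arbitrary):

```python
import ipaddress

addr = ipaddress.IPv6Address("8000:0000:0000:0000:0123:4567:89AB:CDEF")
short = addr.compressed   # leading zeros dropped, zero groups replaced by "::"
full = addr.exploded      # all 8 groups, 4 hex digits each

# An IPv4 address written in the mixed "::d.d.d.d" form parses to the same
# numeric value as the IPv4 address itself, zero-extended to 128 bits.
v4 = ipaddress.IPv4Address("192.31.20.46")
mixed = ipaddress.IPv6Address("::192.31.20.46")
same_value = int(mixed) == int(v4)
```

Here `short` is the "8000::123:4567:89ab:cdef" form from the slide, in lowercase.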
Extension Headers 1
Equivalent to IPv4 options.
6 types of extension headers:
Extension header       Description
Hop-by-hop options     Misc. info for routers
Routing                Full or partial route included
Fragmentation          Management of fragments
Authentication         Verification of source’s id
Encrypted payload      Information about encryption
Destination options    Information for destination
Extension Headers 2
Fixed format and variable-sized headers.
Variable-sized headers:
– (type, length, value).
– Type: 1 byte specifying which option this is.
» First 2 bits tell routers that do not recognize the option what to do: skip option, discard packet, discard packet with ICMP message, or discard packet without ICMP message for multicast addresses.
– Length: how long the value field is (0~255 bytes).
– Value: information.
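A (type, length, value) sequence is straightforward to walk. The following Python sketch decodes such a byte string and reports, for each option, the action encoded in the first 2 bits of the type. It is a simplification under stated assumptions: every option is taken to carry a length byte (single-byte padding options are ignored), and the names `ACTIONS` and `parse_options` are invented for this example.

```python
# Action for a router that does not recognize the option,
# selected by the first 2 bits of the type byte.
ACTIONS = {
    0b00: "skip option",
    0b01: "discard packet",
    0b10: "discard packet, send ICMP message",
    0b11: "discard packet, send ICMP message unless destination is multicast",
}

def parse_options(data: bytes):
    """Split a byte string into (type, action, value) tuples."""
    options = []
    i = 0
    while i < len(data):
        if i + 2 > len(data):
            raise ValueError("truncated option header")
        opt_type, length = data[i], data[i + 1]
        value = data[i + 2 : i + 2 + length]
        if len(value) != length:
            raise ValueError("truncated option value")
        options.append((opt_type, ACTIONS[opt_type >> 6], value))
        i += 2 + length
    return options
```

For example, an option of type 194 (binary 11000010) with a 4-byte value decodes to the fourth action, since its first 2 bits are 11.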
Hop-by-Hop Header
Conveys information that all routers along the path must examine.
– Jumbograms: datagrams > 64 KBytes.
[Figure: hop-by-hop header for jumbograms]
Next header | 0 | 194 | 0
Jumbogram payload length
– Next header: identifies the header that follows.
– Second byte: length of the hop-by-hop header, excluding the first 8 (mandatory) bytes.
– Option code (194): defines the option, in this case datagram size.
Routing Header
Lists one or more routers that must be visited on the way to the destination.
– Strict source routing: full path is supplied.
– Loose source routing: only selected routers are listed.
Fragment Header
Allows source to fragment datagram.
– In IPv6, routers are not allowed to fragment.
– If a router receives a packet that is too big, it discards it and sends back an ICMP message to the source.
– Source uses this option to fragment the packet and resend it.
– Contains datagram id, fragment number, and “last fragment” bit.
Authentication Header
Supports verification of sender’s identity.
Contains a key number and a cryptographic checksum of the whole datagram.
Receiver uses the key number to find the secret key. It computes the checksum using the secret key and checks whether it matches the checksum in the received datagram.
Destination Options
Supports options that need only be interpreted by the destination host.
Quality of Service
Service offered by the network (carrier) to customer (end user): service agreement.
Service agreement: offered traffic, offered service, compliance requirements.
If customer and carrier don’t agree: VC will not be set up.
Different requirements for each direction.
– E.g., VOD application: required bandwidth user->server differs from server->user.
Quality of Service Parameters 1
Parameter                        Acronym   Meaning
Peak cell rate                   PCR       Max. cell transmission rate
Sustained cell rate              SCR       Average cell rate
Minimum cell rate                MCR       Min. acceptable cell rate
Cell delay variation tolerance   CDVT      Max. acceptable cell jitter
Cell loss ratio                  CLR       Fraction of lost cells
Cell transfer delay              CTD       Time to deliver
Cell delay variation             CDV       Delivery delay variation
Cell error rate                  CER       Fraction of cells delivered with errors
QoS Parameters 2
PCR, SCR, MCR, and CDVT: specified by sender.
CLR, CTD, and CDV describe network conditions and are measured at receiver.
The Transport Layer
End-to-end.
– Communication from source to destination host.
– Only hosts run transport-level protocols.
– Under user’s control, as opposed to the network layer, which is controlled/owned by the carrier.
The Transport Service
Service provided to application layer.
Transport entity: process that implements the transport protocol running on a host.
– At OS kernel, user-level process, or network card.
The Transport Layer
[Figure: on both the source host and the destination host, a transport entity sits between the application layer and the network layer. The application/transport interface is reached through a transport address, and the transport/network interface through a network address. The two transport entities exchange TPDUs.]
Types of Transport Services
Connection-less versus connection-oriented.
Connection-less service: no logical connections, no flow or error control.
Connection-oriented:
– Based on logical connections: connection setup, data transfer, connection teardown.
– Flow and error control.
Transport versus Network Layer
Transport layer is “controlled” by user.
– Ability to enhance network layer quality of service.
– Example: transport service can be more reliable than underlying network service.
– Transport layer makes a standard set of primitives available to users; these are independent of the network service primitives, which may vary considerably.
Quality of Service
User may specify QoS parameters at the transport layer.
– At connection setup time, user may define preferred, acceptable, and minimum values for various service parameters.
– Transport layer determines whether it’s possible to provide the required service based on available network service(s).
Transport-Layer QoS Parameters 1
Connection establishment delay: time to establish connection.
Connection establishment failure probability: probability connection is not established within maximum establishment time.
Throughput: bytes transferred per second measured over a time interval.
Transport-Layer QoS Parameters 2
Transit delay: time between sending a message and receiving it on the other side (measured by the transport entities).
Residual error ratio: ratio of messages in error to total messages sent.
Priority: way for user to indicate that some connections are more important.
Resilience: probability connection is terminated due to congestion, etc.
Transport Layer QoS
Only a few transport protocols provide QoS parameters.
Most just try to minimize residual error rate.
QoS parameters specified by transport user when connection is set up.
– Desired and minimum acceptable values can be specified.
– Service negotiation.
Transport Service Primitives
Allow transport users (e.g., application programs) to access transport service.
Example: connection-oriented transport service primitives.
Primitive     TPDU sent          Meaning
LISTEN        (none)             Listen for connection
CONNECT       Connection Req.    Try to establish connection
SEND          DATA               Send data
RECEIVE       (none)             Wait for data
DISCONNECT    Disc. Req.         Try to release connection
TPDU
Transport protocol data unit.
Messages sent between transport entities.
TPDUs contained in network-layer packets, which in turn are contained in DLL frames.
[Figure: nesting of headers]
Frame header | Packet header | TPDU header | TPDU payload
Connection Management State Machine
[Figure: state diagram with a server side and a client side. States: Idle, Passive establishment pending, Active establishment pending, Established, Passive disconnect pending, Active disconnect pending, Idle. Transition labels: Connection req. received, Connect executed, Connection accept. received, Disc. req. received, Disconnect executed, Disc. accept. received.]
Berkeley Sockets 1
Set of transport-level primitives made available by Berkeley UNIX.
Server side:
» SOCKET: create new communication end point.
» BIND: attach local address to socket (once server binds address, clients can connect to it).
» LISTEN: listen for connection.
» ACCEPT: accept new connection.
» SEND, RECEIVE: send and receive data.
» CLOSE: release connection.
Berkeley Sockets 2
Client side:
» SOCKET: create socket.
» CONNECT: try to establish connection.
» SEND, RECEIVE: send and receive data.
» CLOSE: release connection.
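The server-side and client-side primitives map directly onto Python's standard `socket` module. The sketch below runs both sides in one process over the loopback interface; the helper names `serve_once` and `demo`, the upper-casing reply, and the use of a thread are choices made for this illustration only.

```python
import socket
import threading

def serve_once(server_sock):
    """ACCEPT one connection, RECEIVE a message, SEND a reply, CLOSE."""
    conn, _addr = server_sock.accept()       # ACCEPT
    with conn:                               # CLOSE on exit
        data = conn.recv(1024)               # RECEIVE
        conn.sendall(data.upper())           # SEND

def demo():
    # Server side: SOCKET, BIND, LISTEN.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))            # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()

    # Client side: SOCKET, CONNECT, SEND, RECEIVE, CLOSE.
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(("127.0.0.1", port))
    client.sendall(b"hello")
    reply = client.recv(1024)
    client.close()

    worker.join()
    server.close()
    return reply
```

Calling `demo()` returns the server's reply to the client's message. Binding to port 0 avoids hard-coding a port number that might already be in use.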
Transport Protocol Issues: Addressing
Address of the transport-level entity.
TSAP: transport service access point (analogous to NSAP).
– Internet TSAP: (IP address, local port).
– Internet NSAP: IP address.
– There may be multiple TSAPs on one host.
– Typically, only one NSAP.
Example 1
Finding the time of day from a time-of-day server.
– Time-of-day server process on host 2 attaches itself to TSAP 122 and waits for requests (e.g., through LISTEN).
– Application process (TSAP 6) on host 1 wants to find out the time of day; issues CONNECT specifying TSAP 6 as source and TSAP 122 as destination.
Finding Services 1
Well-known TSAP.
– Time-of-day server has been using TSAP 122 forever, so every user knows it.
Initial connection protocol: special process server that proxies for less well-known services.
– Process server listens to a set of ports at the same time.
– Users CONNECT to a TSAP, and if no server is waiting there, the process server is likely to be listening. It then spawns the requested server.
Finding Services 2
Name or directory service.
– Name server listens to well-known TSAP.
– User sends service name and name server responds with service’s TSAP.
– New services need to register with name server.
Finding the server’s network address.
– Hierarchical addresses solve this problem, i.e., the NSAP is part of the TSAP.
Connection Establishment
CONNECTION REQUEST and CONNECTION ACCEPTED TPDUs.
Problem: delayed duplicates.
– Duplicates can re-appear and be taken as the real messages.
Solution: messages age and are discarded after some time; their acks must be discarded as well.
– Maximum hop count.
– Timestamp.
Avoiding Duplicates 1
2 identically numbered TPDUs are never outstanding at the same time.
Bounded packet lifetime.
Each host has its clock.
– Clock as a counter that increments itself.
– #bits(counter) >= #bits(sequence number).
– Clocks don’t “crash”.
Avoiding Duplicates 2
When a connection is set up, the low-order k bits of the clock are used as the initial sequence number.
Each connection starts numbering its TPDUs with a different sequence number.
Sequence number space needs to be large enough that, by the time sequence numbers wrap around, old TPDUs with the same sequence numbers have aged out.
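As a rough illustration of the clock-based scheme, the Python sketch below derives an initial sequence number from the low-order k bits of a tick counter and estimates how long it takes for the numbering to wrap. The function names and the parameter `ticks_per_second` are assumptions introduced for this example; the check that the wrap time comfortably exceeds the maximum packet lifetime is left to the caller.

```python
def initial_seq(clock_ticks: int, k: int) -> int:
    """Low-order k bits of the clock, used as the initial sequence number."""
    return clock_ticks & ((1 << k) - 1)

def wrap_time(k: int, ticks_per_second: float) -> float:
    """Seconds until a k-bit, clock-driven sequence number wraps around."""
    return (1 << k) / ticks_per_second
```

With an 8-bit sequence space and one tick per second, a connection opened at tick 300 would start at sequence number 44, and the numbering would wrap every 256 seconds.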
Sequence Numbers versus Time 1
[Figure: sequence numbers (vertical axis) plotted against time (horizontal axis)]
. Linear relation between time and initial sequence number.
Sequence Numbers versus Time 2
[Figure: sequence numbers (vertical axis) plotted against time (horizontal axis), with a forbidden region of width T]
. Host crash: when it comes up, it doesn’t know where it was in the sequence # space.
. Example: T=60 sec and clock ticks once per second.
. At t=30s, a TPDU on connection 5 gets seq.# 80.
. Host crashes and comes up.
. At t=60s, reopens connections 0~4.
. At t=70s, reopens connection 5 and at t=80s, sends TPDU 80.
. Old TPDU 80 is still valid, so the new one would look like a duplicate.
. To prevent this, check whether a sequence number falls in the “forbidden region” and, if so, delay using it.
Three-Way Handshake
Solves the problem of getting 2 sides to agree on initial sequence number.
[Figure: normal operation between host 1 and host 2]
1 -> 2: CR(seq=x)
2 -> 1: ACK(seq=y, ACK=x)
1 -> 2: DATA(seq=x, ACK=y)
CR: connection request.
3-Way Handshake: Duplicates 1
. Old duplicate CR.
. The ACK from host 2 tries to verify whether host 1 was trying to open a new connection with seq=x.
. Host 1 rejects host 2’s attempt to establish.
. Host 2 realizes it was a duplicate CR and aborts the connection.
[Figure: exchange between host 1 and host 2; * marks an old duplicate]
-> 2: CR(seq=x)*
2 -> 1: ACK(seq=y, ACK=x)
1 -> 2: REJECT(ACK=y)
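Host 1's side of this exchange can be sketched in a few lines of Python: it remembers which sequence numbers it actually proposed, and answers an incoming ACK(seq=y, ACK=x) with DATA if x is one of them and with REJECT otherwise. The function name `host1_on_ack`, the `outstanding` set, and the tuple encoding of messages are all hypothetical, chosen only for illustration.

```python
def host1_on_ack(outstanding: set, y: int, x: int):
    """React to ACK(seq=y, ACK=x) arriving at host 1.

    outstanding holds the sequence numbers of connection requests that
    host 1 has sent and not yet seen acknowledged.
    """
    if x in outstanding:
        outstanding.discard(x)       # the request is now acknowledged
        return ("DATA", x, y)        # DATA(seq=x, ACK=y): connection established
    return ("REJECT", y)             # REJECT(ACK=y): the CR was an old duplicate
```

If host 1 really sent CR(seq=100), the ACK is answered with DATA; if it did not, the same ACK triggers a REJECT and host 2 abandons the connection.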
3-Way Handshake: Duplicates 2
. Old duplicate CR and old duplicate ACK to connection accepted.
[Figure: exchange between host 1 and host 2; * marks an old duplicate]
-> 2: CR(seq=x)*
2 -> 1: ACK(seq=y, ACK=x)
-> 2: DATA(seq=x, ACK=z)*
1 -> 2: REJECT(ACK=y)
Connection Release
Asymmetric release: telephone system.
– When one party hangs up, connection breaks.
– May cause data loss.
Symmetric release:
– Treats connection as 2 separate unidirectional connections.
– Requires each to be released separately.
Symmetric Release
How to determine when all data has been sent and connection could be released?
2-army problem:
[Figure: the white army sits between blue army 1 and blue army 2]
. White army is larger than either blue army.
. The blue armies together are larger.
. If a blue army attacks alone, it’ll be defeated. They win if they attack together.
2-Army Problem 1
To synchronize attack, they must use messengers that need to cross the valley: unreliable.
Is there a protocol that allows the blue army to win? No.
– Blue army 1 sends message to blue army 2.
– Blue army 2 sends ACK back.
– Blue army 2 is not sure whether ACK was received.
2-Army Problem 2
Use 3-way handshake.
– Blue army 1 ACKs back but it’ll never know if the ACK was received.
Applying to connection release:
– Neither side is prepared to disconnect until convinced the other side is prepared to disconnect too.
– In practice, hosts are willing to take risks.
Connection Release Protocol
DR: disconnection request.
[Figure: message sequence between two hosts.]
– Host 1: send DR + start timer.
– Host 2: send DR + start timer.
– Host 1: release connection; send ACK.
– Host 2: release connection.
Connection Release Scenarios 1
DR: disconnection request.
[Figure: message sequence between two hosts.]
– Host 1: send DR + start timer.
– Host 2: send DR + start timer.
– Host 1: release connection; send ACK.
– Host 2: timeout; release connection.
Connection Release Scenarios 2
DR: disconnection request.
[Figure: message sequence between two hosts.]
– Host 1: send DR + start timer.
– Host 2: send DR + start timer.
– Host 1: timeout; send DR + start timer.
– Host 2: send DR + start timer.
– Host 1: release connection; send ACK.
The Internet Transport Protocols: TCP and UDP
UDP: user datagram protocol (RFC 768).
– Connection-less protocol.
TCP: transmission control protocol (RFCs 793, 1122, 1323).
– Connection-oriented protocol.
UDP
Provides connection-less, unreliable service.
– No delivery guarantees.
– No ordering guarantees.
– No duplicate detection.
Low overhead.
– No connection establishment/teardown.
Suitable for short-lived connections.
– Example: client-server applications.
UDP Segment Format
[Figure: header layout, 32 bits per row (bits 0 to 15 and 16 to 31).]
Source port | Destination port
Length | Checksum
Data
Source and destination ports: identify the end points.
Length: 8-byte header + data.
Checksum: optional; if not used, set to zero.
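The layout above can be sketched with Python's standard `struct` module. This is a minimal illustration, not a full UDP implementation; the function names and the sample ports are hypothetical, and the checksum is left at zero ("not used").

```python
import struct

# Four 16-bit fields in network byte order: source port, destination port,
# length, checksum (8 bytes in total).
UDP_HEADER = struct.Struct("!HHHH")

def build_udp_segment(src_port, dst_port, data, checksum=0):
    """Build a UDP segment; length covers the 8-byte header plus the data."""
    length = UDP_HEADER.size + len(data)
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + data

def parse_udp_segment(segment):
    """Split a UDP segment into its header fields and data."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(segment)
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "length": length,
        "checksum": checksum,
        "data": segment[UDP_HEADER.size:length],
    }

segment = build_udp_segment(5000, 53, b"hello")
fields = parse_udp_segment(segment)
```

Here `fields["length"]` is the header size plus the five data bytes.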
UDP Checksum
Computed over a pseudo-header + UDP header + data + padding (to even number of bytes if needed).
Pseudo-header:
[Figure: pseudo-header layout, 32 bits per row (bits 0 to 31).]
Source IP address
Destination IP address
00000000 | Protocol | Segment length
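A sketch of the checksum computation described above, assuming IPv4 addresses and the standard 16-bit one's-complement Internet checksum. Function names and the sample addresses are illustrative. Protocol number 17 identifies UDP; a computed value of zero is transmitted as all ones, since zero means "checksum not used".

```python
import socket
import struct

def ones_complement_sum(data):
    """16-bit one's-complement sum; pads with a zero byte to an even length."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total

def build_pseudo_header(src_ip, dst_ip, segment_length):
    """Source IP, destination IP, zero byte, protocol (17 = UDP), length."""
    return struct.pack("!4s4sBBH", socket.inet_aton(src_ip),
                       socket.inet_aton(dst_ip), 0, 17, segment_length)

def udp_checksum(src_ip, dst_ip, udp_segment):
    """Checksum over pseudo-header + UDP header + data (checksum field = 0)."""
    pseudo = build_pseudo_header(src_ip, dst_ip, len(udp_segment))
    checksum = ~ones_complement_sum(pseudo + udp_segment) & 0xFFFF
    return checksum or 0xFFFF

# Hypothetical segment with the checksum field set to zero before computing.
segment = struct.pack("!HHHH", 5000, 53, 8 + 5, 0) + b"hello"
checksum = udp_checksum("10.0.0.1", "10.0.0.2", segment)
```

A receiver can verify by summing the pseudo-header and the segment with the checksum filled in; the result should be 0xFFFF.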
TCP
Reliable end-to-end communication.
TCP transport entity:
– Runs on machine that supports TCP.
– Interfaces to the IP layer.
– Manages TCP streams.
» Accepts user data, breaks it down and sends it as separate IP datagrams.
» At receiver, reconstructs original byte stream from IP datagrams.
TCP Reliability
Reliable delivery.
– ACKs.
– Timeouts and retransmissions.
Ordered delivery.
TCP Service Model 1
Obtained by creating TCP end points.
– Example: UNIX sockets.
– TSAP address: IP address + 16-bit port number.
– Multiple connections can share the same port.
– Port numbers below 1024: well-known ports reserved for standard services.
» List of well-known ports in RFC 1700.
TCP Service Model 2
TCP connections are full-duplex and point-to-point.
Byte stream (not message stream).
– Message boundaries are not preserved e2e.
[Figure: four 512-byte segments A, B, C, D sent as separate IP datagrams; the 2048 bytes of data A B C D are delivered to the application in a single READ.]
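The figure's point can be mimicked with a toy receive buffer. This is a simulation only (no sockets); the class name `ByteStreamBuffer` is hypothetical, and a real TCP may return the data in other chunk sizes.

```python
class ByteStreamBuffer:
    """Toy receive buffer: segments are appended, boundaries are forgotten."""

    def __init__(self):
        self._buffer = bytearray()

    def deliver_segment(self, payload):
        """Called when an in-order segment arrives from the network."""
        self._buffer += payload

    def read(self, max_bytes):
        """Application READ: returns up to max_bytes of buffered data."""
        data = bytes(self._buffer[:max_bytes])
        del self._buffer[:max_bytes]
        return data

receiver = ByteStreamBuffer()
for label in b"ABCD":
    # Four 512-byte segments, as in the figure.
    receiver.deliver_segment(bytes([label]) * 512)

chunk = receiver.read(2048)  # one READ returns all four segments' data
```

An application that needs message boundaries has to add its own framing on top of the byte stream.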
TCP Byte Stream
When application passes data to TCP, it may send it immediately or buffer it.
Sometimes application wants to send data immediately.
– Example: interactive applications.
– Use PUSH flag to force transmission.
URGENT flag.
– Also forces TCP to transmit at once.
TCP Protocol Overview 1
TCP’s TPDU: segment.
– 20-byte header + options.
– Data.
– TCP entity decides the size of segment.
» 2 limits: 64 KByte IP payload and MTU.
» Segments that are too large are fragmented: more overhead, since each fragment carries its own IP header.
TCP Protocol Overview 2
Sequence numbers.
– Reliability, ordering, and flow control.
– Assigned to every byte.
– 32-bit sequence numbers.
TCP Segment Header
[Figure: header layout, 32 bits per row.]
Source port | Destination port
Sequence number
Acknowledgment number
Header length | (unused) | U A P R S F | Window size
Checksum | Urgent pointer
Options (0 or more 32-bit words)
Data
TCP Header Fields 1
Source and destination ports identify connection end points.
Sequence number.
Acknowledgment number specifies next byte expected.
TCP header length: how many 32-bit words are contained in header.
6-bit unused field.
TCP Header Fields 2
6 1-bit flags:
– URG: indicates urgent data present; urgent pointer gives byte offset from current sequence number where urgent data is.
– ACK: indicates whether segment contains acknowledgment; if 0, acknowledgment number field ignored.
– PUSH: indicates PUSHed data so receiver delivers it to application immediately.
TCP Header Fields 3
Flags (cont’d):
– RST: used to reset connection, reject invalid segment, or refuse to open connection.
– SYN: used to establish connection; connection request, SYN=1, ACK=0.
– FIN: used to release connection.
Window size: how many bytes can be sent starting at acknowledgment number.
TCP Header Fields 4
Checksum: checksums the header + data + pseudo-header.
Options: provide way to add extra information.
– Examples:
» Maximum payload host is willing to accept; can be advertised during connection setup.
» Window scale factor that allows sender and receiver to negotiate larger window sizes.
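The header fields described on these slides can be unpacked with a short sketch using `struct`. It assumes the classic layout (4-bit header length, unused bits, 6 flag bits sharing one 16-bit word); `parse_tcp_header` and the sample values are hypothetical, and options are returned as raw bytes rather than decoded.

```python
import struct

# Fixed 20-byte header: ports, sequence number, acknowledgment number,
# header length + flags, window size, checksum, urgent pointer.
TCP_FIXED_HEADER = struct.Struct("!HHIIHHHH")

FLAG_BITS = {"URG": 0x20, "ACK": 0x10, "PSH": 0x08,
             "RST": 0x04, "SYN": 0x02, "FIN": 0x01}

def parse_tcp_header(segment):
    """Decode the fixed header; options and data are returned as raw bytes."""
    (src_port, dst_port, seq, ack, offset_flags,
     window, checksum, urgent) = TCP_FIXED_HEADER.unpack_from(segment)
    header_len = (offset_flags >> 12) * 4  # header length is in 32-bit words
    flags = {name for name, bit in FLAG_BITS.items() if offset_flags & bit}
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": header_len, "flags": flags,
        "window": window, "checksum": checksum, "urgent": urgent,
        "options": segment[TCP_FIXED_HEADER.size:header_len],
        "data": segment[header_len:],
    }

# A connection request: SYN=1, ACK=0, header length of 5 words (no options).
syn = TCP_FIXED_HEADER.pack(1234, 80, 1000, 0,
                            (5 << 12) | FLAG_BITS["SYN"], 65535, 0, 0)
header = parse_tcp_header(syn)
```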
TCP Connection Setup
3-way handshake.
[Figure: message sequence between Host 1 and Host 2.]
– Host 1 -> Host 2: SYN (SEQ=x)
– Host 2 -> Host 1: SYN (SEQ=y, ACK=x+1)
– Host 1 -> Host 2: (SEQ=x+1, ACK=y+1)
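The sequence and acknowledgment numbers in the handshake can be written out as a small sketch. It only models the numbering (no network I/O); the function name and dictionary keys are hypothetical, and arithmetic wraps modulo 2^32 because sequence numbers are 32 bits wide.

```python
SEQ_SPACE = 2 ** 32  # 32-bit sequence numbers wrap around

def three_way_handshake(x, y):
    """Return the three segments exchanged for initial sequence numbers x, y."""
    return [
        {"sender": "host1", "flags": {"SYN"}, "seq": x},
        {"sender": "host2", "flags": {"SYN", "ACK"},
         "seq": y, "ack": (x + 1) % SEQ_SPACE},
        {"sender": "host1", "flags": {"ACK"},
         "seq": (x + 1) % SEQ_SPACE, "ack": (y + 1) % SEQ_SPACE},
    ]

segments = three_way_handshake(100, 300)
```

The SYN consumes one sequence number, which is why each side acknowledges the other's initial number plus one.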
TCP Connection Release 1
Abrupt release:
– Send RESET.
– May cause data loss.
TCP Connection Release 2
Graceful release:
– Each side of the connection released independently.
» Either side sends TCP segment with FIN=1.
» When FIN acknowledged, that direction is shut down for data.
» Connection released when both sides shut down.
– 4 segments: 1 FIN and 1 ACK for each direction; the 1st ACK and the 2nd FIN can be combined in one segment.
TCP Connection Release 3
Timers to avoid 2-army problem.
– If response to FIN not received within 2*MSL (MSL: maximum segment lifetime), FIN sender releases connection.
After connection released, TCP waits for 2*MSL (e.g., 120 sec) to ensure all old segments have aged.
TCP Transmission 1
Sender process initiates connection.
Once connection established, TCP can start sending data.
Sender writes bytes to TCP stream.
TCP sender breaks byte stream into segments.
– Each byte assigned sequence number.
– Segment sent and timer started.
TCP Transmission 2
If timer expires, retransmit segment.
– After retransmitting segment for maximum number of times, assumes connection is dead and closes it.
If user aborts connection, sending TCP flushes its buffers and sends RESET segment.
Receiving TCP decides when to pass received data to upper layer.
TCP Flow Control
Sliding window.
– Receiver’s advertised window.
» Size of advertised window related to receiver’s buffer space.
» Sender can send data up to receiver’s advertised window.
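TCP Flow Control: Example
[Figure: sender on the left, receiver with a 4K buffer on the right; free buffer space shown after each step.]
– App. writes 2K of data; sender transmits 2K; SEQ=0. Receiver buffer free: 2K.
– Receiver replies ACK=2048; WIN=2048.
– App. does 3K write; sender transmits 2K; SEQ=2048. Receiver buffer free: 0.
– Receiver replies ACK=4096; WIN=0. Sender blocked.
– App. reads 2K of data. Receiver buffer free: 2K.
– Receiver sends ACK=4096; WIN=2048. Sender may send up to 2K.
– Sender transmits 1K; SEQ=4096. Receiver buffer free: 1K.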
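The exchange in this example can be replayed with a toy receiver that tracks its buffer and reports (ACK, WIN) after each event. This is a sketch of the bookkeeping only; the class `WindowedReceiver` and its method names are hypothetical, and loss, reordering, and timers are ignored.

```python
class WindowedReceiver:
    """Toy TCP receiver that advertises its free buffer space."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.buffered = 0        # bytes waiting for the application
        self.next_expected = 0   # next in-order byte expected (the ACK value)

    def window(self):
        return self.buffer_size - self.buffered

    def receive(self, seq, length):
        """Accept an in-order segment; return (ACK, WIN)."""
        if seq != self.next_expected or length > self.window():
            raise ValueError("segment out of order or beyond advertised window")
        self.buffered += length
        self.next_expected += length
        return self.next_expected, self.window()

    def app_read(self, length):
        """Application consumes data; return the window update (ACK, WIN)."""
        self.buffered -= min(length, self.buffered)
        return self.next_expected, self.window()

rx = WindowedReceiver(4096)
trace = [
    rx.receive(0, 2048),     # 2K; SEQ=0
    rx.receive(2048, 2048),  # 2K; SEQ=2048 -> window closes, sender blocked
    rx.app_read(2048),       # app reads 2K -> window update
    rx.receive(4096, 1024),  # 1K; SEQ=4096
]
```

The first three entries of `trace` correspond to the ACK/WIN pairs shown in the figure.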
TCP Flow Control: Observations
TCP sender not required to transmit data as soon as it comes in from application.
– Example: when first 2KB of data comes in, could wait for more data since window is 4KB.
Receiver not required to send ACKs as soon as possible.
– Wait for data so ACK is piggybacked.
Delayed ACKs
Tries to optimize ACK transmission.
Delay ACKs and window update (500 msec) hoping to piggyback on data segment.
Example: telnet to interactive editor:
– Send 1 character at a time: 20-byte TCP header + 1-byte data + 20-byte IP header.
– Receiver ACKs immediately: 40-byte ACK.
– When editor reads character, window update: 40-byte datagram.
– Then echoes character back: 41-byte datagram.
Nagle’s Algorithm
Tries to optimize sending of small data chunks.
Example: telnet to interactive editor.
– Send first byte and buffer the rest until outstanding byte is ACKed; then send all buffered data in one segment; buffer until next ACK.
Disabled in some cases (e.g., window application: mouse movements).
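The buffering rule above can be sketched as a small state machine. This is a simplified model that follows the slide only: it ignores segment-size limits and the receiver's window, and the class name `NagleSender` is hypothetical.

```python
class NagleSender:
    """Simplified Nagle rule: at most one unacknowledged segment in flight."""

    def __init__(self):
        self.pending = bytearray()  # data written but not yet sent
        self.unacked = False        # True while a segment awaits its ACK
        self.sent_segments = []     # what actually went on the wire

    def write(self, data):
        """Application hands data to TCP."""
        self.pending += data
        self._try_send()

    def on_ack(self):
        """ACK for the outstanding segment arrived."""
        self.unacked = False
        self._try_send()

    def _try_send(self):
        # Send only if nothing is outstanding; otherwise keep buffering.
        if self.pending and not self.unacked:
            self.sent_segments.append(bytes(self.pending))
            self.pending.clear()
            self.unacked = True

sender = NagleSender()
for char in b"hello":
    sender.write(bytes([char]))  # user types one character at a time
sender.on_ack()                  # first byte ACKed: buffered rest goes out
```

Five one-byte writes result in two segments instead of five.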
Silly Window Syndrome
Caused by receiver sending window updates of very small values.
– Example:
» Receiver application reads 1 byte at a time and receiver TCP sends 1-byte window update.
» Sender TCP has large blocks to send but can only send 1 byte at a time.
Solution: [Clark] prevent receiver from generating small window advertisements; also, sender can wait.
Congestion Control
Why do it at the transport layer?
– Real fix to congestion is to slow down sender.
Use law of “conservation of packets”.
– Keep number of packets in the network constant.
– Don’t inject new packet until old one leaves.
Congestion indicator: packet loss.
TCP Congestion Control 1
Like flow control, also window based.
– Sender keeps congestion window (cwin).
– Each sender keeps 2 windows: receiver’s advertised window and congestion window.
– Number of bytes that may be sent is min(advertised window, cwin).
TCP Congestion Control 2
Slow start [Jacobson 1988]:
– Connection’s congestion window starts at 1 segment.
– If segment ACKed before time out, cwin=cwin+1.
– As ACKs come in, current cwin is increased by 1 per ACK.
– Exponential increase: cwin doubles every round-trip time.
TCP Congestion Control 3
Congestion Avoidance:
– Third parameter: threshold.
– Initially set to 64KB.
– If timeout, threshold=cwin/2 and cwin=1.
– Re-enters slow-start until cwin=threshold.
– Then, cwin grows linearly until it reaches receiver’s advertised window.
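Slow start and congestion avoidance together can be traced with a short simulation. This is a sketch under simplifying assumptions: windows are counted in segments rather than bytes, time advances in whole round trips, and timeouts are injected at chosen rounds. The function name and parameter values are hypothetical.

```python
def simulate_cwin(rounds, threshold, advertised_window, timeout_rounds=()):
    """Return cwin (in segments) at the start of each round-trip time."""
    cwin = 1
    history = []
    for rnd in range(rounds):
        history.append(cwin)
        if rnd in timeout_rounds:
            # Timeout: halve the threshold and restart slow start.
            threshold = max(cwin // 2, 1)
            cwin = 1
        elif cwin < threshold:
            # Slow start: one increment per ACK doubles cwin each round trip.
            cwin = min(cwin * 2, threshold, advertised_window)
        else:
            # Congestion avoidance: linear growth up to the advertised window.
            cwin = min(cwin + 1, advertised_window)
    return history

history = simulate_cwin(rounds=10, threshold=8, advertised_window=12,
                        timeout_rounds={6})
```

In this run the window grows exponentially to the threshold, then linearly, and collapses to 1 after the injected timeout.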
TCP Retransmission Timer
When segment sent, retransmission timer starts.
– If segment ACKed, timer stops.
– If time out, segment retransmitted and timer starts again.
How to set timer?
Based on round-trip time: time between when a segment is sent and when its ACK comes back.
If timer is too short, unnecessary retransmissions.
If timer is too long, long retransmission delay.
Jacobson’s Algorithm 1
Determining the round-trip time:
– TCP keeps RTT variable.
– When segment sent, TCP measures how long it takes to get ACK back (M).
– RTT = alpha*RTT + (1-alpha)*M.
– alpha: smoothing factor; determines weight given to previous estimate.
– Typically, alpha=7/8.
Jacobson’s Algorithm 2
Determining timeout value:
– Measure RTT variation, or |RTT-M|.
– Keeps smoothed value of cumulative variation D = alpha*D + (1-alpha)*|RTT-M|.
– Alpha may or may not be the same as value used to smooth RTT.
– Timeout = RTT + 4*D.
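The two smoothing formulas and the timeout rule can be combined in a small estimator. This is a sketch with stated assumptions: the same alpha is used for both averages, the first sample initializes RTT with D = 0, and the deviation is updated using the RTT estimate from before the new sample is folded in. The slides do not specify these details, and the class name is hypothetical.

```python
class RttEstimator:
    """Smoothed RTT and deviation, with Timeout = RTT + 4*D."""

    def __init__(self, first_sample, alpha=7 / 8):
        self.alpha = alpha
        self.rtt = first_sample  # assumption: start from the first measurement
        self.deviation = 0.0     # assumption: no variation observed yet

    def update(self, m):
        """Fold in a new measurement M (same time unit as RTT)."""
        # D = alpha*D + (1-alpha)*|RTT - M|, using the previous RTT estimate.
        self.deviation = (self.alpha * self.deviation
                          + (1 - self.alpha) * abs(self.rtt - m))
        # RTT = alpha*RTT + (1-alpha)*M
        self.rtt = self.alpha * self.rtt + (1 - self.alpha) * m

    @property
    def timeout(self):
        return self.rtt + 4 * self.deviation

estimator = RttEstimator(first_sample=100.0)  # milliseconds, for illustration
estimator.update(100.0)
estimator.update(180.0)  # a delayed ACK raises both RTT and the deviation
```

A single late sample moves the timeout much more than it moves the smoothed RTT, which is the purpose of the 4*D term.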
Karn’s Algorithm
How to account for ACKs of retransmitted segments?
– Count it for first or second transmission?
– Karn proposed not to update RTT on any retransmitted segment.
– Instead, the timeout is doubled on each failure until segments get through.
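Karn's rule can be sketched as two event handlers: one that backs off the timer value on each failure, and one that accepts a round-trip sample only for segments sent exactly once. The class name, the cap `max_timeout`, and the sample values are hypothetical additions for illustration.

```python
class KarnRetransmitTimer:
    """Back off on timeouts; ignore RTT samples from retransmitted segments."""

    def __init__(self, timeout, max_timeout=64.0):
        self.timeout = timeout
        self.max_timeout = max_timeout  # assumption: an upper bound on back-off
        self.samples = []               # measurements passed on to the estimator

    def on_timeout(self):
        """Segment timed out: double the timer value before retransmitting."""
        self.timeout = min(2 * self.timeout, self.max_timeout)

    def on_ack(self, measured_rtt, was_retransmitted):
        """Record a sample only if the ACK is unambiguous."""
        if not was_retransmitted:
            self.samples.append(measured_rtt)

timer = KarnRetransmitTimer(timeout=1.0)
timer.on_timeout()                          # first failure  -> 2.0
timer.on_timeout()                          # second failure -> 4.0
timer.on_ack(0.3, was_retransmitted=True)   # ambiguous ACK: ignored
timer.on_ack(0.2, was_retransmitted=False)  # clean sample: kept
```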
Persistence Timer
Prevents deadlock if a window update packet is lost and advertised window = 0.
When persistence timer goes off, sender probes receiver; receiver replies with its current advertised window.
If 0, persistence timer is set again.
Keepalive Timer
Goes off when a connection is idle for a long time.
Causes one side to check whether the other side is still alive.
If no answer, connection terminated.
TIME_WAIT
2*MSL.
Makes sure all segments die after connection is closed.
Wireless TCP 1
According to layered system design principles, transport protocol should be independent of underlying technology.
However, wireless networks invalidate this principle.
– Ignoring properties of wireless medium can lead to poor TCP performance.
– Problem: TCP’s congestion control.
Wireless TCP 2
Problem: packet loss as congestion indicator.
– When retransmission timer times out, sender slows down.
Wireless links are lossy!
– The right response to losses in this case is to re-send lost segments asap, not to slow down.
Indirect TCP (I-TCP)
[Bakre and Badrinath, 1995].
Split TCP connection in 2: one from sender to base station and the other from base station to receiver.
– Base station serves as “repeater”: copies segments between connections in both directions.
– Connections are homogeneous; timeouts on 1st connection slow down sender.
– Problem: violates TCP’s e2e’ness.
» Example: ACKs to sender mean base station received segments, not necessarily receiver.
Snoop TCP
[Balakrishnan et al., 1995].
Does not break connection.
Modifications to base station’s network layer code.
– Snooping agent on base station observes and caches TCP segments sent to mobile host and ACKs coming back.
– If it doesn’t see an ACK for a segment or sees duplicate ACKs, it times out and retransmits.
– But source may time out anyway.
End-To-End Argument
Design principle to help guide placement of functionality in distributed systems.
Rationale for moving functions upward closer to application.
Where to place distributed systems functions?
Layered system design:
– Different levels of abstraction for simplicity.
– Lower layer provides service to upper layer.
– Very well defined interfaces.
Some functions can be implemented at different layers or even at multiple layers.
E2E Argument Statement
“The function in question can completely and correctly be implemented only with the knowledge and help of the application at the endpoints. Therefore providing that function in the communication system itself is not possible. Sometimes an incomplete version of the function provided by the communication system may be useful as performance enhancement.”
Functions Closer to Application
E2E argument paper argues that functions should be moved closer to the application that uses them.
Rationale:
– Some functions can only be completely and correctly implemented with app’s knowledge.
» Example: file transfer.
» If error occurs in the network, network reliability can fix it.
» Otherwise, only application can.
Another perspective: Cost
Why pay for something you don’t need?
» Example 1: the Internet.
» Example 2: trend in kernel design - take away from kernel as much functionality as possible.
Applications that don’t need certain functions should not have to pay for them.
E2E Counter Argument
Performance!
– Example: File transfer
» Reliability checks at lower layers detect problems earlier.
» Abort transfer and re-try without having to wait till whole file is transmitted.
“Spread out” functionality across layers.
Domain Name System (DNS)
Basic function: translation of names (ASCII strings) to network (IP) addresses and vice-versa.
Example:
– zephyr.isi.edu <-> 128.9.160.160
History
Original approach (ARPANET, 1970’s):
– File hosts.txt listed all hosts and their IP addresses.
– Every night every host fetches file from central repository.
– OK for a few hundred hosts.
– Scalability?
» File size.
» Centrally managed.
DNS
Hierarchical name space.
Distributed database.
RFCs 1034 and 1035.
How is it used?
Client-server model.
– Client DNS (running on client hosts), or resolver.
– Application calls resolver with name.
– Resolver contacts local DNS server (using UDP) passing the name.
– Server returns corresponding IP address.
DNS Name Space
Tree-based hierarchy.
[Figure: tree whose top level is int, com, edu, gov, mil, org, net, us, ca, …; shown below the top level are usc (with sub-domains cs and ee) and ibm (with sub-domains eng and sales).]
Name Space Structure
Top-level domains:
– Generic.
– Countries.
Leaf domains: no sub-domains.
In practice nearly all US organizations are under a generic domain, while nearly everything outside the US is under the corresponding country domain.
DNS Names
Domain names:
– Concatenation of all component names, starting from the domain’s own and going all the way up to the root, separated by “.”.
– Refers to a tree node and all names under it.
– Case insensitive.
– Components up to 63 characters.
– Full name less than 255 characters.
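The naming rules above translate into a few lines of validation code. This sketch checks only the length limits and case-insensitivity stated on the slide; it does not check which characters are allowed in a component, and the function names are hypothetical.

```python
MAX_COMPONENT_LENGTH = 63  # per component
MAX_NAME_LENGTH = 254      # full name must be less than 255 characters

def split_components(name):
    """Split a domain name into components, ignoring one trailing dot."""
    if name.endswith("."):
        name = name[:-1]
    return name.split(".")

def is_valid_domain_name(name):
    """Check the component and full-name length limits."""
    components = split_components(name)
    total_length = len(".".join(components))
    if not 0 < total_length <= MAX_NAME_LENGTH:
        return False
    return all(1 <= len(c) <= MAX_COMPONENT_LENGTH for c in components)

def same_domain_name(a, b):
    """Domain names are case insensitive."""
    return split_components(a.lower()) == split_components(b.lower())
```

For example, `same_domain_name("CS.USC.EDU", "cs.usc.edu")` treats the two spellings as the same name.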
Name Space Management
Domains are autonomous.
– Organizational boundaries.
– Each domain manages its own name space independently of other domains.
Delegation:
– When creating new domain: register with parent domain.
» For name uniqueness.
» For name resolution.
Resource Records
Entry in the DNS database.
Several types of entries or RRs.
Example: RR “A” contains IP address.
Name <-> several resource records.
RR format: five-tuple.
– Name.
– TTL (in seconds).
– Class (usually “IN” for Internet info).
– Type: type of RR.
– Value.
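The five-tuple maps naturally onto a small record type. In this sketch the names under `example.edu`, the TTL, and the 192.0.2.x addresses are placeholders invented for illustration, and `lookup` is a hypothetical helper rather than part of any DNS library.

```python
from typing import NamedTuple

class ResourceRecord(NamedTuple):
    """One DNS database entry: (name, TTL, class, type, value)."""
    name: str
    ttl: int       # seconds
    rr_class: str  # usually "IN"
    rr_type: str   # e.g. "A", "MX", "NS"
    value: str

# Placeholder zone contents; one name may have several records.
ZONE = [
    ResourceRecord("host.example.edu", 86400, "IN", "A", "192.0.2.10"),
    ResourceRecord("host.example.edu", 86400, "IN", "A", "192.0.2.11"),
    ResourceRecord("example.edu", 86400, "IN", "MX", "mail.example.edu"),
]

def lookup(records, name, rr_type):
    """Return the values of all records matching name (case-insensitive) and type."""
    name = name.lower()
    return [rr.value for rr in records
            if rr.name.lower() == name and rr.rr_type == rr_type]

addresses = lookup(ZONE, "HOST.example.edu", "A")
```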
RR Types 1
SOA: start of authority.
– Marks beginning of zone’s database.
– Provides general info about the zone: e-mail address of admin, default TTL, etc.
A: address.
– Contains 32-bit IP address.
– Single name <-> several A RRs.
MX: mail exchange.
– Name of mail server for this domain.
RR Types 2
NS: name server.
– Name of name server for this domain.
CNAME: canonical name.
– Alias.
HINFO: host description.
– Provides information about host, e.g., CPU type, OS, etc.
TXT: arbitrary string of characters.
– Generic description of the domain, where it is located, etc.
Name Servers
Entire database in a single name server.
– Practical?
– Why?
DNS database is partitioned into zones.
Each zone contains part of the DNS tree.
Zone <-> name server.
– Each zone may be served by more than 1 server.
– A server may serve multiple zones.
Primary and secondary name servers.
Name Resolution 1
Application wants to resolve name.
Resolver sends query to local name server.
– Resolver configured with list of local name servers.
– Select servers in round-robin fashion.
If name is local, local name server returns matching authoritative RRs.
– Authoritative RR comes from authority managing the RR and is always correct.
– Cached RRs may be out of date.
Name Resolution 2
If information not available locally (not even cached), local NS will have to ask someone else.
– It asks the server of the top-level domain of the name requested.
Recursive Resolution
Recursive query:
– Each server that doesn’t have info forwards the query to someone else.
– Response finds its way back.
Alternative:
– Name server not able to resolve query sends back the name of the next server to try.
– Some servers use this method.
– More control for clients.
Example
Suppose resolver on flits.cs.vu.nl wants to resolve linda.cs.yale.edu.
– Local NS, cs.vu.nl, gets queried but cannot resolve it.
– It then contacts .edu server.
– .edu server forwards query to yale.edu server.
– yale.edu contacts cs.yale.edu, which has the authoritative RR.
– Response finds its way back to originator.
– cs.vu.nl caches this info.
» Not authoritative (since may be out-of-date).
» RR TTL determines how long RR should be cached.
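The walk-through above can be replayed with a toy model of recursive resolution and caching. Everything here is a simulation: the classes are hypothetical, no real DNS queries are made, time is passed in explicitly as `now`, and the address 192.0.2.7 and the 3600-second TTL are placeholders rather than real data for linda.cs.yale.edu.

```python
class NameServer:
    """Toy server: answers from its own zone or forwards to a child zone."""

    def __init__(self, zone, records=None, children=()):
        self.zone = zone
        self.records = dict(records or {})  # name -> (value, ttl), authoritative
        self.children = list(children)      # servers of delegated sub-domains

    def resolve(self, name, path):
        """Recursive query; path records which servers were visited."""
        path.append(self.zone)
        if name in self.records:
            return self.records[name]
        for child in self.children:
            if name == child.zone or name.endswith("." + child.zone):
                return child.resolve(name, path)
        raise KeyError(name)

class LocalNameServer:
    """Toy local NS: checks its cache, else asks the top-level domain server."""

    def __init__(self, zone, tld_servers):
        self.zone = zone
        self.tld_servers = tld_servers  # top-level domain -> NameServer
        self.cache = {}                 # name -> (value, expiry time)

    def lookup(self, name, now):
        """Return (value, authoritative?, servers visited)."""
        cached = self.cache.get(name)
        if cached and cached[1] > now:
            return cached[0], False, []  # cached answer: not authoritative
        path = [self.zone]
        top_level = name.rsplit(".", 1)[-1]
        value, ttl = self.tld_servers[top_level].resolve(name, path)
        self.cache[name] = (value, now + ttl)  # TTL bounds the caching time
        return value, True, path

cs_yale = NameServer("cs.yale.edu", {"linda.cs.yale.edu": ("192.0.2.7", 3600)})
yale = NameServer("yale.edu", children=[cs_yale])
edu = NameServer("edu", children=[yale])
local = LocalNameServer("cs.vu.nl", {"edu": edu})

first = local.lookup("linda.cs.yale.edu", now=0)    # full recursive walk
second = local.lookup("linda.cs.yale.edu", now=10)  # served from the cache
```

Once the cached entry's TTL has passed, the next lookup repeats the full walk.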