Reference Notes on TCP/IP


Internetworking

Interconnection of 2 or more networks forming an internetwork, or internet.
– LANs, MANs, and WANs.

Different networks mean different protocols.
– TCP/IP, IBM’s SNA, DEC’s DECnet, ATM, Novell and AppleTalk (for LANs).
– Also, satellite and cellular networks.

Example Internet

[Figure: example internet. 802.3, 802.4, and 802.5 LANs, an X.25 WAN, and an SNA WAN joined by routers (R) and a bridge (B), showing LAN-LAN, LAN-WAN, and LAN-WAN-LAN interconnections.]

Gateway: device connecting 2 or more different networks.

Gateways

Repeaters: operate at physical layer (bits); amplify/regenerate signal.
Bridges: store-and-forward frames; data link layer devices.
Routers: operate at network layer.
Transport gateways: connect networks at the transport layer.
Application gateways: connect 2 parts of an application at application layer.

How do networks differ?

Service offered: connection-oriented versus connection-less.
Protocols: IP, IPX, AppleTalk, DECnet.
Addressing: flat (802) versus hierarchical (IP).
Maximum packet size.
Quality of service.
Error control: reliable, ordered, unordered delivery.
Flow control: sliding window versus rate-based.
Congestion control: leaky bucket, choke packets.
Security: privacy rules, encryption.
Parameters: different timeouts.

Types of Internetworks

Connection-oriented concatenation of VC subnets.
– VC between source and router closest to destination network.
– Router builds VC to gateway to other subnet.
– Gateway keeps state about that VC.
– Builds VC to router in the next subnet, etc.

Every packet traverses same path.
– Ordered delivery.
– Routers convert between packet formats.

Connection-oriented concatenation

VC between source and router closest to destination network.
Router builds VC to gateway to other subnet. Gateway keeps state about VC.
Gateway builds VC to router in the next subnet, etc.

Every packet traverses same path.
– Ordered delivery.
– Routers convert between packet formats.

Connectionless Internetworking

Datagram model.
– Different packets may take different routes.
– Separate routing decision for each packet.
– No ordered delivery guarantees.

Datagram versus VC Internets

VC:
– Pluses: resources reserved in advance, ordered delivery, short headers.
– Minuses: vulnerability to failures, less adaptive, hard if involving datagram subnet.

Datagram:
– Pluses: more robust and adaptive, can be used over datagram subnets (many LANs, mobile networks).
– Minuses: longer headers, unordered delivery.

Tunneling

Interconnecting through a “foreign” subnet.

[Figure: tunneling. Ethernet 1 and Ethernet 2 are joined by gateways (G) across a WAN, which acts as a tunnel. The IP packet travels in an Ethernet frame on Ethernet 1, inside the payload field of a WAN packet across the tunnel, and in an Ethernet frame again on Ethernet 2.]
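The encapsulation step in tunneling can be sketched as a toy model. This is an illustration under stated assumptions, not a real protocol stack: the classes `IPPacket`, `EthernetFrame`, and `WANPacket`, their fields, the gateway names `G1` and `G2`, and the sample addresses are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IPPacket:
    src: str
    dst: str
    payload: bytes


@dataclass(frozen=True)
class EthernetFrame:
    # Hypothetical, simplified frame: a LAN label plus the IP packet it carries.
    lan: str
    ip_packet: IPPacket


@dataclass(frozen=True)
class WANPacket:
    # Hypothetical WAN packet: the WAN only looks at its own addresses.
    wan_src: str
    wan_dst: str
    payload: IPPacket  # the whole IP packet rides in the payload field


def tunnel_entry(frame: EthernetFrame, wan_src: str, wan_dst: str) -> WANPacket:
    """Entry gateway: strip the Ethernet frame, wrap the IP packet for the WAN."""
    return WANPacket(wan_src, wan_dst, frame.ip_packet)


def tunnel_exit(packet: WANPacket, lan: str) -> EthernetFrame:
    """Exit gateway: unwrap the IP packet and re-frame it for the far Ethernet."""
    return EthernetFrame(lan, packet.payload)


original = IPPacket("10.0.1.5", "10.0.2.9", b"hello")
sent = EthernetFrame("Ethernet 1", original)
in_tunnel = tunnel_entry(sent, wan_src="G1", wan_dst="G2")
delivered = tunnel_exit(in_tunnel, "Ethernet 2")
```

The IP packet is never modified inside the tunnel; only its wrapper changes at each gateway.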

Internetwork Routing 1

2-level hierarchy:
– Routing within each network: interior gateway protocol.
– Routing between networks: exterior gateway protocol.

Within each network, different routing algorithms can be used.

Each network is autonomously managed and independent of others: autonomous system (AS).

Internetwork Routing 2

Typically, packet starts in its LAN. Gateway receives it (broadcast on LAN to “unknown” destination).

Gateway sends packet to gateway on the destination network using its routing table. If it can use the packet’s native protocol, sends packet directly. Otherwise, tunnels it.

Fragmentation 1

Network-specific maximum packet size.
– Width of TDM slot.
– OS buffer limitations.
– Protocol (number of bits in packet length field).

Maximum payloads range from 48 bytes (ATM cells) to 64 Kbytes (IP packets).

Fragmentation 2

What happens when a large packet wants to travel through a network with a smaller maximum packet size? Fragmentation.

Gateways break packets into fragments; each is sent as a separate packet.

Gateway on the other side has to reassemble fragments into original packet.

2 kinds of fragmentation: transparent and non-transparent.

Transparent Fragmentation

Small-packet network transparent to other subsequent networks.

Fragments of a packet addressed to the same exit gateway, where packet is reassembled.
– OK for concatenated VC internetworking.

Subsequent networks are not aware fragmentation occurred.

ATM networks (through special hardware) provide transparent fragmentation: segmentation.

Problems with Transparent Fragmentation

Exit gateway must know when it received all the pieces.
– Fragment counter or “end of packet” bit.

Some performance penalty from requiring all fragments to go through same gateway.

May have to repeatedly fragment and reassemble through series of small-packet networks.

Non-Transparent Fragmentation

Only reassemble at destination host.
– Each fragment becomes a separate packet.
– Thus routed independently.

Problems:
– Hosts must reassemble.
– Every fragment must carry header until it reaches destination host.

Keeping Track of Fragments 1

Fragments must be numbered so that original data stream can be reconstructed.

Tree-structured numbering scheme:
– Packet 0 generates fragments 0.0, 0.1, 0.2, …
– If these fragments need to be fragmented later on, then 0.0.0, 0.0.1, …, 0.1.0, 0.1.1, …
– But, too much overhead in terms of number of fields needed.
– Also, if fragments are lost, retransmissions can take alternate routes and get fragmented differently.

Keeping Track of Fragments 2

Another way is to define an elementary fragment size that can pass through every network.

When a packet is fragmented, all pieces are equal to the elementary fragment size, except the last one (may be smaller).

A packet may contain several fragments.

Keeping Track of Fragments 3

Header contains packet number, number of first fragment in the packet, and last-fragment bit.

(a) Original packet with 10 data bytes (each box in the figure is 1 byte):

Packet number   Number of first fragment   Last-fragment bit   Data
27              0                          1                   A B C D E F G H I J

(b) Fragments after passing through network with maximum packet size = 8 bytes:

Packet number   Number of first fragment   Last-fragment bit   Data
27              0                          0                   A B C D E F G H
27              8                          1                   I J
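The elementary-fragment numbering scheme can be sketched as follows. This is a minimal illustration, not a real protocol implementation: the `Fragment` tuple and the `fragment` and `reassemble` functions are hypothetical names, the elementary fragment size is assumed to be 1 byte, and the 8-byte limit is assumed to apply to the data field only.

```python
from typing import Iterable, List, NamedTuple


class Fragment(NamedTuple):
    packet_number: int
    first_fragment: int  # number of the first elementary fragment carried
    last_bit: int        # 1 if this piece carries the end of the original packet
    data: bytes


def fragment(pkt: Fragment, max_data: int, elem_size: int = 1) -> List[Fragment]:
    """Split pkt so each piece carries at most max_data bytes of data.

    Pieces are cut on elementary-fragment boundaries, so any piece can be
    fragmented again later without renumbering.
    """
    per_piece = (max_data // elem_size) * elem_size
    if per_piece == 0:
        raise ValueError("max_data is smaller than one elementary fragment")
    pieces = []
    for start in range(0, len(pkt.data), per_piece):
        is_end = start + per_piece >= len(pkt.data)
        pieces.append(Fragment(
            pkt.packet_number,
            pkt.first_fragment + start // elem_size,
            pkt.last_bit if is_end else 0,
            pkt.data[start:start + per_piece],
        ))
    return pieces


def reassemble(pieces: Iterable[Fragment], elem_size: int = 1) -> bytes:
    """Rebuild the data of one packet from its fragments, in any arrival order."""
    ordered = sorted(pieces, key=lambda f: f.first_fragment)
    if ordered[0].first_fragment != 0 or ordered[-1].last_bit != 1:
        raise ValueError("missing first or last fragment")
    data = b""
    for f in ordered:
        if f.first_fragment * elem_size != len(data):
            raise ValueError("gap or overlap in fragments")
        data += f.data
    return data


original = Fragment(27, 0, 1, b"ABCDEFGHIJ")
frags = fragment(original, max_data=8)
# Fragment again for a network that only carries 5 data bytes per packet.
smaller = [piece for f in frags for piece in fragment(f, max_data=5)]
```

With these inputs, `frags` matches case (b) of the figure: one piece numbered 0 carrying A through H, and one piece numbered 8 carrying I and J with the last-fragment bit set.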

The Internet Network Layer

The Internet as a collection of networks or autonomous systems (ASs).

Hierarchical structure.

[Figure: hierarchical structure. A US backbone and a European backbone joined by transcontinental links, with regional and national networks attached.]

IP (Internet Protocol)

Glues Internet together.
Common network-layer protocol spoken by all Internet participating networks.
Best effort datagram service:
– No reliability guarantees.
– No ordering guarantees.

IP

Transport layer breaks data streams into datagrams; datagrams are transmitted over the Internet, possibly being fragmented along the way.

When all fragments of a datagram arrive at the destination, they are reassembled by the network layer and delivered to the transport layer at the destination host.

IP Versions

IPv4: IP version 4.
– Current, predominant version.
– 32-bit long addresses.

IPv6: IP version 6 (aka IPng).
– Evolution of IPv4.
– Longer addresses (16-byte long).

IP Datagram Format

IP datagram consists of header and data (or payload).

Header:
– 20-byte fixed (mandatory) part.
– Variable length optional part.

IP Header

[Figure: IP header layout, 32 bits per row.]
Row 1: Version | Header length | Type of service | Total length
Row 2: Identification | U | D | M | Fragment offset
Row 3: TTL | Protocol | Header checksum
Row 4: Source address
Row 5: Destination address
Row 6: Options

IP Header Fields 1

Version: which IP version datagram uses.
Header length: how long (in 32-bit words) is header; minimum=5; maximum=15 (options=40 bytes).
Type of service: precedence (priority), 3 flags (delay, throughput, reliability). In practice, routers ignore type of service.
Total length: length of total datagram, i.e., header + data (max = 64 Kbytes).

IP Header Fields 2

Identification: which datagram a fragment belongs to.
U: unused bit.
D: don’t fragment.
M: more fragments.
Fragment offset: position of fragment in datagram.
TTL: datagram lifetime.

IP Header Fields 3

Protocol: number of the transport protocol that generated the datagram.
Header checksum: verifies header integrity; recomputed at each hop.
Source and destination address: IP addresses of source and destination.
Options: way of extending the protocol.
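The header fields can be made concrete with a short decoder for the 20-byte fixed part, including the checksum test. This is a sketch, not production parsing code: `parse_ipv4_header`, `internet_checksum`, and `build_sample_header` are hypothetical helper names, and the sample field values are made up for illustration.

```python
import struct

FIXED_HEADER = "!BBHHHBBH4s4s"  # network byte order, 20 bytes in total


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, complemented."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed part of an IPv4 header and verify its checksum."""
    (ver_ihl, tos, total_length, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack(FIXED_HEADER, raw[:20])
    ihl = ver_ihl & 0x0F
    return {
        "version": ver_ihl >> 4,
        "header_length_words": ihl,
        "type_of_service": tos,
        "total_length": total_length,
        "identification": ident,
        "dont_fragment": bool(flags_frag & 0x4000),   # D bit
        "more_fragments": bool(flags_frag & 0x2000),  # M bit
        "fragment_offset": flags_frag & 0x1FFF,       # in units of 8 bytes
        "ttl": ttl,
        "protocol": proto,
        "header_checksum": checksum,
        "source": ".".join(str(b) for b in src),
        "destination": ".".join(str(b) for b in dst),
        # Summing a valid header, checksum field included, yields 0.
        "checksum_ok": internet_checksum(raw[:ihl * 4]) == 0,
    }


def build_sample_header() -> bytes:
    """Made-up header: no options, D bit set, TTL 64, protocol 6 (TCP)."""
    fields = [0x45, 0, 40, 0x1234, 0x4000, 64, 6, 0,
              bytes([128, 6, 240, 3]), bytes([130, 50, 15, 6])]
    fields[7] = internet_checksum(struct.pack(FIXED_HEADER, *fields))
    return struct.pack(FIXED_HEADER, *fields)


header = parse_ipv4_header(build_sample_header())
```

A router that decrements TTL changes the header, which is why the checksum has to be recomputed at each hop.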

Addressing

Required for packet delivery.
– Each network may use different addressing scheme.
– Addresses must be unique.

Flat addresses: physical addresses (e.g., Ethernet address).

Hierarchical addresses: use hierarchy scheme like postal addresses (e.g., IP).

Address Types

Unicast: uniquely distinguishes a single node.
Multicast: shared by a group of nodes.
Broadcast: shared by all nodes.

IP Addresses

Every host and router on the Internet must have an IP address.

2-level hierarchy:
– Network number.
– Host number.

Notations:
– Binary: 10000000 00000110 11110000 00000011
– Dotted decimal: 128.6.240.3
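The two notations are the same 32 bits written differently, which a short conversion sketch makes explicit. The function names `to_binary` and `to_dotted` are hypothetical, chosen for this example.

```python
def to_binary(dotted: str) -> str:
    """Dotted decimal to four space-separated 8-bit groups."""
    octets = [int(part) for part in dotted.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a valid IPv4 address: {dotted!r}")
    return " ".join(f"{o:08b}" for o in octets)


def to_dotted(binary: str) -> str:
    """Four space-separated 8-bit groups back to dotted decimal."""
    return ".".join(str(int(group, 2)) for group in binary.split())


bits = to_binary("128.6.240.3")
```

Here `bits` is the binary string from the slide, and `to_dotted(bits)` returns `128.6.240.3` again.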

IP Address Formats 1

4 different classes:

Class   Address bits                                          Size
A       0XXXXXXX (network) + 24 host bits                     128 nets, 16M hosts/net
B       10XXXXXX XXXXXXXX (network) + 16 host bits            16K nets, 64K hosts/net
C       110XXXXX XXXXXXXX XXXXXXXX (network) + 8 host bits    2M nets, 256 hosts/net
D       1110XXXX XXXXXXXX XXXXXXXX XXXXXXXX                   Multicast

IP Address Formats 2

Value of the first byte, by class:
Class A: 1 to 127.
Class B: 128 to 191.
Class C: 192 to 223.
Class D: 224 to 239.
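The first-byte ranges follow directly from the leading bits of each class, as this sketch shows. `address_class` is a hypothetical helper. It classifies purely by leading bits, so a first byte of 0 comes out as class A even though the list starts at 1, and the `reserved` label for first bytes of 240 and above goes beyond what the notes cover.

```python
def address_class(dotted: str) -> str:
    """Classify an IPv4 address by the leading bits of its first byte."""
    first = int(dotted.split(".")[0])
    if not 0 <= first <= 255:
        raise ValueError(f"first byte out of range: {dotted!r}")
    if first >> 7 == 0b0:      # 0XXXXXXX
        return "A"
    if first >> 6 == 0b10:     # 10XXXXXX
        return "B"
    if first >> 5 == 0b110:    # 110XXXXX
        return "C"
    if first >> 4 == 0b1110:   # 1110XXXX
        return "D"
    return "reserved"          # 1111XXXX, not covered by the notes


classes = {addr: address_class(addr)
           for addr in ("80.0.0.8", "128.6.240.3", "192.0.2.1", "224.0.0.1")}
```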

Multi-addresses

A router usually has more than one IP address.

Multi-homed host: host with multiple network interfaces, each of which has a different IP address.

[Figure: a router attached to networks 80.0.0.0, 236.240.128.0, and 129.98.0.0, with one address on each: 80.0.0.8, 236.240.128.3, and 129.98.95.1.]

Management and Scalability 1

Network numbers assigned by single authority: NIC (network information center).

All hosts in a network must have same network number.

What if networks grow?

Management and Scalability 2

Example: company starts with 1 class C LAN, thus can connect up to 256 hosts.
– It might grow to more than 256 hosts.
– It might get more LANs.
– For every new LAN, need new network number from NIC.
– Moving machines between LANs needs address change.

Subnetting 1

Split address space into several “internal” subnets.
– Still act like single network to outside world.

Example: Class B address.

Class B: 10XXXXXX XXXXXXXX HHHHHHHH HHHHHHHH
– 16K nets, 64K hosts/net.

Class B with subnetting: 10XXXXXX XXXXXXXX SSSSSSHH HHHHHHHH
– 62 LANs, 1022 hosts each.
– 1st subnet: 130.50.4.1
– 2nd subnet: 130.50.8.1

Subnetting 2

Routing: hierarchical.
– (network, -) entries: hosts on distant networks.
– (this network, host) entries: local hosts.
– Routers only need to keep track of other networks and local hosts.

With subnetting:
– (network, -) entries: hosts on distant networks.
– (this network, subnet, -) entries: hosts on other subnets of this network.
– (this network, this subnet, host) entries: hosts on the local subnet.
– Adds extra hierarchical level => smaller routing tables.

Subnet Mask

Used to compute the subnet number; i.e., gets rid of the host number.
– Facilitates routing table look-up.
– IP address AND subnet mask = subnet #

Example:

Address layout: 10XXXXXX XXXXXXXX SSSSSSHH HHHHHHHH
Subnet mask:    11111111 11111111 11111100 00000000

Ex: 130.50.15.6 AND subnet mask = 130.50.12.0, which is subnet 3.
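The masking step can be checked with Python's standard `ipaddress` module. This is a small sketch: `subnet_of` and `subnet_number` are hypothetical helper names, and `subnet_number` assumes a mask of contiguous 1 bits and a class B network (16 network bits) unless told otherwise.

```python
import ipaddress


def subnet_of(address: str, mask: str) -> str:
    """Bitwise AND of address and mask: clears the host number."""
    addr = int(ipaddress.IPv4Address(address))
    mask_bits = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(addr & mask_bits))


def subnet_number(address: str, mask: str, network_bits: int = 16) -> int:
    """Value of the subnet field, assuming a contiguous mask."""
    addr = int(ipaddress.IPv4Address(address))
    mask_bits = int(ipaddress.IPv4Address(mask))
    host_bits = 32 - bin(mask_bits).count("1")
    subnet_bits = 32 - network_bits - host_bits
    return (addr >> host_bits) & ((1 << subnet_bits) - 1)


MASK = "255.255.252.0"  # 11111111 11111111 11111100 00000000
masked = subnet_of("130.50.15.6", MASK)
number = subnet_number("130.50.15.6", MASK)
```

For the example address, `masked` is `130.50.12.0` and `number` is 3. The same functions place 130.50.4.1 in subnet 1 and 130.50.8.1 in subnet 2.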

Internet Control Protocols

IP carries data.
There are other network layer protocols that carry control information.
Example: ICMP, ARP, RARP, BOOTP.

ICMP

Internet Control Message Protocol.
Report specific events.
– Generated by routers.
– Encapsulated in IP packets.

ICMP Messages

Message type              Description
Destination unreachable   Packet couldn’t be delivered
Time exceeded             TTL field hit 0
Parameter problem         Invalid header field
Source quench             Choke packets
Redirect                  Route problem
Echo request              Check if destination is up
Echo reply                Destination responds
Timestamp request         Same as echo request + timestamp
Timestamp reply           Same as echo reply + timestamp
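As a concrete illustration of these message types, the sketch below builds an echo request and identifies a message from its type byte. The type numbers are the ones defined in RFC 792, which the notes do not list; `build_echo_request`, `describe`, and `icmp_checksum` are hypothetical helper names, and nothing is sent on the network.

```python
import struct

# ICMP type numbers from RFC 792 (not given in the notes).
ICMP_TYPES = {
    0: "Echo reply",
    3: "Destination unreachable",
    4: "Source quench",
    5: "Redirect",
    8: "Echo request",
    11: "Time exceeded",
    12: "Parameter problem",
    13: "Timestamp request",
    14: "Timestamp reply",
}


def icmp_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, complemented."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes = b"") -> bytes:
    """Type 8, code 0, then checksum, identifier, and sequence number."""
    unchecked = struct.pack("!BBHHH", 8, 0, 0, ident, seq) + payload
    header = struct.pack("!BBHHH", 8, 0, icmp_checksum(unchecked), ident, seq)
    return header + payload


def describe(message: bytes) -> str:
    """Name an ICMP message from its first byte."""
    return ICMP_TYPES.get(message[0], f"unknown type {message[0]}")


request = build_echo_request(ident=1, seq=1, payload=b"ping")
```

In a real exchange this message would be encapsulated in an IP packet, as noted on the ICMP slide.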

Mapping IP to DLL Address

Internet applications refer to hosts by their IP addresses; once packet gets to destination LAN, node needs to figure out the destination DLL address.

One solution is to have configuration file.
– Hard to maintain/update.

Address Resolution Protocol (ARP):
– Run by every node to map IP to DLL address (RFC 826).

ARP

Advantage:
– Easy to administer, less human intervention.

Example: 2 hosts on the same Ethernet want to communicate.
– Host 1 must figure out host 2’s Ethernet address.
– Host 1 broadcasts ARP packet on Ethernet asking for the Ethernet address of host 2.
– Host 2 receives the ARP request, and replies with its Ethernet address.

ARP Optimizations

Caching of ARP replies.
– Entries may have large TTLs.

When sending ARP request, piggyback its own IP-DLL address mapping.

Every machine broadcasts its mapping at boot time.
– No response is expected.
– Other machines cache that information.
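The request, reply, caching, and piggybacking behaviour can be simulated in a few lines. This is a toy model, not an ARP implementation: `Host`, `Ethernet`, the addresses, and the 600-second cache lifetime are assumptions made for the example, and time is passed in explicitly so the run is deterministic.

```python
from typing import Dict, List, Optional, Tuple


class Host:
    def __init__(self, ip: str, mac: str) -> None:
        self.ip = ip
        self.mac = mac
        self.arp_cache: Dict[str, Tuple[str, float]] = {}  # ip -> (mac, expiry)

    def learn(self, ip: str, mac: str, now: float, ttl: float) -> None:
        self.arp_cache[ip] = (mac, now + ttl)

    def lookup(self, ip: str, now: float) -> Optional[str]:
        entry = self.arp_cache.get(ip)
        if entry is None or entry[1] <= now:
            self.arp_cache.pop(ip, None)  # drop expired entries
            return None
        return entry[0]


class Ethernet:
    def __init__(self, cache_ttl: float = 600.0) -> None:
        self.hosts: List[Host] = []
        self.cache_ttl = cache_ttl
        self.broadcasts = 0  # number of ARP requests put on the wire

    def attach(self, host: Host) -> None:
        self.hosts.append(host)

    def resolve(self, sender: Host, target_ip: str, now: float) -> Optional[str]:
        cached = sender.lookup(target_ip, now)
        if cached is not None:
            return cached  # answered from the cache, no broadcast needed
        self.broadcasts += 1
        answer = None
        for host in self.hosts:
            if host is sender:
                continue
            # The request piggybacks the sender's own mapping; listeners cache it.
            host.learn(sender.ip, sender.mac, now, self.cache_ttl)
            if host.ip == target_ip:
                answer = host.mac  # only the target replies
        if answer is not None:
            sender.learn(target_ip, answer, now, self.cache_ttl)
        return answer


lan = Ethernet(cache_ttl=600.0)
host1 = Host("192.0.2.1", "aa:aa:aa:aa:aa:01")
host2 = Host("192.0.2.2", "aa:aa:aa:aa:aa:02")
lan.attach(host1)
lan.attach(host2)
first = lan.resolve(host1, host2.ip, now=0.0)     # broadcast, host 2 replies
second = lan.resolve(host1, host2.ip, now=10.0)   # served from host 1's cache
reverse = lan.resolve(host2, host1.ip, now=10.0)  # learned from the piggyback
broadcasts_so_far = lan.broadcasts
```

After these three lookups only one broadcast has been sent, which is the point of the optimizations.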

Proxy ARP

What if host 1 wants to send data to host 3 on a different LAN?
– Router connecting the 2 LANs can be configured to respond to ARP requests for the networks it interconnects: proxy ARP.
– Another solution is for host 1 to recognize host 3 is on remote network and use default LAN address that handles all remote traffic; that could be the router’s Ethernet address.

RARP

Reverse Address Resolution Protocol.
Given LAN address, what’s the IP address?
Usually for booting diskless workstation.
– Gets the OS image from remote file server.
– Same image for all machines.
– Machine broadcasts its LAN address.
– Remote RARP server responds with machine’s IP address.

BOOTP

RARP broadcasts are not forwarded by routers.
Need RARP server on every network.
BOOTP uses UDP messages that are forwarded by routers.
– Also provides additional information such as IP address of file server holding OS image, subnet mask, etc.

Internet Routing

IGPs and EGPs
– IGPs: routing within ASs.
– EGPs: routing between ASs.

IGPs

Original Internet IGP was RIP.
– Distance vector.
– OK for small ASs but not efficient as ASs got larger.

New IGP: OSPF.
– Open Shortest Path First.
– Became standard in 1990.
– Link state algorithm.
– RIP is still running but OSPF is taking over.

OSPF 1

Design requirements:
– Open implementation.
– Support for various distance metrics: delay, hops, etc.
– Dynamic: automatically adapt to topology changes.
– QoS Routing: real-time versus other traffic using IP’s type of service field.
– Load balancing across multiple lines.
– Security and tunneling.

OSPF 2

Abstracts collection of networks, routers and lines into a directed graph where edges are assigned a cost proportional to the routing metric.
It then computes shortest path.
Hierarchical routing within ASs.
– Areas: collection of contiguous networks.
– Area 0: AS backbone; all areas connected to it.
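The shortest-path computation over a directed, weighted graph can be sketched with Dijkstra's algorithm, which link state protocols such as OSPF typically use. This is an illustration only: the four-router graph, its costs, and the function names `shortest_paths` and `path_to` are made up for the example and say nothing about OSPF's message formats or areas.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, int]]  # node -> {neighbor: cost of the directed edge}


def shortest_paths(graph: Graph, source: str) -> Tuple[Dict[str, int], Dict[str, str]]:
    """Dijkstra's algorithm: least cost to every reachable node, plus predecessors."""
    dist = {source: 0}
    prev: Dict[str, str] = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a cheaper path was already found
        for neighbor, cost in graph.get(node, {}).items():
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    return dist, prev


def path_to(prev: Dict[str, str], source: str, dest: str) -> List[str]:
    """Walk the predecessor links back from dest to source."""
    path = [dest]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical AS with four routers; edges are directed, so costs may differ
# by direction (D reaches A at cost 7, but A has no direct line to D).
graph: Graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"D": 1},
    "D": {"A": 7},
}
dist, prev = shortest_paths(graph, "A")
route = path_to(prev, "A", "D")
```

From A, the cheapest route to D goes through B and C at total cost 4, rather than taking the direct B to D line at cost 6.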

OSPF 3

Type of service routing:
– Uses different graphs labeled with different metrics.

Routing updates:
– Adjacent routers exchange routing information.
– Adjacent routers are on different LANs.
– Reliable link state updates with sequence #’s.

EGPs

Routing protocol between ASs.
Take policy into account.
– An AS may not be willing to carry traffic originating and destined to foreign ASs.
– Example: phone companies are willing to carry traffic for their customers but not for others.

Routing Policy Examples

No transit traffic through certain ASs.
Traffic source restricts ASs through which its traffic crosses.
Same for destination.

BGP 1

Border Gateway Protocol.
Policies are manually configured into BGP routers.
BGP abstracts networks as a collection of BGP routers and their links.
2 BGP routers are connected if they share a common network.
BGP routers communicate reliably using TCP.

BGP 2

3 types of networks:
– Stub networks: have a single connection in the BGP graph; cannot carry transit traffic.
– Multi-connected networks: have multiple connections but refuse to carry transit traffic.
– Transit networks: agree to carry transit (3rd party) traffic possibly with some restriction; e.g., backbones.

BGP 3

BGP is a distance vector protocol.
Routing table entries keep whole path to destination + distance.
A BGP router can discard paths containing itself: avoiding loops and counting to infinity.
Routers compute distance associated to a route taking policy into account.
– If policy is violated, distance = infinity.
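Route selection over whole paths can be sketched as below. This is a simplified illustration, not BGP itself: hop count stands in for distance, the only policy modelled is a set of forbidden transit ASs, and the names `route_distance` and `select_route` and the sample paths are hypothetical.

```python
import math
from typing import FrozenSet, List, Optional, Sequence


def route_distance(path: Sequence[str], forbidden: FrozenSet[str]) -> float:
    """Hop count of a path, or infinity if it crosses a forbidden AS."""
    if any(hop in forbidden for hop in path):
        return math.inf  # policy violated
    return len(path)


def select_route(me: str,
                 advertised: Sequence[Sequence[str]],
                 forbidden: FrozenSet[str] = frozenset()) -> Optional[List[str]]:
    """Pick the best advertised path to a destination, or None if none is usable."""
    best: Optional[Sequence[str]] = None
    best_distance = math.inf
    for path in advertised:
        if me in path:
            continue  # path already contains this router: it would loop
        distance = route_distance(path, forbidden)
        if distance < best_distance:
            best, best_distance = path, distance
    return None if best is None else [me, *best]


# Paths to destination D as advertised by the neighbors of router F.
advertised = [
    ["B", "C", "D"],
    ["G", "C", "D"],
    ["I", "F", "G", "C", "D"],  # goes back through F
    ["E", "F", "G", "C", "D"],  # goes back through F
]
default_route = select_route("F", advertised)
policy_route = select_route("F", advertised, forbidden=frozenset({"B"}))
```

Because each entry carries the whole path, F can reject the two routes that lead back through itself without waiting for distances to count upward.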

Internet Multicasting

IP supports multicasting using class D addresses.
– Each class D address identifies a group of hosts.
– 28 bits define over 250 million groups.

Best-effort delivery.
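The group count follows from the address layout: a class D address starts with the 4 bits 1110, which leaves 28 bits for the group. A quick check of that arithmetic, with `is_class_d` as a hypothetical helper name:

```python
import ipaddress

GROUP_BITS = 32 - 4            # 4 leading bits (1110) identify class D
NUM_GROUPS = 2 ** GROUP_BITS   # 268,435,456 possible groups


def is_class_d(address: str) -> bool:
    """True if the address starts with the bits 1110."""
    return int(ipaddress.IPv4Address(address)) >> 28 == 0b1110


checks = {addr: is_class_d(addr) for addr in ("224.0.0.1", "239.1.2.3", "192.0.2.1")}
```

The standard library's `IPv4Address.is_multicast` property tests the same address range.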

Group Membership

Hosts (single or multiple processes) may join and leave group.

Special multicast routers perform multicast routing and packet forwarding.
– Hosts belonging to multicast groups periodically send messages to the closest multicast router.
– Multicast routers and hosts use IGMP (Internet Group Management Protocol) to exchange membership information.

IP Multicast Routing

Use spanning trees.
Modified distance vector protocol using unicast routing information.
– Build one spanning tree per source, per group.
– Or, one shared spanning tree per group.
– Use pruning to remove parts of the tree that don’t have any multicast group members.
– Use tunneling to cross regions that are not multicast capable.

Mobile IP 1

Support for mobile users.
– “Last hop” mobility.

Problem: IP addressing scheme.
– Class + network number + host number.
– If host moves and attaches itself to foreign network, packets destined to it will still go to its home network.
– Assigning hosts new IP address? Too much hassle.

Mobile IP 2

Solution:
– Home agent: runs at the home network.
– Foreign agent: runs at foreign network.
– When mobile host connects itself to foreign network, registers with foreign network’s foreign agent.
– Foreign agent assigns host care-of address, and informs home agent.

Mobile IP 3

Sending packets: mobile host uses its care-of address.
Receiving packets:
– When a packet arrives at the home network, the router that gets it sends an ARP request for that IP address.
– Home agent replies with its own Ethernet address. It gets the packet, and tunnels it to the foreign agent. Foreign agent delivers the packet to the mobile host.
– Home agent sends the care-of address to the sender, so future packets are sent directly to the foreign network.
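The forwarding step described above can be sketched as a small in-memory model. This is only an illustration: the class and function names, the dictionary-based packets, and the example addresses are hypothetical, and ARP interception and agent discovery are left out.

```python
class HomeAgent:
    """Tracks care-of addresses for mobile hosts that are away from home."""

    def __init__(self):
        self.bindings = {}  # home address -> care-of address (hypothetical table)

    def register(self, home_addr: str, care_of: str) -> None:
        # Called when the foreign agent informs the home agent.
        self.bindings[home_addr] = care_of

    def forward(self, packet: dict) -> dict:
        care_of = self.bindings.get(packet["dst"])
        if care_of is None:
            return packet  # host is at home: deliver unchanged
        # Tunnel: wrap the original packet inside one addressed to the care-of address.
        return {"src": "home-agent", "dst": care_of, "payload": packet}


def foreign_agent_deliver(tunneled: dict) -> dict:
    """The foreign agent unwraps the tunnel and hands over the original packet."""
    return tunneled["payload"]


agent = HomeAgent()
agent.register("10.0.0.5", "192.0.2.9")
original = {"src": "198.51.100.1", "dst": "10.0.0.5", "payload": b"hi"}
tunneled = agent.forward(original)
print(tunneled["dst"], foreign_agent_deliver(tunneled) == original)
```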

Mobile IP 4

Locating foreign agents:
– Foreign agents periodically broadcast their address and service provided (e.g., home, foreign, or both).
– Mobile host can announce its presence and wait for a response from a foreign agent.
Unregistration:
– If a host leaves without unregistering, its registration expires after some time.
Security:
– Authentication issues.

Scaling IP Addresses 1

Exponential growth of the Internet!
– 32-bit address fields are getting too small.
– Early predictions: it’d take decades to reach the 100,000-network mark.
– The 100,000th network was connected in 1996!
– Internet is rapidly running out of IP addresses!
– Waste due to hierarchical addressing.

IP Address Formats

4 different classes (network part shown in bits, followed by the host part):

Class A: 0XXXXXXX + 24-bit host. 128 nets, 16M hosts/net.
Class B: 10XXXXXX XXXXXXXX + 16-bit host. 16K nets, 64K hosts/net.
Class C: 110XXXXX XXXXXXXX XXXXXXXX + 8-bit host. 2M nets, 256 hosts/net.
Class D: 1110XXXX XXXXXXXX XXXXXXXX XXXXXXXX. Multicast.
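As a rough illustration of the leading-bit rule, the class of an address can be read off its first byte. The function name is made up for this sketch, and class E (leading bits 1111) is not on the slide; it is included only so every address gets an answer.

```python
def address_class(addr: str) -> str:
    """Return the classful category of a dotted-quad IPv4 address."""
    first = int(addr.split(".")[0])
    if first < 128:   # leading bit 0
        return "A"
    if first < 192:   # leading bits 10
        return "B"
    if first < 224:   # leading bits 110
        return "C"
    if first < 240:   # leading bits 1110
        return "D"
    return "E"        # leading bits 1111 (not listed on the slide)


for a in ("10.1.2.3", "128.32.0.1", "194.24.8.1", "224.0.0.1"):
    print(a, address_class(a))
```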

Scaling IP Addresses 2

Class A addresses: 16M hosts is usually too much.
Class C addresses: 254 hosts is usually too small.
Class B addresses provide room for 64K hosts.
– Organizations usually request class B addresses, but more than 50% of them only have up to 50 hosts!

Scaling IP Addresses 3

Class C addresses should have 10-bit host numbers instead of only 8-bit numbers.
– Would allow for 1022 hosts instead of just 254.
– More Class C networks: network number can grow up to 0.5M.
But, could result in routing table explosion.
– Routers will have to know about many more networks.

CIDR 1

Classless Interdomain Routing: RFC 1519.
No longer uses class A, B, and C addresses.
Allocate remaining Class C addresses in variable-sized blocks.
– Example: if an organization needs 2000 addresses, it’s given a block of 2048 addresses, or 8 contiguous class C networks, and not a full class B address.
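The block-size arithmetic in the example can be checked with a few lines. This is a sketch under the assumption that blocks are always rounded up to a power of two; the function name and the prefix-length output (the "/21" style of notation) are additions for illustration.

```python
def cidr_block(needed: int):
    """Return (block size, prefix length, number of class C networks) for a request."""
    size = 1
    while size < needed:          # round up to the next power of two
        size *= 2
    host_bits = size.bit_length() - 1
    return size, 32 - host_bits, max(size // 256, 1)


print(cidr_block(2000))
```

For a request of 2000 addresses this gives a 2048-address block, i.e. 8 contiguous class C networks, matching the slide.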

CIDR 2

New allocation rules for class C addresses.
World partitioned into 4 zones and each one was given a portion of the class C address space (192~223).
– 194.0.0.0~195.255.255.255: Europe.
– 198.0.0.0~199.255.255.255: North America.
– 200.0.0.0~201.255.255.255: Central and South America.
– 202.0.0.0~203.255.255.255: Asia and Pacific.

CIDR 3

Each region is allocated ~32M class C addresses.
Addresses 204.0.0.0~223.255.255.255 reserved for future use.
Advantages:
– Less waste.
– Routers can keep only one RT entry per region, i.e., 32M addresses compressed into one.

CIDR 4

Once a packet gets to its destination region, more detailed routing information is needed.
One possibility is to keep 131,072 (32M/2^8) entries for all “local” networks.
– Explosion problem.
Instead, use 32-bit masks: only need to keep the start address of each block.

CIDR - Example 1

Cambridge University: 2048 addresses, 194.24.0.0~194.24.7.255, mask 255.255.248.0.
Oxford University: 4096 addresses, 194.24.16.0~194.24.31.255, mask 255.255.240.0.
U of Edinburgh: 1024 addresses, 194.24.8.0~194.24.11.255, mask 255.255.252.0.
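The mask test a router would apply to these three blocks can be sketched with Python's standard `ipaddress` module. The `blocks` table and the `lookup` helper are illustrative names, not part of the original notes; a real router would also need a longest-match rule when blocks overlap, which these three do not.

```python
import ipaddress

# Blocks from the example, written as start address / mask.
blocks = {
    "Cambridge": ipaddress.ip_network("194.24.0.0/255.255.248.0"),
    "Oxford": ipaddress.ip_network("194.24.16.0/255.255.240.0"),
    "Edinburgh": ipaddress.ip_network("194.24.8.0/255.255.252.0"),
}


def lookup(addr: str):
    """Return the owner of the block containing addr, or None."""
    ip = int(ipaddress.ip_address(addr))
    for owner, net in blocks.items():
        # AND the destination with the mask and compare with the block's start address.
        if ip & int(net.netmask) == int(net.network_address):
            return owner
    return None


for name, net in blocks.items():
    print(name, net.num_addresses)
print(lookup("194.24.17.4"), lookup("194.24.9.1"), lookup("194.24.12.1"))
```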

IP Evolution

CIDR bought IPv4 a few more years.
Because of its addressing limitations and to accommodate next-generation Internet applications, IP must evolve.
In 1990, IETF started work on IP next generation, or IPng.
– Several proposals were considered.
– SIPP (Simple Internet Protocol Plus) was selected and became IPv6.

IPv6 1

RFCs 1883~1887.
Features:
– Longer addresses (16 bytes versus only 4 in IPv4).
– Header simplification (only 7 fields versus 13 fields in IPv4): faster processing by routers.
– Better option support, since fields that were previously required are now optional.
– Improved security and QoS support.

IPv6 Header

(Figure: header layout, drawn 32 bits wide.)
Version | Priority | Flow label
Payload length | Next header | Hop limit
Source address (16 bytes)
Destination address (16 bytes)

IPv6 Header Fields 1

Version = 6.
– During the transition period, routers will examine this field to decide what kind of packet it is.
Priority: handling different kinds of traffic.
– 0~7: data that can be flow controlled, e.g., data distribution services.
– 8~15: real-time traffic (e.g., audio, video).
– Within each group, lower values have lower priority than higher values (e.g., 1 for news, 4 for ftp and 6 for telnet).

IPv6 Header Fields 2

Flow label (experimental): allows source and destination to set up a pseudo-connection.
– Try to have some kind of service guarantees.
– Example: assign a flow number to a stream of packets that need reserved bandwidth.
– A flow is identified by: src + dst + flow #.
Payload length: length of data.
– Different from IPv4, which specified the total length of the datagram.

IPv6 Header Fields 3

Next header: specifies what is present in the options field (extension headers).
Hop limit: equivalent to IPv4’s TTL.
Source and destination addresses:
– 16-byte addresses (fixed length).
– Address space is divided by using prefixes.
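To make the fixed header concrete, here is a minimal sketch that packs the fields listed above into 40 bytes. The bit widths (4-bit version, 4-bit priority, 24-bit flow label) follow RFC 1883 and are an assumption, since the notes do not state them; the function name and the example addresses are illustrative.

```python
import ipaddress
import struct


def pack_ipv6_header(priority, flow_label, payload_len, next_header, hop_limit, src, dst):
    """Build a 40-byte IPv6 fixed header (field widths assumed from RFC 1883)."""
    # First 32-bit word: version (4 bits) | priority (4 bits) | flow label (24 bits).
    first_word = (6 << 28) | ((priority & 0xF) << 24) | (flow_label & 0xFFFFFF)
    return (
        struct.pack("!IHBB", first_word, payload_len, next_header, hop_limit)
        + ipaddress.IPv6Address(src).packed   # 16-byte source address
        + ipaddress.IPv6Address(dst).packed   # 16-byte destination address
    )


header = pack_ipv6_header(4, 0x123, 512, 6, 64, "8000::1", "8000::2")
print(len(header), header[0] >> 4)
```

Because every field sits at a fixed offset, there is no header-length field to read before locating the addresses.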

IPv6 versus IPv4

No more IHL (header length); why?
No more protocol field: replaced by the next header field.
No more fragmentation-related fields.
– All IPv6 hosts and routers must support 576-byte packets.
– Fragmentation is less likely to occur.
– Router sends an error message back to the source when a packet is too big, so the source breaks it down.
No more checksum: rely on more reliable networks and on DLL and transport checksums.

IPv6 Addressing 1

Separate prefixes for provider-based and geographic-based addresses.
– Ability to accommodate 2 ways of address assignment.
– Provider-based: addresses allocated to ISP companies.
» Prefix 010.
» Each ISP is assigned a portion of the address space.
» First 5 bits following the prefix define the registry where the provider is registered.
» Remaining 15 bytes are allocated by each provider.
» Example: 3-byte provider number.

IPv6 Addressing 2

Geographic-based addresses:
– Prefix 100.
– Same model as current Internet.
Multicast addresses:
– Prefix 11111111.
– 4-bit flag + 4-bit scope fields + 112-bit group id.
– Flags: 1 bit defines whether the group is permanent or not.
– Scope: limit reach of multicast packet.

IPv6 Address Notation

8 groups of 4 hexadecimal digits separated by colons.
– Example: 8000:0000:0000:0000:0123:4567:89AB:CDEF
– Optimizations:
» Leading zeros within a group can be omitted.
» Groups of zeros can be replaced by a pair of colons: 8000::123:4567:89AB:CDEF.
» IPv4 addresses: ::192.31.20.46.
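These shorthand rules can be tried out with Python's standard `ipaddress` module, which accepts all three written forms. The snippet only illustrates the notation; note that the module prints hexadecimal digits in lower case.

```python
import ipaddress

full = ipaddress.IPv6Address("8000:0000:0000:0000:0123:4567:89AB:CDEF")
short = ipaddress.IPv6Address("8000::123:4567:89AB:CDEF")

print(full.compressed)   # leading zeros dropped, zero groups collapsed to "::"
print(full == short)     # both spellings name the same 128-bit address

# An IPv4 address written in the low-order 32 bits.
embedded = ipaddress.IPv6Address("::192.31.20.46")
print(embedded.exploded)
```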

Extension Headers 1

Equivalent to IPv4 options.
6 types of extension headers:

Hop-by-hop options: misc. info for routers.
Routing: full or partial route included.
Fragmentation: management of fragments.
Authentication: verification of source’s id.
Encrypted payload: information about encryption.
Destination options: information for destination.

Extension Headers 2

Fixed format and variable-sized headers.
Variable-sized headers:
– (type, length, value).
– Type: 1 byte specifying which option this is.
» First 2 bits tell routers that do not recognize the option what to do: skip option, discard packet, discard packet with ICMP message, or discard packet without ICMP message for multicast addresses.
– Length: how long the value field is (0~255 bytes).
– Value: information.
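A (type, length, value) list of this kind can be walked with a short loop. The sketch below assumes the four actions are encoded as 00, 01, 10, 11 in the order the slide lists them, and it ignores padding options; `ACTIONS` and `parse_options` are illustrative names.

```python
# Assumed mapping of the top two type bits, in the order listed above.
ACTIONS = {
    0b00: "skip option",
    0b01: "discard packet",
    0b10: "discard packet and send ICMP message",
    0b11: "discard packet, send ICMP message unless destination is multicast",
}


def parse_options(data: bytes):
    """Split a byte string into (type, action-if-unrecognized, value) triples."""
    options = []
    i = 0
    while i + 2 <= len(data):
        opt_type, length = data[i], data[i + 1]
        value = data[i + 2 : i + 2 + length]
        if len(value) < length:
            raise ValueError("truncated option")
        options.append((opt_type, ACTIONS[opt_type >> 6], value))
        i += 2 + length
    return options


print(parse_options(bytes([194, 4, 0, 1, 0, 0, 5, 1, 9])))
```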

Hop-by-Hop Header

Convey information all routers along the path must examine.
– Jumbograms: datagrams > 64 KBytes.
– Next header: what option this is.
– Length of hop-by-hop header excluding the first 8 (mandatory) bytes.
– Defines option, in this case datagram size.

(Figure: hop-by-hop header for jumbograms. First row: Next header | 0 | 194 | 0. Second row: Jumbogram payload length.)

Routing Header

Lists one or more routers that must be visited on the way to the destination.
– Strict source routing: full path is supplied.
– Loose source routing: only selected routers are listed.

Fragment Header

Allows source to fragment datagram.
– In IPv6, routers are not allowed to fragment.
– If a router receives a packet that is too big, it discards it and sends back an ICMP message to the source.
– Source uses this option to fragment the packet, and resends it.
– Contains datagram id, fragment number, and “last fragment” bit.
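Source-side fragmentation with the three fields named above might look like the following sketch. It models each fragment as a tuple (datagram id, fragment number, last-fragment flag, data); the function names and the tuple layout are assumptions for illustration and do not reproduce the actual header encoding.

```python
def fragment(payload: bytes, max_chunk: int, datagram_id: int):
    """Split payload into fragments of at most max_chunk bytes each."""
    chunks = [payload[i:i + max_chunk] for i in range(0, len(payload), max_chunk)] or [b""]
    last = len(chunks) - 1
    return [(datagram_id, n, n == last, chunk) for n, chunk in enumerate(chunks)]


def reassemble(fragments):
    """Return the original payload, or None if fragments are still missing."""
    ordered = sorted(fragments, key=lambda f: f[1])
    numbers = [f[1] for f in ordered]
    if not ordered or not ordered[-1][2] or numbers != list(range(len(ordered))):
        return None
    return b"".join(f[3] for f in ordered)


frags = fragment(b"abcdefghij", 4, datagram_id=7)
print(frags)
print(reassemble(list(reversed(frags))))
```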

Authentication Header

Supports verification of sender’s identity.
Contains a key number and a cryptographic checksum of the whole datagram.
Receiver uses the key number to find the secret key. It computes the checksum using the secret key and checks whether it matches the checksum in the received datagram.
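The receiver-side check can be sketched with Python's `hmac` module. HMAC-SHA256 is swapped in here purely for illustration; the notes do not say which checksum algorithm is used, and the key table, key number, and function names are hypothetical.

```python
import hashlib
import hmac

KEYS = {7: b"shared-secret"}  # hypothetical table: key number -> secret key


def sign(key_number: int, datagram: bytes) -> bytes:
    """Sender side: compute the keyed checksum over the datagram."""
    return hmac.new(KEYS[key_number], datagram, hashlib.sha256).digest()


def verify(key_number: int, datagram: bytes, checksum: bytes) -> bool:
    """Receiver side: look up the key, recompute, and compare."""
    key = KEYS.get(key_number)
    if key is None:
        return False
    expected = hmac.new(key, datagram, hashlib.sha256).digest()
    return hmac.compare_digest(expected, checksum)


tag = sign(7, b"payload")
print(verify(7, b"payload", tag), verify(7, b"tampered", tag))
```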

Destination Options

Supports options that need only be interpreted by the destination host.

Quality of Service

Service offered by the network (carrier) to customer (end user): service agreement.
Service agreement: offered traffic, offered service, compliance requirements.
If customer and carrier don’t agree: VC will not be set up.
Different requirements for each direction.
– E.g., VOD application: required bandwidth user->server differs from server->user.

Quality of Service Parameters 1

Parameter                       Acronym  Meaning
Peak cell rate                  PCR      Max. cell transmission rate
Sustained cell rate             SCR      Average cell rate
Minimum cell rate               MCR      Min. acceptable cell rate
Cell delay variation tolerance  CDVT     Max. acceptable cell jitter
Cell loss ratio                 CLR      Fraction of lost cells
Cell transfer delay             CTD      Time to deliver
Cell delay variation            CDV      Delivery delay variation
Cell error rate                 CER      Fraction of cells delivered in error

QoS Parameters 2

PCR, SCR, MCR, and CDVT: specified by sender.
CLR, CTD, and CDV describe network conditions and are measured at receiver.

The Transport Layer

End-to-end.
– Communication from source to destination host.
– Only hosts run transport-level protocols.
– Under user’s control, as opposed to the network layer, which is controlled/owned by the carrier.

The Transport Service

Service provided to application layer.
Transport entity: process that implements the transport protocol running on a host.
– At OS kernel, user-level process, or network card.

The Transport Layer

(Figure: on both the source host and the destination host, a transport entity sits between the application layer and the network layer. Labels: application/transport interface, transport address, transport/network interface, network address. The two transport entities exchange TPDUs.)

Types of Transport Services

Connection-less versus connection-oriented.
Connection-less service: no logical connections, no flow or error control.
Connection-oriented:
– Based on logical connections: connection setup, data transfer, connection teardown.
– Flow and error control.

Transport versus Network Layer

Transport layer is “controlled” by user.
– Ability to enhance network layer quality of service.
– Example: transport service can be more reliable than underlying network service.
– Transport layer makes a standard set of primitives available to users, which are independent from the network service primitives, which may vary considerably.

Quality of Service

User may specify QoS parameters at the transport layer.
– At connection setup time, user may define preferred, acceptable, and minimum values for various service parameters.
– Transport layer determines whether it’s possible to provide required service based on available network service(s).

Transport-Layer QoS Parameters 1

Connection establishment delay: time to establish connection.
Connection establishment failure probability: probability connection is not established within maximum establishment time.
Throughput: bytes transferred per second measured over a time interval.

Transport-Layer QoS Parameters 2

Transit delay: time between sending a message and receiving it on the other side (measured by the transport entities).
Residual error ratio: ratio of messages in error to total messages sent.
Priority: way for user to indicate that some connections are more important.
Resilience: probability connection is terminated due to congestion, etc.

Transport Layer QoS

Only a few transport protocols provide QoS parameters.
Most just try to minimize residual error rate.
QoS parameters specified by transport user when connection is set up.
– Desired and minimum acceptable values can be specified.
– Service negotiation.

Transport Service Primitives

Allow transport users (e.g., application programs) to access transport service.
Example: connection-oriented transport service primitives.

PRIMITIVE    TPDU Sent        Meaning
LISTEN       (none)           listen for connection
CONNECT      Connection Req.  try to establish connection
SEND         DATA             send data
RECEIVE      (none)           wait for data
DISCONNECT   Disc. Req.       try to release connection

TPDU

Transport protocol data unit.
Messages sent between transport entities.
TPDUs contained in network-layer packets, which in turn are contained in DLL frames.

(Figure: nesting.) Frame header | Packet header | TPDU header | TPDU payload

Connection Management State Machine

(Figure: state diagram. Transitions recovered from the labels:)
– Server: Idle -> Passive establishment pending (connection req. received) -> Established (connect executed).
– Client: Idle -> Active establishment pending (connect executed) -> Established (connection accept received).
– Established -> Passive disconnect pending (disc. req. received) -> Idle (disconnect executed).
– Established -> Active disconnect pending (disconnect executed) -> Idle (disc. accept received).

Berkeley Sockets 1

Set of transport-level primitives made available by Berkeley UNIX.
Server side:
» SOCKET: create new communication end point.
» BIND: attach local address to socket (once server binds address, clients can connect to it).
» LISTEN: listen for connection.
» ACCEPT: accept new connection.
» SEND, RECEIVE: send and receive data.
» CLOSE: release connection.

Berkeley Sockets 2

Client side:
» SOCKET: create socket.
» CONNECT: try to establish connection.
» SEND, RECEIVE: send and receive data.
» CLOSE: release connection.
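The server-side and client-side primitives map closely onto Python's standard `socket` module. The sketch below runs both ends in one process over the loopback interface; the uppercase-echo behaviour, the helper names, and the use of a thread are choices made only for this illustration.

```python
import socket
import threading


def serve_once(server: socket.socket) -> None:
    conn, _addr = server.accept()        # ACCEPT
    with conn:                           # CLOSE happens when the block exits
        data = conn.recv(1024)           # RECEIVE
        conn.sendall(data.upper())       # SEND


def run_demo(message: bytes = b"hello") -> bytes:
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # SOCKET
    server.bind(("127.0.0.1", 0))        # BIND (port 0: let the OS pick a free port)
    server.listen(1)                     # LISTEN
    port = server.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # SOCKET
    client.connect(("127.0.0.1", port))  # CONNECT
    client.sendall(message)              # SEND
    reply = client.recv(1024)            # RECEIVE
    client.close()                       # CLOSE

    worker.join()
    server.close()
    return reply


print(run_demo())
```

Note that the client never calls BIND or LISTEN: the operating system assigns its local port when CONNECT is issued.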

Transport Protocol Issues: Addressing

Address of the transport-level entity.
TSAP: transport service access point (analogous to NSAP).
– Internet TSAP: (IP address, local port).
– Internet NSAP: IP address.
– There may be multiple TSAPs on one host.
– Typically, only one NSAP.

Example 1

Finding the time of day from a time-of-day server.
– Time-of-day server process on host 2 attaches itself to TSAP 122 and waits for requests (e.g., through LISTEN).
– Application process (TSAP 6) on host 1 wants to find out the time of day; issues CONNECT specifying TSAP 6 as source and TSAP 122 as destination.

Finding Services 1

Well-known TSAP.
– Time-of-day server has been using TSAP 122 forever, so every user knows it.
Initial connection protocol: special process server that proxies for less well-known services.
– Process server listens to a set of ports at the same time.
– Users CONNECT to a TSAP, and if there is no server, the process server is likely to be listening. It then spawns the requested server.

Finding Services 2

Name or directory service.
– Name server listens to well-known TSAP.
– User sends service name and name server responds with service’s TSAP.
– New services need to register with name server.
Finding the server’s network address.
– Hierarchical addresses solve this problem, i.e., the NSAP is part of the TSAP.

Connection Establishment

CONNECTION REQUEST and CONNECTION ACCEPTED TPDUs.
Problem: delayed duplicates.
– Duplicates can re-appear and be taken as the real messages.
Solution: messages age and are discarded after some time; their acknowledgements need to be discarded too.
– Maximum hop count.
– Timestamp.

Avoiding Duplicates 1

2 identically numbered TPDUs are never outstanding at the same time.
Bounded packet lifetime.
Each host has its own clock.
– Clock as a counter that increments itself.
– #bits(counter) >= #bits(sequence number).
– Clocks don’t “crash”: they keep running even when the host goes down.

Avoiding Duplicates 2

When a connection is set up, the low-order k bits of the clock are used as the initial sequence number.
Each connection starts numbering its TPDUs with a different sequence number.
Sequence number space needs to be large enough that, by the time sequence numbers wrap around, old TPDUs with the same sequence numbers have aged out.

Sequence Numbers versus Time 1

(Figure: sequence numbers plotted against time.)
Linear relation between time and initial sequence number.

Sequence Numbers versus Time 2

(Figure: sequence numbers plotted against time, with a “forbidden region” of width T.)
Host crash: when it comes up, it doesn’t know where it was in the sequence # space.
Example: T = 60 sec and clock ticks once per second.
– At t=30s, TPDU on connection 5 gets seq. # 80.
– Host crashes and comes up.
– At t=60s, reopens connections 0~4.
– At t=70s, reopens connection 5 and at t=80s, sends TPDU 80.
– Old TPDU 80 is still valid, and the new one would look like a duplicate.
– To prevent this, check if the sequence number is in the “forbidden region” and delay it.
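The clock-based numbering and the forbidden-region test can be sketched in a few lines. The 8-bit sequence space (k = 8) and the exact boundary treatment (a number is forbidden if the clock will reach it within T ticks) are assumptions made for this example; the function names are illustrative.

```python
def initial_seq(clock: int, k: int) -> int:
    """Low-order k bits of the clock, used as the initial sequence number."""
    return clock & ((1 << k) - 1)


def in_forbidden_region(seq: int, clock: int, lifetime: int, k: int) -> bool:
    """True if `seq` will be handed out as an initial number within `lifetime` ticks."""
    ahead = (seq - initial_seq(clock, k)) % (1 << k)   # ticks until the clock reaches seq
    return 0 < ahead <= lifetime


# The example above: T = 60, one tick per second, TPDU 80 sent at t = 30.
print(in_forbidden_region(80, clock=30, lifetime=60, k=8))
```

With these assumptions, sending sequence number 80 at t=30 is flagged, which is the situation the example warns about: the TPDU would still be alive when the clock reaches 80.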

Three-Way Handshake

Solves the problem of getting 2 sides to agree on initial sequence number.

(Figure: exchange between host 1 and host 2.)
– Host 1 -> host 2: CR (seq=x)
– Host 2 -> host 1: ACK (seq=y, ACK=x)
– Host 1 -> host 2: DATA (seq=x, ACK=y)
CR: connection request.

3-Way Handshake: Duplicates 1

Old duplicate CR.
– The ACK from host 2 tries to verify whether host 1 was trying to open a new connection with seq=x.
– Host 1 rejects host 2’s attempt to establish.
– Host 2 realizes it was a duplicate CR and aborts the connection.

(Figure: exchange between host 1 and host 2.)
– Host 2 receives: CR (seq=x), an old duplicate
– Host 2 -> host 1: ACK (seq=y, ACK=x)
– Host 1 -> host 2: REJECT (ACK=y)

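3-Way Handshake: Duplicates 2

Old duplicate CR and old duplicate ACK to connection accepted.

(Figure: exchange between host 1 and host 2.)
– Host 2 receives: CR (seq=x), an old duplicate
– Host 2 -> host 1: ACK (seq=y, ACK=x)
– Host 2 receives: DATA (seq=x, ACK=z), an old duplicate; it acknowledges z rather than y, so host 2 can tell it is stale
– Host 1 -> host 2: REJECT (ACK=y)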
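The handshake and both duplicate scenarios can be played through with a toy model. The two classes, their method names, and the tuple-shaped messages are invented for this sketch; timers and retransmission are ignored.

```python
class Initiator:
    """Host 1: remembers which connection requests it really sent."""

    def __init__(self):
        self.pending = set()

    def connect(self, x):
        self.pending.add(x)
        return ("CR", x)

    def on_ack(self, y, acked_x):
        if acked_x in self.pending:          # we did ask for this connection
            self.pending.discard(acked_x)
            return ("DATA", acked_x, y)
        return ("REJECT", y)                 # ACK answers an old duplicate CR


class Responder:
    """Host 2: accepts a connection only after its own seq y is acknowledged."""

    def __init__(self, y):
        self.y, self.x, self.state = y, None, "idle"

    def on_cr(self, x):
        self.x, self.state = x, "pending"
        return ("ACK", self.y, x)

    def on_data(self, seq, ack):
        if self.state == "pending" and seq == self.x and ack == self.y:
            self.state = "established"       # a stale DATA (ack != y) is ignored

    def on_reject(self, ack):
        if ack == self.y:
            self.state = "idle"


# Normal case.
h1, h2 = Initiator(), Responder(y=500)
_, x = h1.connect(100)
_, y, acked = h2.on_cr(x)
_, seq, ack = h1.on_ack(y, acked)
h2.on_data(seq, ack)
print(h2.state)

# Old duplicate CR: host 1 never sent it, so it rejects.
h1, h2 = Initiator(), Responder(y=500)
_, y, acked = h2.on_cr(100)
h2.on_data(100, 321)                         # old duplicate DATA with ACK=z
reply = h1.on_ack(y, acked)
h2.on_reject(reply[1])
print(reply[0], h2.state)
```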

Connection Release

Asymmetric release: telephone system.
– When one party hangs up, connection breaks.
– May cause data loss.
Symmetric release:
– Treats connection as 2 separate unidirectional connections.
– Requires each to be released separately.

Symmetric Release

How to determine when all data has been sent and connection could be released?
2-army problem:

(Figure: the white army sits between blue army 1 and blue army 2.)
– White army is larger than either blue army.
– The blue armies together are larger.
– If each blue army attacks alone, it’ll be defeated. They win if they attack together.

2-Army Problem 1

To synchronize the attack, they must use messengers that need to cross the valley: unreliable.
Is there a protocol that allows blue army to win? No.
– Blue army 1 sends message to blue army 2.
– Blue army 2 sends ACK back.
– Blue army 2 is not sure whether ACK was received.

2-Army Problem 2

Use 3-way handshake.
– Blue army 1 ACKs back, but it will never know if the ACK was received.

Applying to connection release:
– Neither side is prepared to disconnect until convinced that the other side is prepared to disconnect.
– In practice, hosts are willing to take risks.

Connection Release Protocol

– Host 1: send DR + start timer.
– Host 2 (on DR): send DR + start timer.
– Host 1 (on DR): release connection, send ACK.
– Host 2 (on ACK): release connection.

DR: disconnection request.

Connection Release Scenarios 1

– Host 1: send DR + start timer.
– Host 2 (on DR): send DR + start timer.
– Host 1 (on DR): release connection, send ACK (the ACK does not arrive).
– Host 2: timeout; release connection.

DR: disconnection request.

Connection Release Scenarios 2

– Host 1: send DR + start timer.
– Host 2 (on DR): send DR + start timer (this DR does not arrive).
– Host 1: timeout; send DR + start timer.
– Host 2 (on DR): send DR + start timer.
– Host 1 (on DR): release connection, send ACK.

DR: disconnection request.

The Internet Transport Protocols: TCP and UDP

UDP: user datagram protocol (RFC 768).
– Connection-less protocol.

TCP: transmission control protocol (RFCs 793, 1122, 1323).
– Connection-oriented protocol.

UDP

Provides connection-less, unreliable service.
– No delivery guarantees.
– No ordering guarantees.
– No duplicate detection.

Low overhead.
– No connection establishment/teardown.

Suitable for short-lived exchanges.
– Example: client-server applications.

UDP Segment Format

Bits 0-15 | Bits 16-31
Source port | Destination port
Length | Checksum
Data

Source and destination ports: identify the end points.
Length: 8-byte header + data.
Checksum: optional; if not used, set to zero.

UDP Checksum

Computed over a pseudo-header + UDP header + data + padding (to even number of bytes if needed).

Pseudo-header (bits 0-31 per row):

Source IP address
Destination IP address
00000000 | Protocol | Segment length
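A minimal sketch of the checksum computation, assuming IPv4 addresses and the usual 16-bit one's-complement sum. The function names are hypothetical; protocol number 17 is the value IP uses for UDP, and a computed checksum of zero is transmitted as all ones because zero means "not used".

```python
import socket
import struct

UDP_PROTOCOL = 17  # IP protocol number for UDP


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum, padding to an even number of bytes."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def pseudo_header(src_ip: str, dst_ip: str, segment_length: int) -> bytes:
    """Source IP, destination IP, zero byte, protocol, segment length."""
    return struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        0,
        UDP_PROTOCOL,
        segment_length,
    )


def udp_segment(src_ip, dst_ip, src_port, dst_port, data: bytes) -> bytes:
    """Build a UDP header + data with the checksum field filled in."""
    length = 8 + len(data)  # 8-byte header + data
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, length) + header + data)
    checksum = (~total & 0xFFFF) or 0xFFFF  # zero is reserved for "no checksum"
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + data


segment = udp_segment("10.0.0.1", "10.0.0.2", 5000, 53, b"hello")
```

A receiver can verify the segment by summing the pseudo-header and the whole segment, checksum included; an undamaged segment sums to 0xFFFF.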

TCP

Reliable end-to-end communication.
TCP transport entity:
– Runs on machine that supports TCP.
– Interfaces to the IP layer.
– Manages TCP streams.
» Accepts user data, breaks it down and sends it as separate IP datagrams.
» At receiver, reconstructs original byte stream from IP datagrams.

TCP Reliability

Reliable delivery.
– ACKs.
– Timeouts and retransmissions.

Ordered delivery.

TCP Service Model 1

TCP service is obtained by creating TCP end points.
– Example: UNIX sockets.
– TSAP address: IP address + 16-bit port number.
– Multiple connections can share the same port; a connection is identified by its pair of end points.
– Port numbers below 1024: well-known ports reserved for standard services.
» List of well-known ports in RFC 1700.

TCP Service Model 2

TCP connections are full-duplex and point-to-point.

Byte stream (not message stream).
– Message boundaries are not preserved end-to-end (e2e).

Example:
– Sender: 4 512-byte segments (A, B, C, D) sent as separate IP datagrams.
– Receiver: 2048 bytes of data (A B C D) delivered to application in single READ.
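The consequence for applications is that a reader has to count bytes rather than rely on one read per write. The sketch below uses `socket.socketpair()` as a stand-in for a TCP connection (an assumption made so the example runs without a network); `recv_exact` and `demo` are hypothetical helper names.

```python
import socket


def recv_exact(sock, n):
    """Read exactly n bytes, however the stream happens to chunk them."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("stream closed early")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def demo():
    # A connected pair of stream sockets; not real TCP, but the same
    # byte-stream semantics: write boundaries are not visible to the reader.
    writer, reader = socket.socketpair()
    with writer, reader:
        for label in b"ABCD":
            writer.sendall(bytes([label]) * 512)  # four separate 512-byte writes
        return recv_exact(reader, 2048)


data = demo()
```

Whether a single `recv` call returns all 2048 bytes or fewer depends on timing and buffering, which is why the helper loops until the requested count is reached.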

TCP Byte Stream

When application passes data to TCP, it may send it immediately or buffer it.

Sometimes application wants to send data immediately.
– Example: interactive applications.
– Use PUSH flag to force transmission.

URGENT flag.
– Also forces TCP to transmit at once.

TCP Protocol Overview 1

TCP’s TPDU: segment.
– 20-byte header + options.
– Data.
– TCP entity decides the size of segment.
» 2 limits: 64 KByte IP payload and MTU.
» Segments that are too large are fragmented.
» More overhead: an IP header is added to each fragment.

TCP Protocol Overview 2

Sequence numbers.
– Reliability, ordering, and flow control.
– Assigned to every byte.
– 32-bit sequence numbers.

TCP Segment Header

Fields in order (32 bits per row, except options and data):

Source port | Destination port
Sequence number
Acknowledgment number
Header length | Unused | U A P R S F | Window size
Checksum | Urgent pointer
Options (0 or more 32-bit words)
Data

TCP Header Fields 1

Source and destination ports identify connection end points.
Sequence number.
Acknowledgment number specifies next byte expected.
TCP header length: how many 32-bit words are contained in header.
6-bit unused field.

TCP Header Fields 2

6 1-bit flags:
– URG: indicates urgent data present; urgent pointer gives byte offset from current sequence number where urgent data is.
– ACK: indicates whether segment contains acknowledgment; if 0, acknowledgment number field ignored.
– PUSH: indicates PUSHed data so receiver delivers it to application immediately.

TCP Header Fields 3

Flags (cont’d):
– RST: used to reset connection, reject invalid segment, or refuse to open connection.
– SYN: used to establish connection; connection request, SYN=1, ACK=0.
– FIN: used to release connection.

Window size: how many bytes can be sent starting at acknowledgment number.

TCP Header Fields 4

Checksum: checksums the header + data + pseudo-header.

Options: provide way to add extra information.
– Examples:
» Maximum payload host is willing to accept; can be advertised during connection setup.
» Window scale factor that allows sender and receiver to negotiate larger window sizes.
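The header fields described on the last four slides can be pulled apart with a few lines of code. This is a sketch, not a full parser: the `TcpHeader` tuple and function names are hypothetical, options are returned as raw bytes, and no checksum verification is attempted.

```python
import struct
from typing import FrozenSet, NamedTuple

# Bit masks of the six 1-bit flags within the 16-bit length/flags word.
FLAG_BITS = {"URG": 0x20, "ACK": 0x10, "PSH": 0x08, "RST": 0x04, "SYN": 0x02, "FIN": 0x01}


class TcpHeader(NamedTuple):
    src_port: int
    dst_port: int
    seq: int
    ack: int
    header_words: int  # header length in 32-bit words
    flags: FrozenSet[str]
    window: int
    checksum: int
    urgent_pointer: int
    options: bytes


def parse_tcp_header(segment: bytes) -> TcpHeader:
    """Decode the fixed 20-byte header and slice out any options."""
    (src, dst, seq, ack, length_and_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    header_words = length_and_flags >> 12  # top 4 bits
    flags = frozenset(name for name, bit in FLAG_BITS.items() if length_and_flags & bit)
    options = segment[20:header_words * 4]
    return TcpHeader(src, dst, seq, ack, header_words, flags,
                     window, checksum, urgent, options)


def is_connection_request(header: TcpHeader) -> bool:
    """A connection request has SYN=1 and ACK=0."""
    return "SYN" in header.flags and "ACK" not in header.flags


# Hand-built SYN with one 4-byte option (maximum segment size 1460).
mss_option = b"\x02\x04\x05\xb4"
syn = struct.pack("!HHIIHHHH", 1234, 80, 1000, 0, (6 << 12) | 0x02, 65535, 0, 0) + mss_option
parsed = parse_tcp_header(syn)
```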

TCP Connection Setup

3-way handshake.

Host 1 -> Host 2: SYN (SEQ=x)
Host 2 -> Host 1: SYN (SEQ=y, ACK=x+1)
Host 1 -> Host 2: (SEQ=x+1, ACK=y+1)

TCP Connection Release 1

Abrupt release:
– Send RESET.
– May cause data loss.

TCP Connection Release 2

Graceful release:
– Each side of the connection released independently.
» Either side sends a TCP segment with FIN=1.
» When FIN acknowledged, that direction is shut down for data.
» Connection released when both sides shut down.
– 4 segments: 1 FIN and 1 ACK for each direction; the 1st ACK and the 2nd FIN may be combined.

TCP Connection Release 3

Timers to avoid 2-army problem.
– If response to FIN not received within 2*MSL (MSL: maximum segment lifetime), FIN sender releases connection.

After connection released, TCP waits for 2*MSL (e.g., 120 sec) to ensure all old segments have aged.

TCP Transmission 1

Sender process initiates connection.
Once connection established, TCP can start sending data.
Sender writes bytes to TCP stream.
TCP sender breaks byte stream into segments.
– Each byte assigned sequence number.
– Segment sent and timer started.

TCP Transmission 2

If timer expires, retransmit segment.
– After retransmitting segment for maximum number of times, assumes connection is dead and closes it.

If user aborts connection, sending TCP flushes its buffers and sends RESET segment.

Receiving TCP decides when to pass received data to upper layer.

TCP Flow Control

Sliding window.
– Receiver’s advertised window.
» Size of advertised window related to receiver’s buffer space.
» Sender can send data up to receiver’s advertised window.

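TCP Flow Control: Example

Receiver has a 4K buffer, initially empty.

– App. writes 2K of data. Sender -> Receiver: 2K; SEQ=0. Receiver’s buffer holds 2K.
– Receiver -> Sender: ACK=2048; WIN=2048.
– App. does 3K write. Sender -> Receiver: 2K; SEQ=2048. Receiver’s buffer is full.
– Receiver -> Sender: ACK=4096; WIN=0. Sender blocked.
– App. reads 2K of data. Receiver -> Sender: ACK=4096; WIN=2048. Sender may send up to 2K.
– Sender -> Receiver: 1K; SEQ=4096. Receiver’s buffer holds 3K, with 1K free.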
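The trace above can be replayed with a small model of the two windows. This is a sketch under simplifying assumptions (no loss, no reordering, every segment acknowledged at once); the `Sender` and `Receiver` classes and the `replay` function are hypothetical names.

```python
class Receiver:
    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.buffered = 0       # bytes received but not yet read by the app
        self.next_expected = 0  # sequence number of the next byte expected

    def window(self):
        return self.buffer_size - self.buffered

    def on_segment(self, seq, length):
        if seq != self.next_expected or length > self.window():
            raise ValueError("segment outside the advertised window")
        self.buffered += length
        self.next_expected += length
        return self.next_expected, self.window()  # (ACK, WIN)

    def app_read(self, n):
        self.buffered -= min(n, self.buffered)
        return self.next_expected, self.window()  # window update


class Sender:
    def __init__(self, initial_window):
        self.next_seq = 0
        self.unsent = 0  # bytes written by the app but not yet sent
        self.usable = initial_window

    def app_write(self, n):
        self.unsent += n

    def on_ack(self, ack, win):
        # Bytes the sender may still put on the wire.
        self.usable = ack + win - self.next_seq

    def next_segment(self):
        length = min(self.unsent, self.usable)
        if length == 0:
            return None  # nothing to send, or blocked by a zero window
        seq = self.next_seq
        self.next_seq += length
        self.unsent -= length
        self.usable -= length
        return seq, length


def replay():
    """Replay the slide's scenario; returns the (SEQ, len) / (ACK, WIN) log."""
    sender, receiver = Sender(4096), Receiver(4096)
    log = []

    def exchange():
        segment = sender.next_segment()
        log.append(segment)
        if segment is not None:
            ack = receiver.on_segment(*segment)
            log.append(ack)
            sender.on_ack(*ack)

    sender.app_write(2048)
    exchange()
    sender.app_write(3072)
    exchange()
    exchange()                      # window is 0: sender is blocked
    update = receiver.app_read(2048)
    log.append(update)
    sender.on_ack(*update)
    exchange()
    return log


trace = replay()
```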

TCP Flow Control: Observations

TCP sender not required to transmit data as soon as it comes in from application.
– Example: when first 2KB of data comes in, could wait for more data since window is 4KB.

Receiver not required to send ACKs as soon as possible.
– Wait for data so ACK is piggybacked.

Delayed ACKs

Tries to optimize ACK transmission.
Delay ACKs and window updates (500 msec) hoping to piggyback on data segment.
Example of the problem: telnet to interactive editor:
– Send 1 character at a time: 20-byte TCP header + 1-byte data + 20-byte IP header (41-byte datagram).
– Receiver ACKs immediately: 40-byte ACK.
– When editor reads character, window update: 40-byte datagram.
– Then echoes character back: 41-byte datagram.

Nagle’s Algorithm

Tries to optimize sending of small data chunks.

Example: telnet to interactive editor.
– Send first byte and buffer the rest until outstanding byte is ACKed; then send all buffered data in one segment; buffer until next ACK.

Disabled in some cases (e.g., windowing applications: mouse movements).
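The buffering rule can be sketched as below. This is a simplified model, not a TCP implementation: it ignores the maximum segment size and the advertised window, and the `NagleSender` class is a hypothetical name.

```python
class NagleSender:
    """Send at once if nothing is outstanding; otherwise buffer until the ACK."""

    def __init__(self):
        self.buffer = bytearray()
        self.outstanding = False  # True while a sent segment is unACKed
        self.sent = []            # segments put on the wire, in order

    def write(self, data: bytes):
        self.buffer += data
        self._try_send()

    def on_ack(self):
        self.outstanding = False
        self._try_send()

    def _try_send(self):
        if self.buffer and not self.outstanding:
            self.sent.append(bytes(self.buffer))  # all buffered data, one segment
            self.buffer.clear()
            self.outstanding = True


sender = NagleSender()
for ch in b"hello":
    sender.write(bytes([ch]))   # user types one character at a time
before_ack = list(sender.sent)
sender.on_ack()
after_ack = list(sender.sent)
```

On real sockets the algorithm is usually switched off per connection with the `TCP_NODELAY` option, for example `sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)`.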

Silly Window Syndrome

Caused by receiver sending window updates of very small values.
– Example:
» Receiver application reads 1 byte at a time and receiver TCP sends 1-byte window update.
» Sender TCP has large blocks to send but can only send 1 byte at a time.

Solution: [Clark] prevent receiver from generating small window advertisements; also, sender can wait.

Congestion Control

Why do it at the transport layer?
– Real fix to congestion is to slow down sender.

Use law of “conservation of packets”.
– Keep number of packets in the network constant.
– Don’t inject new packet until old one leaves.

Congestion indicator: packet loss.

TCP Congestion Control 1

Like flow control, also window based.
– Sender keeps congestion window (cwin).
– Each sender keeps 2 windows: receiver’s advertised window and congestion window.
– Number of bytes that may be sent is min(advertised window, cwin).

TCP Congestion Control 2

Slow start [Jacobson 1988]:
– Connection’s congestion window starts at 1 segment.
– If segment ACKed before time out, cwin=cwin+1.
– As ACKs come in, current cwin is increased by 1 segment per ACK.
– Exponential increase: cwin doubles every round trip.

TCP Congestion Control 3

Congestion Avoidance:
– Third parameter: threshold.
– Initially set to 64KB.
– If timeout, threshold=cwin/2 and cwin=1.
– Re-enters slow-start until cwin=threshold.
– Then, cwin grows linearly until it reaches receiver’s advertised window.

TCP Congestion Control: Example
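The slow-start and congestion-avoidance rules can be traced per round trip with a short function. This is a sketch under stated assumptions: windows are counted in segments rather than bytes, one loop iteration stands for one round trip, and the function name and the choice of which rounds time out are hypothetical inputs.

```python
def cwin_trace(rounds, threshold, advertised, timeouts=()):
    """Return the congestion window (in segments) at the start of each round."""
    cwin = 1
    trace = []
    for rnd in range(rounds):
        trace.append(cwin)
        if rnd in timeouts:
            threshold = max(cwin // 2, 1)  # threshold = cwin/2
            cwin = 1                       # restart slow start
        elif cwin < threshold:
            cwin = min(cwin * 2, threshold)   # slow start: doubles per round trip
        else:
            cwin = min(cwin + 1, advertised)  # congestion avoidance: linear growth
    return trace


# Threshold of 8 segments, advertised window of 16, timeout in round 6.
example = cwin_trace(rounds=12, threshold=8, advertised=16, timeouts={6})
```

With these inputs the window grows 1, 2, 4, 8, then linearly to 11; the timeout sets the threshold to 5 and the window back to 1, after which slow start runs again up to the new threshold.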

TCP Retransmission Timer

When segment sent, retransmission timer starts.
– If segment ACKed, timer stops.
– If time out, segment retransmitted and timer starts again.

How to set timer?

Based on round-trip time: time between when a segment is sent and when its ACK comes back.

If timer is too short, unnecessary retransmissions.

If timer is too long, long retransmission delay.

Jacobson’s Algorithm 1

Determining the round-trip time:
– TCP keeps RTT variable.
– When segment sent, TCP measures how long it takes to get ACK back (M).
– RTT = alpha*RTT + (1-alpha)*M.
– alpha: smoothing factor; determines weight given to previous estimate.
– Typically, alpha=7/8.

Jacobson’s Algorithm 2

Determining timeout value:
– Measure RTT variation, or |RTT-M|.
– Keeps smoothed value of cumulative variation D = alpha*D + (1-alpha)*|RTT-M|.
– Alpha may or may not be the same as value used to smooth RTT.
– Timeout = RTT + 4*D.
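The two update rules and the timeout formula fit in a small class. A minimal sketch, assuming the deviation is computed against the RTT estimate from before the update, that D starts at zero, and that the same alpha is used for both; the class and attribute names are hypothetical.

```python
class RttEstimator:
    """Smoothed RTT and deviation, following the formulas on the slides."""

    def __init__(self, first_sample, alpha=7 / 8):
        self.alpha = alpha
        self.rtt = first_sample  # smoothed round-trip time
        self.dev = 0.0           # smoothed deviation D (assumed to start at 0)

    def update(self, m):
        """Fold in a new measurement M."""
        a = self.alpha
        # D = alpha*D + (1-alpha)*|RTT-M|, using the RTT estimate before this sample.
        self.dev = a * self.dev + (1 - a) * abs(self.rtt - m)
        # RTT = alpha*RTT + (1-alpha)*M
        self.rtt = a * self.rtt + (1 - a) * m

    def timeout(self):
        return self.rtt + 4 * self.dev  # Timeout = RTT + 4*D


estimator = RttEstimator(first_sample=100.0)  # milliseconds, for illustration
estimator.update(200.0)
```

After one 200 ms sample on top of a 100 ms estimate, the smoothed RTT moves only to 112.5 ms, while the deviation term pushes the timeout out to 162.5 ms.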

Karn’s Algorithm

How to account for ACKs of retransmitted segments?
– Count it for first or second transmission?
– Karn proposed not to update RTT on any retransmitted segment.
– Instead, the timeout is doubled on each failure until segments get through.
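Karn’s rule can be layered on top of the RTT estimate as a backoff multiplier. This is a sketch under assumptions: the `KarnTimer` class is hypothetical, the backoff is cleared when an ACK arrives for a segment that was sent only once, and the deviation is assumed to start at zero.

```python
class KarnTimer:
    """RTT estimate with Karn's rule: skip ambiguous samples, back off on failure."""

    def __init__(self, first_sample, alpha=7 / 8):
        self.alpha = alpha
        self.rtt = first_sample
        self.dev = 0.0
        self.backoff = 1  # doubles on every retransmission timeout

    def timeout(self):
        return (self.rtt + 4 * self.dev) * self.backoff

    def on_timeout(self):
        self.backoff *= 2

    def on_ack(self, measured, retransmitted):
        if retransmitted:
            # The ACK could belong to either transmission: do not update RTT.
            return
        a = self.alpha
        self.dev = a * self.dev + (1 - a) * abs(self.rtt - measured)
        self.rtt = a * self.rtt + (1 - a) * measured
        self.backoff = 1  # a clean sample got through


timer = KarnTimer(first_sample=100.0)
timer.on_timeout()
timer.on_timeout()
backed_off = timer.timeout()
timer.on_ack(measured=50.0, retransmitted=True)   # ignored
still_backed_off = timer.timeout()
timer.on_ack(measured=100.0, retransmitted=False)
recovered = timer.timeout()
```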

Persistence Timer

Prevents deadlock if a window update packet is lost and advertised window = 0.

When persistence timer goes off, sender probes receiver; receiver replies with its current advertised window.

If 0, persistence timer is set again.

Keepalive Timer

Goes off when a connection is idle for a long time.

Causes one side to check whether the other side is still alive.

If no answer, connection terminated.

TIME_WAIT

2*MSL.
Makes sure all segments die after connection is closed.

Wireless TCP 1

According to layered system design principles, transport protocol should be independent of underlying technology.

However, wireless networks invalidate this principle.
– Ignoring properties of wireless medium can lead to poor TCP performance.
– Problem: TCP’s congestion control.

Wireless TCP 2

Problem: packet loss as congestion indicator.
– When retransmission timer times out, sender slows down.

Wireless links are lossy!
– The right way to deal with losses in this case is to re-send lost segments as soon as possible.

Indirect TCP (I-TCP)

[Bakre and Badrinath, 1995].
Split TCP connection in 2: one from sender to base station and the other from base station to receiver.
– Base station serves as “repeater”: copies segments between connections in both directions.
– Connections are homogeneous; timeouts on the 1st connection slow down sender.
– Problem: violates TCP’s e2e’ness.
» Example: ACKs to sender mean base station received segments, not necessarily receiver.

Snoop TCP

[Balakrishnan et al., 1995].
Does not break connection.
Modifications to base station’s network layer code.
– Snooping agent on base station observes and caches TCP segments sent to mobile host and ACKs coming back.
– If it doesn’t see an ACK for a segment or sees duplicate ACKs, it times out and retransmits.
– But source may time out anyway.

End-To-End Argument

Design principle to help guide placement of functionality in distributed systems.

Rationale for moving functions upward closer to application.

Where to place distributed systems functions?

Layered system design:
– Different levels of abstraction for simplicity.
– Lower layer provides service to upper layer.
– Very well defined interfaces.

Some functions can be implemented at different layers or even at multiple layers.

E2E Argument Statement

“The function in question can completely and correctly be implemented only with the knowledge and help of the application at the endpoints. Therefore providing that function in the communication system itself is not possible. Sometimes an incomplete version of the function provided by the communication system may be useful as performance enhancement.”

Functions Closer to Application

E2E argument paper argues that functions should be moved closer to the application that uses them.

Rationale:
– Some functions can only be completely and correctly implemented with app’s knowledge.
» Example: file transfer.
» If error occurs in the network, network reliability can fix it.
» Otherwise, only application can.

Another perspective: Cost

Why pay for something you don’t need?
» Example 1: the Internet.
» Example 2: trend in kernel design: take away from kernel as much functionality as possible.

Applications that don’t need certain functions should not have to pay for them.

E2E Counter Argument

Performance!
– Example: File transfer
» Reliability checks at lower layers detect problems earlier.
» Abort transfer and re-try without having to wait till whole file is transmitted.

“Spread out” functionality across layers.

Domain Name System (DNS)

Basic function: translation of names (ASCII strings) to network (IP) addresses and vice-versa.

Example:
– zephyr.isi.edu <-> 128.9.160.160

History

Original approach (ARPANET, 1970’s):
– File hosts.txt listed all hosts and their IP addresses.
– Every night, every host fetched the file from a central repository.
– OK for a few hundred hosts.
– Scalability?
» File size.
» Centrally managed.

DNS

Hierarchical name space.
Distributed database.
RFCs 1034 and 1035.

How is it used?

Client-server model.
– Client DNS (running on client hosts), or resolver.
– Application calls resolver with name.
– Resolver contacts local DNS server (using UDP) passing the name.
– Server returns corresponding IP address.
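What the resolver passes to the server is a small binary message. The sketch below builds a query in the RFC 1035 wire format for an address (type A, class IN) lookup; the function name and the query ID are hypothetical, and nothing is actually sent.

```python
import struct

QTYPE_A = 1    # address record
QCLASS_IN = 1  # Internet class


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Encode a single-question DNS query asking for the A record of name."""
    # Header: ID, flags (only "recursion desired" set), 1 question, no other sections.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", QTYPE_A, QCLASS_IN)


query = build_query("zephyr.isi.edu")
```

A resolver would then send these bytes in a UDP datagram to port 53 of its configured name server, for example with `sock.sendto(query, (server_address, 53))`, where `server_address` is whatever local server the host has been configured with.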

DNS Name Space

Tree-based hierarchy.

Top-level domains: int, com, edu, gov, mil, org, net, us, ca, …
– edu: usc
» usc: cs, ee
– com: ibm
» ibm: eng, sales

Name Space Structure

Top-level domains:
– Generic.
– Countries.

Leaf domains: no sub-domains.
In practice, nearly all US organizations are under a generic domain, while nearly everything outside the US is under the corresponding country domain.

DNS Names

Domain names:
– Concatenation of all domain names on the path from the node itself all the way to the root, separated by “.”.
– Refers to a tree node and all names under it.
– Case insensitive.
– Components up to 63 characters.
– Full name less than 255 characters.
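The naming rules above translate into a short validation routine. A sketch only: the function names are hypothetical, the limits are taken as stated on the slide, and character-set rules for labels are not checked.

```python
MAX_LABEL = 63   # components up to 63 characters
MAX_NAME = 255   # full name less than 255 characters


def normalize_domain_name(name: str) -> str:
    """Check the length limits and return the name in lower case."""
    full = name.rstrip(".")  # allow a trailing dot for the root
    if not full or len(full) >= MAX_NAME:
        raise ValueError("full name must be 1 to 254 characters")
    for label in full.split("."):
        if not 1 <= len(label) <= MAX_LABEL:
            raise ValueError(f"bad component length: {label!r}")
    return full.lower()  # names are case insensitive


def same_name(a: str, b: str) -> bool:
    return normalize_domain_name(a) == normalize_domain_name(b)


def parent_domains(name: str):
    """List the enclosing domains from the name itself up to the top level."""
    labels = normalize_domain_name(name).split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]
```

For example, `parent_domains("cs.usc.edu")` walks from the node up toward the root: `cs.usc.edu`, `usc.edu`, `edu`.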

Name Space Management

Domains are autonomous.
– Organizational boundaries.
– Each domain manages its own name space independently of other domains.

Delegation:
– When creating new domain: register with parent domain.
» For name uniqueness.
» For name resolution.

Resource Records

Entry in the DNS database.
Several types of entries or RRs.
Example: RR “A” contains IP address.
Name <-> several resource records.
RR format: five-tuple.
– Name.
– TTL (in seconds).
– Class (usually “IN” for Internet info).
– Type: type of RR.
– Value.
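The five-tuple maps naturally onto a small record type. A sketch under assumptions: the `ResourceRecord` type, the whitespace-separated text layout, and the TTL of 86400 seconds are illustrative choices; only the name and address come from the earlier example.

```python
from typing import List, NamedTuple


class ResourceRecord(NamedTuple):
    name: str
    ttl: int        # seconds
    rr_class: str   # usually "IN"
    rr_type: str    # e.g. "A", "MX", "NS"
    value: str


def parse_rr(line: str) -> ResourceRecord:
    """Parse 'name ttl class type value' (value may contain spaces)."""
    name, ttl, rr_class, rr_type, value = line.split(None, 4)
    return ResourceRecord(name.lower(), int(ttl), rr_class, rr_type, value)


def lookup(records: List[ResourceRecord], name: str, rr_type: str) -> List[str]:
    """Return the values of all records matching the name and type."""
    return [r.value for r in records
            if r.name == name.lower() and r.rr_type == rr_type]


records = [
    parse_rr("zephyr.isi.edu 86400 IN A 128.9.160.160"),   # TTL is illustrative
    parse_rr("zephyr.isi.edu 86400 IN TXT example host"),  # hypothetical text record
]
addresses = lookup(records, "Zephyr.ISI.edu", "A")
```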

RR Types 1

SOA: start of authority.
– Marks beginning of zone’s database.
– Provides general info about the zone: e-mail address of admin, default TTL, etc.

A: address.
– Contains 32-bit IP address.
– Single name <-> several A RRs.

MX: mail exchange.
– Name of mail server for this domain.

RR Types 2

NS: name server.
– Name of name server for this domain.

CNAME: canonical name.
– Alias.

HINFO: host description.
– Provides information about host, e.g., CPU type, OS, etc.

TXT: arbitrary string of characters.
– Generic description of the domain, where it is located, etc.

Name Servers

Entire database in a single name server.
– Practical?
– Why?

DNS database is partitioned into zones.
Each zone contains part of the DNS tree.
Zone <-> name server.
– Each zone may be served by more than 1 server.
– A server may serve multiple zones.

Primary and secondary name servers.

Name Resolution 1

Application wants to resolve name.
Resolver sends query to local name server.
– Resolver configured with list of local name servers.
– Select servers in round-robin fashion.

If name is local, local name server returns matching authoritative RRs.
– Authoritative RR comes from authority managing the RR and is always correct.
– Cached RRs may be out of date.

Name Resolution 2

If information not available locally (not even cached), local NS will have to ask someone else.
– It asks the server of the top-level domain of the name requested.

Recursive Resolution

Recursive query:
– Each server that doesn’t have info forwards it to someone else.
– Response finds its way back.

Alternative (iterative query):
– Name server not able to resolve query sends back the name of the next server to try.
– Some servers use this method.
– More control for clients.

Example

Suppose resolver on flits.cs.vu.nl wants to resolve linda.cs.yale.edu.
– Local NS, cs.vu.nl, gets queried but cannot resolve it.
– It then contacts .edu server.
– .edu server forwards query to yale.edu server.
– yale.edu contacts cs.yale.edu, which has the authoritative RR.
– Response finds its way back to originator.
– cs.vu.nl caches this info.
» Not authoritative (since may be out-of-date).
» RR TTL determines how long RR should be cached.
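The walk-through above can be modeled with a few in-memory name servers. This is a toy sketch, not a DNS implementation: the `NameServer` class, the address 192.0.2.10 (from a range reserved for documentation), and the 3600-second TTL are assumptions, and time is passed in explicitly so the cache behavior is easy to follow.

```python
class NameServer:
    def __init__(self, zone):
        self.zone = zone
        self.records = {}      # authoritative data: name -> (address, ttl)
        self.children = {}     # delegated sub-zone -> NameServer
        self.tld_servers = {}  # top-level domain -> NameServer (for local servers)
        self.cache = {}        # name -> (address, expiry time)

    def resolve(self, name, now, path):
        """Return (address, remaining ttl, authoritative); record servers visited."""
        path.append(self.zone)
        if name in self.records:
            address, ttl = self.records[name]
            return address, ttl, True
        cached = self.cache.get(name)
        if cached and cached[1] > now:
            return cached[0], cached[1] - now, False  # cached: not authoritative
        for zone, server in self.children.items():
            if name.endswith("." + zone):
                return self._forward(server, name, now, path)
        tld = name.rsplit(".", 1)[-1]
        if tld in self.tld_servers:
            return self._forward(self.tld_servers[tld], name, now, path)
        raise LookupError(name)

    def _forward(self, server, name, now, path):
        # Recursive query: ask the next server, cache what comes back.
        address, ttl, authoritative = server.resolve(name, now, path)
        self.cache[name] = (address, now + ttl)
        return address, ttl, authoritative


def build_example():
    cs_yale, yale, edu = NameServer("cs.yale.edu"), NameServer("yale.edu"), NameServer("edu")
    local = NameServer("cs.vu.nl")
    cs_yale.records["linda.cs.yale.edu"] = ("192.0.2.10", 3600)  # hypothetical data
    yale.children["cs.yale.edu"] = cs_yale
    edu.children["yale.edu"] = yale
    local.tld_servers["edu"] = edu
    return local


local = build_example()
first_path, second_path = [], []
first = local.resolve("linda.cs.yale.edu", now=0, path=first_path)
second = local.resolve("linda.cs.yale.edu", now=10, path=second_path)
```

The first lookup visits all four servers and returns the authoritative answer; the second, ten seconds later, is served from the cache at cs.vu.nl with the remaining TTL and is flagged as not authoritative.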