TCP/UDP/IP
Courtesy of Kevin Fall at UC Berkeley & Raghupathy Sivakumar at GATECH
TCP/IP Protocol Suite
• Physical layer
• Data-link layer – ARP, RARP
• Network layer – IP, ICMP, IGMP
• Transport layer – TCP, UDP, RTP
• Application layer – http, smtp, ftp
[Figure: protocol stack, top to bottom: Application, Transport, IP, DataLink, Physical]
TCP/IP Protocol Suite
• IP is used for each network node (or router)
[Figure: Source and Dest hosts each run Application, Transport, IP, DataLink, Physical; the two Routers between them run only IP, DataLink, Physical]
Internet Protocol (IP) service model
• best-effort datagram model
• error detection in header only
• addressing, routing
• signaling (ICMP)
• Fragmentation and reassembly
• Multiplexing and Demultiplexing
Addressing
• Need a unique identifier for every host in the Internet (analogous to postal address)
• IP addresses are 32 bits long
• Hierarchical addressing scheme
• Conceptually … IPaddress = (NetworkAddress, HostAddress)
Address Classes
Class   Leading bits   netId     hostId
A       0              7 bits    24 bits
B       1 0            14 bits   16 bits
C       1 1 0          21 bits   8 bits
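The class of an address follows directly from its leading bits, and the field widths above then give the (netId, hostId) split. A minimal Python sketch, assuming only the three classes listed here; the function names `address_class` and `split_classful` are illustrative, not from the slides:

```python
import ipaddress

# Field widths taken from the Address Classes table: (netId bits, hostId bits).
FIELD_BITS = {"A": (7, 24), "B": (14, 16), "C": (21, 8)}

def address_class(addr: str) -> str:
    """Return 'A', 'B', 'C', or 'other' based on the leading bits."""
    value = int(ipaddress.IPv4Address(addr))
    if value >> 31 == 0b0:
        return "A"
    if value >> 30 == 0b10:
        return "B"
    if value >> 29 == 0b110:
        return "C"
    return "other"  # leading bits 111: outside the three classes shown here

def split_classful(addr: str):
    """Return (netId, hostId) as integers; raises KeyError for 'other'."""
    value = int(ipaddress.IPv4Address(addr))
    net_bits, host_bits = FIELD_BITS[address_class(addr)]
    net_id = (value >> host_bits) & ((1 << net_bits) - 1)
    host_id = value & ((1 << host_bits) - 1)
    return net_id, host_id
```

For example, `split_classful("10.1.2.3")` treats 10 as the netId and the remaining 24 bits as the hostId.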
Addresses and Hosts
• Since netId is encoded into the IP address, each host will have a unique IP address for each of its network connections
• Hence, IP addresses refer to network connections and not hosts
• Why will hosts have multiple network connections?
Special Addresses
Prefix (netID)   Suffix (hostID)   Type of Address     Purpose
All 0s           All 0s            This computer       Used during bootstrap
Network          All 0s            Network             Identifies a network
Network          All 1s            Direct broadcast    Broadcast on a specified net
All 1s           All 1s            Limited broadcast   Broadcast on a local net
127              any               Loopback            Testing
Exceptions to Addressing
• Subnetting
  – Splitting hostId into subnetId and hostId
  – Achieved using subnet masks
• Supernetting (Classless Inter-domain Routing or CIDR)
  – Combining multiple lower class address ranges into one range
  – Achieved using 32 bit masks and max prefix (longest-prefix match) routing
Examples
• Subnetting (B class)
[Figure: a class B address split into Network | Host, and the same address split into Network | Subnet | Host]
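Subnetting a class B network can be tried out with Python's standard `ipaddress` module. This is a sketch under assumptions: the network `172.16.0.0/16` and the choice of an 8-bit subnetId are illustrative values, not taken from the slides.

```python
import ipaddress

# Assumed example: a class B network whose 16-bit hostId is split
# into an 8-bit subnetId and an 8-bit hostId.
network = ipaddress.IPv4Network("172.16.0.0/16")
subnets = list(network.subnets(new_prefix=24))
mask = subnets[0].netmask

def subnet_of(addr: str, mask_str: str) -> str:
    """Bitwise-AND an address with a subnet mask to get its subnet address."""
    a = int(ipaddress.IPv4Address(addr))
    m = int(ipaddress.IPv4Address(mask_str))
    return str(ipaddress.IPv4Address(a & m))
```

Here `subnet_of("172.16.5.9", "255.255.255.0")` keeps the Network and Subnet fields and zeroes the Host field.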
IP Routing
• Direct
  – If source and destination hosts are connected directly
  – Still need to perform IP address to physical address translation
• Indirect
  – Table driven routing
  – Each entry: (NetId, RouterId)
  – Default router
  – Host-specific routes
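Table-driven routing with a default router and host-specific routes can be sketched as a lookup that prefers the most specific matching entry. The table contents below are hypothetical (documentation address ranges), and the marker `"direct"` for a directly connected network is an assumption made for this example.

```python
import ipaddress

# Hypothetical routing table: (destination prefix, next hop).
ROUTES = [
    (ipaddress.IPv4Network("0.0.0.0/0"), "192.0.2.1"),        # default router
    (ipaddress.IPv4Network("198.51.100.0/24"), "direct"),     # directly connected net
    (ipaddress.IPv4Network("203.0.113.0/24"), "192.0.2.254"), # (NetId, RouterId) entry
    (ipaddress.IPv4Network("203.0.113.7/32"), "192.0.2.77"),  # host-specific route
]

def next_hop(dest: str) -> str:
    """Return the next hop of the matching entry with the longest prefix."""
    addr = ipaddress.IPv4Address(dest)
    matches = [(net.prefixlen, hop) for net, hop in ROUTES if addr in net]
    # The default route (prefix length 0) always matches, so matches is never empty.
    return max(matches, key=lambda entry: entry[0])[1]
```

A host-specific route wins over the network route, and the default router is used only when nothing else matches.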
IP Fragmentation
• The physical network layers of different networks in the Internet might have different maximum transmission units (MTUs)
• The IP layer performs fragmentation when the next network has a smaller MTU than the current network
[Figure: IP fragmentation where a network with MTU = 1500 is followed by a network with MTU = 500]
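The MTU = 1500 to MTU = 500 case can be worked through numerically. This sketch assumes a 20-byte IP header without options and relies on the standard IPv4 rule (not stated on the slide) that the fragment offset is counted in 8-byte units, so every fragment except the last carries a multiple of 8 data bytes.

```python
def fragment(payload_len: int, mtu: int, header_len: int = 20):
    """Return a list of (offset_in_8_byte_units, data_bytes, more_fragments)."""
    # Largest data size per fragment, rounded down to a multiple of 8.
    max_data = (mtu - header_len) // 8 * 8
    fragments = []
    offset = 0
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        more = offset + size < payload_len
        fragments.append((offset // 8, size, more))
        offset += size
    return fragments

# A full 1500-byte datagram carries 1480 data bytes after its 20-byte header.
frags = fragment(1480, 500)
```

With these numbers the datagram is split into three 480-byte fragments and one 40-byte fragment, each carrying its own IP header.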
IP Reassembly
• Fragmented packets need to be put together
• Where does reassembly occur?
  – The router at the other end of the smaller MTU network
    – Router overhead: complexity, buffering
    – More than one path
  – The final destination
    – Many fragments on the path: more chance of missing packets
    – Utilization inefficiency (many headers)
IP Header
• Used for conveying information to peer IP layers
[Figure: protocol stacks of Source, Router, Router, Dest; the hosts run Application, Transport, IP, DataLink, Physical and the routers run IP, DataLink, Physical]
IP Header (contd.)
4 bit version | 4 bit hdr length | 8 bit TOS | 16 bit total length
16 bit identification | 3 bit flags | 13 bit fragment offset
8 bit TTL | 8 bit protocol | 16 bit header checksum
32 bit source IP address
32 bit destination IP address
Options (if any) (maximum 40 bytes)
data
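The header layout maps onto a fixed 20-byte structure when there are no options. The following Python sketch packs and parses such a header with `struct` and fills in the header checksum using the usual 16-bit one's-complement sum. The function names and default field values (TTL 64, protocol 17 for UDP) are assumptions chosen for illustration.

```python
import socket
import struct

IP_FORMAT = "!BBHHHBBH4s4s"  # 20 bytes, network byte order, no options

def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_ip_header(src: str, dst: str, payload_len: int,
                    proto: int = 17, ttl: int = 64, ident: int = 0) -> bytes:
    ver_ihl = (4 << 4) | 5  # version 4, header length 5 words (20 bytes)
    header = struct.pack(IP_FORMAT, ver_ihl, 0, 20 + payload_len, ident, 0,
                         ttl, proto, 0,
                         socket.inet_aton(src), socket.inet_aton(dst))
    checksum = internet_checksum(header)  # computed with checksum field = 0
    return header[:10] + struct.pack("!H", checksum) + header[12:]

def parse_ip_header(header: bytes) -> dict:
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack(IP_FORMAT, header[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,
        "tos": tos,
        "total_len": total_len,
        "identification": ident,
        "flags": flags_frag >> 13,
        "fragment_offset": flags_frag & 0x1FFF,
        "ttl": ttl,
        "protocol": proto,
        "checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }
```

A receiver can verify the header by running `internet_checksum` over all 20 bytes; a valid header yields 0.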
Multiplexing
[Figure: two hosts, each with applications (Web, Email, MP3) over TCP and UDP over IP, exchanging IP datagrams]
Endpoint identification
• how to identify a remote application/service on the Internet?
• [IP_address, port number, protocol]
• expect to find a process listening for incoming packets
Port numbers
• port numbers are in range [0..64K-1]
• ports below 1024 are known as well-known ports and reserved by IANA
• ports in range [1024..64K-1] may be registered but are not enforced
User datagram protocol (UDP)
UDP
• provides a datagram service model
• Additional intelligence built at the application layer if needed
• Error detection
• header (8 bytes)
Sending a UDP datagram
• application needs the dest IP address and port number to send to
• application chooses message size, requests send using API (e.g. sockets)
• API allocates OS-level buffer, leaving room for headers, copies data from user-level buffer to OS-level buffer, gives to UDP module
Sending a UDP datagram
• UDP module receives data and prepends IP and UDP headers
• fills in IP header info
  – proto, len, src, dst,…
• fills in UDP header
  – src_port, dst_port, len,…
• sets TTL and TOS
• sends UDP/IP packet to IP module
[Figure: frame layout: Ethernet header | IP header | UDP header | Application data | Ethernet trailer]
Sending a UDP datagram
• IP module receives packet
• inserts options if enabled
• sets IP vers, IHL, offset, ID fields
• determines an interface/MTU
• fragments if needed and sends to link layer
Receiving a UDP datagram
• network adapter receives a frame, interrupts processor
• device driver determines frame contains IP type data, strips link layer header and gives to IP module
• IP checks IP header, processes options
• IP checks IP address (unicast, multicast, …)
• IP reassembles if necessary, gives the whole packet to UDP based on protocol field
Receiving a UDP datagram
• UDP receives IP/UDP packet
• checks length and checksum
• locates OS PCB* based on dest port, providing receiving process’ ID; generates ICMP unreachable if nobody there
• copies to receiving process’ buffer
• makes the data available to the receiving process
*PCB: protocol control block
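From the application's point of view, the send and receive paths described above are hidden behind the sockets API. The following is a minimal Python sketch of one datagram sent over the loopback interface; the helper name `udp_roundtrip`, the 2-second timeout, and the 2048-byte receive buffer are assumptions made for the example.

```python
import socket

def udp_roundtrip(message: bytes):
    """Send one datagram to a local receiver and return (data, sender address)."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    receiver.settimeout(2.0)          # do not block forever if the datagram is lost
    dest = receiver.getsockname()     # (dest IP address, dest port) for the sender

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # The application supplies only the data and the destination end-point;
        # UDP and IP headers are added by the OS.
        sender.sendto(message, dest)
        data, source = receiver.recvfrom(2048)
    finally:
        sender.close()
        receiver.close()
    return data, source
```

No connection is set up before `sendto`, and the whole message arrives as a single datagram or not at all.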
Why use UDP?
• downsides
  – no error correction
  – No flow control
  – No congestion control
  – App picks packet size
• upsides
  – No connection establishment
  – stateless
  – Broadcast/multicast more straightforward
  – App picks packet size
Transmission Control Protocol (TCP)
TCP
• End-to-end transport protocol
• Responsible for reliability, congestion control, flow control, and sequenced delivery
• Applications that use TCP: http (web), telnet, ftp (file transfer), smtp (email), chat
• Applications that don’t: multimedia (typically) – use UDP instead
Ports, End-points, & Connections
• Thus, an end-point is represented by (IP address, Port)
• Ports can be re-used between transport protocols
• A connection is (SRC IP address, SRC port, DST IP address, DST port)
• Same end-point can be used in multiple connections
[Figure: demultiplexing hierarchy: IP address selects the host at the IP Layer, Protocol ID selects the transport (TCP, UDP), Port selects the application (http, ftp, smtp, telnet; A1, A2, A3)]
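Because a connection is identified by the full 4-tuple, one server end-point can take part in many connections at once. A small Python sketch of a connection table keyed by that tuple; the addresses, ports, and owner labels are made-up illustration values.

```python
from typing import Dict, NamedTuple, Optional

class Connection(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

# Hypothetical per-host connection table: 4-tuple -> owner of the connection.
table: Dict[Connection, str] = {}

def open_connection(src_ip: str, src_port: int,
                    dst_ip: str, dst_port: int, owner: str) -> Connection:
    conn = Connection(src_ip, src_port, dst_ip, dst_port)
    table[conn] = owner
    return conn

def demultiplex(src_ip: str, src_port: int,
                dst_ip: str, dst_port: int) -> Optional[str]:
    """Find the owner of an incoming segment, or None if no connection matches."""
    return table.get(Connection(src_ip, src_port, dst_ip, dst_port))

# Two clients use the same source port number and the same server end-point
# (10.0.0.9, 80), yet form two distinct connections.
open_connection("10.0.0.1", 40001, "10.0.0.9", 80, "client A")
open_connection("10.0.0.2", 40001, "10.0.0.9", 80, "client B")
```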
TCP
• Connection Establishment
• Connection Maintenance
  – Reliability by acknowledgement packet (ACK)
  – Congestion control
  – Flow control
  – Sequencing
• Connection Termination
Fundamental Mechanism
• Simple stop and go protocol
• Timeout based reliability (loss recovery)
• Multiple unacknowledged packets (W)
[Figure: timeline of data and ack exchanges; a data packet that is not acknowledged within the RTO is retransmitted (retx)]
[Figure: Sliding Window Protocol over packets 1 2 3 4 5 6 7 8 9 10 11 12 …., showing the sliding window]
Sliding window for flow control
[Figure: sliding window; the sender cannot send more data]
Active and Passive Open
• How do applications initiate a connection?
• One end (server) registers with the TCP layer instructing it to “accept” connections at a certain port
• The other end (client) initiates a “connect” request which is “accept”-ed by the server
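With the sockets API, the passive open corresponds to `listen` followed by `accept`, and the active open to `connect`. A minimal Python sketch on the loopback interface, assuming a one-shot echo server running in a helper thread; the function name `run_echo_once` and the echo behavior are illustrative choices.

```python
import socket
import threading

def run_echo_once(message: bytes = b"ping") -> bytes:
    # Passive open: the server registers a port and waits to accept connections.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve() -> None:
        conn, _peer = server.accept()  # returns once a client has connected
        with conn:
            conn.sendall(conn.recv(1024))  # echo back what was received

    worker = threading.Thread(target=serve)
    worker.start()

    # Active open: the client initiates the connect request.
    client = socket.create_connection(("127.0.0.1", port), timeout=2.0)
    with client:
        client.sendall(message)
        reply = client.recv(1024)

    worker.join()
    server.close()
    return reply
```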
Reliability (Loss Recovery)
• Sequence Numbers
• TCP uses cumulative Acknowledgments (ACKs)
  – Next expected in-sequence packet sequence number
  – Pros and cons?
• Piggybacking
• Timeout calculation
  – Rttavg = k*Rttavg + (1-k)*Rttsample
  – RTO = Rttavg + 4*Rttdeviation
[Figure: data/ack timeline: packets 1 2 3 4 sent, ack 3 returned, packets 3 4 sent again]
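The timeout calculation can be tried out with a small estimator. The two formulas for Rttavg and RTO are the ones given above; the slides do not say how Rttdeviation is updated or which constants are used, so the smoothed mean-deviation update, k = 0.875, the deviation gain of 0.25, and the initial deviation of half the first sample are all assumptions in this Python sketch.

```python
class RtoEstimator:
    """Tracks Rttavg and Rttdeviation and derives the retransmission timeout."""

    def __init__(self, first_sample: float, k: float = 0.875,
                 dev_gain: float = 0.25) -> None:
        self.k = k                        # assumed smoothing factor
        self.dev_gain = dev_gain          # assumed gain for the deviation update
        self.rtt_avg = first_sample
        self.rtt_dev = first_sample / 2   # assumed initial deviation

    @property
    def rto(self) -> float:
        # RTO = Rttavg + 4*Rttdeviation
        return self.rtt_avg + 4 * self.rtt_dev

    def update(self, sample: float) -> float:
        """Fold in a new RTT sample and return the new RTO."""
        error = abs(sample - self.rtt_avg)
        # Assumed deviation update (not given on the slide).
        self.rtt_dev = (1 - self.dev_gain) * self.rtt_dev + self.dev_gain * error
        # Rttavg = k*Rttavg + (1-k)*Rttsample
        self.rtt_avg = self.k * self.rtt_avg + (1 - self.k) * sample
        return self.rto
```

Stable samples shrink the deviation term and tighten the RTO, while a sudden slow sample widens it.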
Retransmission (fast retransmit)
• after 3 duplicate ACKs, the TCP sender concludes that the packet is lost and retransmits it without waiting for the timeout
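The duplicate-ACK rule can be expressed as a counter over the stream of cumulative ACKs. A Python sketch, assuming each ACK carries the next expected sequence number; the function name `fast_retransmit_points` is hypothetical.

```python
def fast_retransmit_points(acks, threshold: int = 3):
    """Return the sequence numbers that should be retransmitted.

    `acks` is the sequence of cumulative ACK values seen by the sender.
    A retransmission is triggered when the same ACK value has been
    repeated `threshold` times after its first arrival.
    """
    retransmit = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                retransmit.append(ack)  # the packet the receiver is still waiting for
        else:
            last_ack = ack
            dup_count = 0
    return retransmit
```

For the ACK stream 2, 3, 3, 3, 3, 5 the first 3 is a normal ACK and the next three are duplicates, so packet 3 is retransmitted.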
Congestion control: slow start
• Initial window size W = 1
• Each ACK will increase W by 1
[Figure annotation: (can be bottleneck!)]
Congestion Control
• Slow Start
  – Start with W=1
  – For every ACK, W=W+1
• Congestion Avoidance (linear increase)
  – For every ACK, W = W+1/W
• Congestion Control (multiplicative decrease)
  – ssthresh = W/2
  – W = 1
  – Alternative: Fall to W/2 and start congestion avoidance directly
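The three rules can be combined into a small window simulation. This Python sketch follows the W = 1 variant on a loss, not the W/2 alternative; the initial ssthresh value and the event names "ack" and "loss" are assumptions made for the example.

```python
def simulate_cwnd(events, ssthresh: float = 16.0):
    """Return (trace of W after each event, final ssthresh).

    events: iterable of "ack" or "loss".
    """
    w = 1.0                      # slow start begins with W = 1
    trace = [w]
    for event in events:
        if event == "ack":
            if w < ssthresh:
                w += 1.0         # slow start: W = W + 1 per ACK
            else:
                w += 1.0 / w     # congestion avoidance: W = W + 1/W per ACK
        elif event == "loss":
            ssthresh = w / 2     # multiplicative decrease
            w = 1.0
        trace.append(w)
    return trace, ssthresh
```

With `ssthresh=4.0`, three ACKs take W from 1 to 4, the fourth ACK adds only 1/4, and a loss then sets ssthresh to half the window and restarts from W = 1.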
Why LIMD? (fairness)
• W=1
  • 100 10 diff = 90
  • 1 1 diff = 0
  • Problem? – inefficient
• W=W/2
  • 100 10 diff = 90
  • 50 5 diff = 45
  • 51 6 diff = 45
  • 52 7 diff = 45
  • ..
  • 73 28 diff = 45
  • 36.5 14 diff = 22.5
  • ..
  • 60.75 38.25 diff = 22.5
  • 30.375 19.125 diff = 11.25
  • ..
Flow Control
• Prevent sender from overwhelming the receiver
• Receiver in every ACK advertises the available buffer space at its end
• Window calculation
  – MIN(congestion control window, flow control window)
Sequencing
• Byte sequence numbers
• TCP receiver buffers out of order segments and reassembles them later
• Starting sequence number randomly chosen during connection establishment
  – Why?
[Figure: segments 1 2 3 4 are sent and 3 is lost: 1 given to app, 2 given to app, 4 buffered (not given to app); after 3 4 are sent again: 3 & 4 given to app, duplicate 4 discarded]
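The receiver-side behavior in the figure (deliver in order, buffer out-of-order segments, discard duplicates) can be sketched with byte sequence numbers. This is a simplified Python model, assuming segment boundaries do not change on retransmission; the class name `Receiver` and its attributes are illustrative.

```python
class Receiver:
    """Reassembles a byte stream from segments that may arrive out of order."""

    def __init__(self, initial_seq: int = 0) -> None:
        self.next_seq = initial_seq   # next expected in-sequence byte
        self.buffer = {}              # out-of-order segments, keyed by sequence number
        self.delivered = bytearray()  # bytes already given to the application

    def receive(self, seq: int, data: bytes) -> int:
        """Process one segment and return the cumulative ACK (next expected byte)."""
        if seq + len(data) <= self.next_seq:
            return self.next_seq      # duplicate of delivered data: discard
        if seq > self.next_seq:
            self.buffer[seq] = data   # hole before this segment: buffer it
            return self.next_seq
        # In-order segment (trim any part that was already delivered).
        data = data[self.next_seq - seq:]
        self.delivered += data
        self.next_seq += len(data)
        # Deliver buffered segments that have now become in-order.
        while self.next_seq in self.buffer:
            chunk = self.buffer.pop(self.next_seq)
            self.delivered += chunk
            self.next_seq += len(chunk)
        return self.next_seq
```

Replaying the figure with 2-byte segments at sequence numbers 0, 2, 4, 6: when the segment at 4 is lost, the one at 6 is held back, and the ACK stays at 4 until the missing segment arrives.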
Connection Establishment & Termination
• 3-way handshake used for connection establishment
  – Delay!
• Randomly chosen sequence number (why?) is conveyed to the other end
• Similar FIN, FIN+ACK exchange used for connection termination
[Figure: handshake timeline. Server does passive open. Client does active open and sends connection request (SYN); server accepts connection request and sends acceptance (SYN+ACK); client starts connection (ACK, DATA)]
TCP Segment Format
16 bit SRC Port | 16 bit DST Port
32 bit sequence number
32 bit ACK number
HL | Rsv’d | flags | 16 bit window size
16 bit TCP checksum | 16 bit urgent pointer
Options (if any)
Data
Flags: URG, ACK, PSH, RST, SYN, FIN
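The segment format corresponds to a fixed 20-byte header when no options are present. A Python sketch that packs and parses it with `struct`; the checksum is left as a caller-supplied value here (computing it is out of scope for this example), and the function names are hypothetical.

```python
import struct

TCP_FORMAT = "!HHIIHHHH"  # 20 bytes, network byte order, no options

# Bit positions of the six flags within the low bits of the HL/flags word.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}

def build_tcp_header(src_port: int, dst_port: int, seq: int, ack: int,
                     flags, window: int, checksum: int = 0,
                     urgent: int = 0) -> bytes:
    header_len_words = 5  # HL: header length in 32-bit words
    flag_value = 0
    for name in flags:
        flag_value |= FLAG_BITS[name]
    hl_flags = (header_len_words << 12) | flag_value  # HL | Rsv'd | flags
    return struct.pack(TCP_FORMAT, src_port, dst_port, seq, ack,
                       hl_flags, window, checksum, urgent)

def parse_tcp_header(header: bytes) -> dict:
    (src_port, dst_port, seq, ack, hl_flags,
     window, checksum, urgent) = struct.unpack(TCP_FORMAT, header[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": (hl_flags >> 12) * 4,  # in bytes
        "flags": {name for name, bit in FLAG_BITS.items() if hl_flags & bit},
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
    }
```

For instance, the second segment of the 3-way handshake would be built with `flags=["SYN", "ACK"]`.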
Silly window syndrome (SWS)
• TCP is a window-based protocol
• TCP receiver advertises a small amount of window; so TCP sender transmits only a short packet each time
• Inefficient utilization of network BW
• So what? Save up enough to send
Nagle’s algorithm
• Buffer all user data if any unacknowledged data is outstanding
• Ok to send if all is ACK’d or have an MSS size worth of data
• If small delay is wanted, Nagle’s algorithm should be disabled
• MSS size: maximum TCP payload size
• MTU: maximum PDU size supported by link layer
• MTU = MSS + 20 (TCP header) + 20 (IP header)
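The send decision in Nagle's algorithm and the MTU/MSS relation fit in a few lines. A Python sketch, assuming 20-byte IP and TCP headers without options as in the formula above; the function names and parameters are illustrative.

```python
def mss_from_mtu(mtu: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """MTU = MSS + 20 (TCP header) + 20 (IP header), solved for MSS."""
    return mtu - ip_header - tcp_header

def nagle_can_send(buffered_bytes: int, unacked_bytes: int, mss: int,
                   nagle_enabled: bool = True) -> bool:
    """Decide whether the sender may transmit its buffered user data now."""
    if buffered_bytes == 0:
        return False              # nothing to send
    if not nagle_enabled:
        return True               # low-delay mode: send immediately
    # Send if everything outstanding is ACK'd, or a full MSS is ready.
    return unacked_bytes == 0 or buffered_bytes >= mss
```

On most sockets implementations, Nagle's algorithm is disabled per connection with the `TCP_NODELAY` option, e.g. `sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)` in Python.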
Interactive applications: Telnet
• Remote terminal applications (e.g., Telnet) send characters to a server. The server interprets the character and sends the output at the server to the client.
• For each character typed, you see three packets:
  1. Client → Server: Send typed character
  2. Server → Client: Echo of character (or user output) and acknowledgement for first packet
  3. Client → Server: Acknowledgement for second packet
[Figure: Host with Telnet client and Host with Telnet server: 1. send character, 2. interpret character, 3. send echo of character and/or output]
Why 3 packets per character?
• We would expect four packets per character:
  – character
  – ACK of character
  – echo of character
  – ACK of echoed character
• However, tcpdump shows this pattern:
  – character
  – ACK and echo of character
  – ACK of echoed character
• What has happened? TCP has delayed the transmission of an ACK
Delayed ACKs
• Problem:
  – In request/response programs, you send separate ACK and Data packets for each transaction
• Solution:
  – Don’t ACK data immediately
  – Wait 200ms (must be less than 500ms – why?)
  – Must ACK every other packet
  – Must not delay duplicate ACKs
UDP-lite
• Error-resilient CODECs appear
• Over wireless links, BER is not negligible
  – e.g. cellular networks
• Checksumming drops corrupted packets (even 1 bit error)
[Figure: Fixed Host (Unix BSDi 3.0) with H.263+ Encoder, Packetization, RTP, Socket Interface, UDP / UDP Lite, IP, PPP, connected via PSTN, GSM Network, and GSM Base Station to Mobile Host (Unix BSDi 3.0) with PPP, IP, UDP / UDP Lite, Socket Interface, RTP, De-packetization, H.263+ Decoder]
UDP-lite
• Error-resilient CODECs means there is redundancy or FEC
• It may be better to use packets with some errors
• In UDP-lite, there are error-sensitive and insensitive parts
  – The size of the former part is called coverage
  – E.g. application header can be error-sensitive part
• Implemented in BSDi 3.0 kernel
• Requires MAC-lite as well
  – Receiver MAC should pass the data to upper layer despite errors
[Figure: UDP-lite header layout with bit ruler 0 7 8 15: source port # | dest port #, then length / coverage | checksum]
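The effect of the coverage field can be illustrated by checksumming only the error-sensitive prefix of a packet. This Python sketch is a simplification: it applies the usual 16-bit one's-complement checksum to the first `coverage` bytes only and ignores details of the real protocol such as the pseudo-header. The function names are hypothetical.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def partial_checksum(packet: bytes, coverage: int) -> int:
    """Checksum only the error-sensitive part (the first `coverage` bytes)."""
    return internet_checksum(packet[:coverage])

# Assumed example: an 8-byte sensitive part followed by error-tolerant payload.
packet = bytes([1, 2, 3, 4, 5, 6, 7, 8]) + b"payload!"
coverage = 8
reference = partial_checksum(packet, coverage)

# A bit error in the insensitive part does not change the checksum ...
damaged_tail = packet[:12] + b"X" + packet[13:]
# ... while an error inside the covered part does.
damaged_head = b"\xff" + packet[1:]
```

A receiver using this scheme would still deliver `damaged_tail` to the application and drop `damaged_head`.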