1
Transport Protocols
Relates to Lab 5. UDP and TCP
2
Midterm
3
Roadmap
• UDP
  – Unreliable, connectionless datagram service
• TCP
  – Reliable, in-order, connection-oriented byte stream service
• Principles
  – Multiplexing/demultiplexing
  – How to build a reliable service on top of an unreliable service
4
Orientation
• We move one layer up and look at the transport layer.
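[Figure: protocol stack. User processes (application layer) sit on top of TCP and UDP (transport layer), which run over IP together with ICMP and IGMP (network layer), which in turn runs over ARP, RARP and the hardware interface (link layer) down to the media.]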
5
Orientation
• Transport layer protocols are end-to-end protocols
• They are only implemented at the hosts
[Figure: two hosts, each with application, transport, network and data link layers, connected through an intermediate node that has only network and data link layers.]
6
Transport Protocols in the Internet
• The most commonly used transport protocols are UDP and TCP.

UDP - User Datagram Protocol
• datagram oriented
• unreliable, connectionless
• simple
• unicast and multicast
• useful only for a few applications, e.g., multimedia applications
• used a lot for services
  – network management (SNMP), routing (RIP), naming (DNS), etc.

TCP - Transmission Control Protocol
• byte stream oriented
• reliable, connection-oriented
• complex
• only unicast
• used for most Internet applications:
  – web (HTTP), email (SMTP), file transfer (FTP), terminal (Telnet), etc.
7
UDP - User Datagram Protocol
• UDP supports unreliable transmission of datagrams
  – Each output operation by a process produces exactly one UDP datagram
• Beyond what IP provides, UDP adds little more than multiplexing and demultiplexing (plus a checksum)
• Protocol number: 17
[Figure: applications on two hosts communicate through UDP, which runs over IP across several IP hops.]
8
UDP Format
IP header (20 bytes) | UDP header (8 bytes) | UDP data

UDP header layout (32 bits per row):
  bits 0-15                  bits 16-31
  Source Port Number         Destination Port Number
  UDP message length         Checksum
  DATA

• Port numbers identify sending and receiving applications (processes). The maximum port number is 2^16 - 1 = 65,535
• Message Length is at least 8 bytes (i.e., the Data field can be empty) and at most 65,535
• Checksum covers the UDP header and data (and a pseudo-header built from fields of the IP header).
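The 8-byte header layout can be made concrete with a few lines of Python. This is a minimal sketch, not a full UDP implementation: the function names are made up for illustration, the checksum field is left at 0 in the header, and `internet_checksum` only shows the 16-bit one's-complement sum that the Internet checksum is based on (it is not applied to a pseudo-header here).

```python
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                     # pad odd-length input with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:                      # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_udp_header(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Pack the four 16-bit header fields; checksum is left at 0 in this sketch."""
    length = 8 + len(payload)               # the length field counts header + data
    return struct.pack("!HHHH", src_port, dst_port, length, 0)

def parse_udp_header(datagram: bytes) -> dict:
    """Unpack the first 8 bytes of a UDP datagram."""
    src, dst, length, checksum = struct.unpack("!HHHH", datagram[:8])
    return {"src_port": src, "dst_port": dst, "length": length, "checksum": checksum}
```

For example, `parse_udp_header(build_udp_header(1121, 53, b"hello") + b"hello")` reports a length of 13: 8 header bytes plus 5 data bytes.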
9
Port Numbers
• UDP (and TCP) use port numbers to identify applications
• A globally unique address at the transport layer (for both UDP and TCP) is a tuple <IP address, port number>
• Port numbers are 16 bits long, so there are 65,536 possible UDP port numbers per host (0 to 65,535); port 0 is reserved.
[Figure: IP demultiplexes to TCP or UDP based on the Protocol field in the IP header; TCP and UDP each demultiplex to the user processes based on the port number.]
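The two demultiplexing steps can be sketched as a pair of dictionary lookups. The `Host` class, its method names and the bound process names are hypothetical; only the protocol numbers (6 for TCP, 17 for UDP) are real values.

```python
TCP, UDP = 6, 17   # values of the Protocol field in the IP header

class Host:
    """Toy model of two-step demultiplexing: protocol first, then port."""

    def __init__(self):
        self.ports = {TCP: {}, UDP: {}}     # one port table per transport protocol

    def bind(self, protocol: int, port: int, process: str) -> None:
        self.ports[protocol][port] = process

    def deliver(self, protocol: int, dst_port: int):
        """Return the process that receives the packet, or None if nobody is bound."""
        table = self.ports.get(protocol, {})   # step 1: Protocol field picks TCP or UDP
        return table.get(dst_port)             # step 2: port number picks the process
```

Because each protocol has its own table, UDP port 53 and TCP port 53 are independent endpoints.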
10
Transmission Control Protocol (TCP)
11
Overview
TCP = Transmission Control Protocol
• Connection-oriented protocol
• Provides a reliable unicast end-to-end byte stream over an unreliable internetwork.
[Figure: two TCP endpoints exchange a byte stream in each direction across an IP internetwork.]
12
Connection-Oriented
• Before any data transfer, TCP establishes a connection:
• Analogy: making a phone call
• One TCP entity is waiting for a connection (“server”)
• The other TCP entity (“client”) contacts the server
• Each connection is full duplex
[Figure: CLIENT and SERVER. The server is waiting for a connection request; the client requests a connection, the server accepts it, data transfer follows, and finally the two disconnect.]
13
Reliable
• Byte stream is broken up into chunks which are called segments
• Receiver sends acknowledgements (ACKs) for segments
• TCP maintains a timer. If an ACK is not received in time, the segment is retransmitted
•Detecting errors and packet losses:
• TCP has checksums for header and data. Segments with invalid checksums are discarded
• Each byte that is transmitted has a sequence number
14
Byte Stream Service
• To the lower layers, TCP handles data in blocks, the segments.
• To the higher layers, TCP handles data as a sequence of bytes and does not preserve boundaries between the chunks an application writes
• So: higher layers do not know where segments begin and end!
[Figure: the sending application (1) writes 100 bytes and (2) writes 20 bytes into TCP's queue of bytes to be transmitted; TCP carries them as segments; the receiving application reads 40 bytes three times from TCP's queue of bytes that have been received.]
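The write/read pattern in the figure (writes of 100 and 20 bytes, three reads of 40 bytes) can be reproduced with a simple byte queue. This is only a sketch of the service model; the `ByteStream` class is hypothetical and ignores segmentation, loss and flow control.

```python
class ByteStream:
    """Toy byte-stream queue: write boundaries are not preserved."""

    def __init__(self):
        self._buffer = bytearray()

    def write(self, data: bytes) -> None:
        self._buffer += data                 # appended bytes lose their write boundary

    def read(self, n: int) -> bytes:
        chunk = bytes(self._buffer[:n])      # return up to n bytes, in order
        del self._buffer[:n]
        return chunk

stream = ByteStream()
stream.write(b"a" * 100)                     # 1. write 100 bytes
stream.write(b"b" * 20)                      # 2. write 20 bytes
reads = [stream.read(40) for _ in range(3)]  # three reads of 40 bytes each
```

The third read returns the last 20 bytes of the first write followed by all 20 bytes of the second write, so the reader cannot tell where one write ended and the next began.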
15
TCP Format
IP header (20 bytes) | TCP header (20 bytes) | TCP data

TCP header layout (32 bits per row):
  bits 0-15                                  bits 16-31
  Source Port Number                         Destination Port Number
  Sequence number (32 bits)
  Acknowledgement number (32 bits)
  header length | 0 (reserved) | Flags       window size
  TCP checksum                               urgent pointer
  Options (if any)
  DATA

• TCP segments have a 20 byte header (longer if options are present) with >= 0 bytes of data.
16
TCP header fields
• Port Number:
  – A port number identifies the endpoint of a connection.
  – A pair <IP address, port number> identifies one endpoint of a connection.
  – Two pairs <client IP address, client port number> and <server IP address, server port number> identify a TCP connection.
[Figure: two hosts, each with applications on top of TCP on top of IP; the applications are attached to ports 23, 104, 80 on one host and 7, 16, 80 on the other.]
17
TCP header fields
• Sequence Number (SeqNo):
  – Sequence number is 32 bits long.
  – So the range of SeqNo is 0 <= SeqNo <= 2^32 - 1 (about 4.3 Gbyte)
  – The sequence number in a segment identifies the first byte in the segment
  – Initial Sequence Number (ISN) of a connection is set during connection establishment
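Because the sequence number is a 32-bit field, arithmetic on it wraps around after 2^32 - 1. The helpers below are a minimal sketch of such modular arithmetic; the function names are hypothetical, and the "less than" test assumes the two numbers are less than 2^31 apart.

```python
SEQ_MOD = 2 ** 32   # sequence numbers live in 0 .. 2^32 - 1

def seq_add(seq: int, nbytes: int) -> int:
    """Advance a sequence number by nbytes, wrapping around at 2^32."""
    return (seq + nbytes) % SEQ_MOD

def seq_lt(a: int, b: int) -> bool:
    """True if a comes before b, assuming they are less than 2^31 apart."""
    return a != b and (b - a) % SEQ_MOD < 2 ** 31
```

For instance, `seq_add(2**32 - 1, 1)` wraps to 0, and `seq_lt(2**32 - 10, 5)` is true because 5 lies 15 bytes past the wrap point.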
18
TCP header fields
• Acknowledgement Number (AckNo):
  – Acknowledgements are piggybacked, i.e., a segment from A -> B can contain an acknowledgement for data sent in the B -> A direction
  – A host uses the AckNo field to send acknowledgements. (If a host sends an AckNo in a segment, it sets the “ACK flag”)
  – The AckNo contains the next SeqNo that a host is expecting. Example: the acknowledgement for a segment with sequence numbers 0-1460 is AckNo=1461
  – ACKs are cumulative: an AckNo acknowledges every byte before it
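A cumulative AckNo is the first byte that has not yet arrived in order. The sketch below computes it from a set of received byte ranges; the function name and the inclusive `(first, last)` range format are assumptions for illustration, and sequence-number wraparound is ignored.

```python
def cumulative_ack(next_expected: int, received: list) -> int:
    """Return the AckNo after receiving inclusive (first_byte, last_byte) ranges.

    The AckNo only advances over data that arrived without a gap.
    """
    for first, last in sorted(received):
        if first > next_expected:
            break                            # gap: later data cannot be acknowledged yet
        next_expected = max(next_expected, last + 1)
    return next_expected
```

With the slide's example, `cumulative_ack(0, [(0, 1460)])` gives 1461. If a later segment arrives while an earlier one is still missing, the AckNo stays at the start of the gap.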
19
TCP header fields
• Header Length (4 bits):
  – Length of header in 32-bit words
  – Note that the TCP header has variable length (with a minimum of 20 bytes)
20
TCP header fields
• Flag bits:
  – URG: Urgent pointer is valid
    » If the bit is set, the following bytes contain an urgent message in the range: SeqNo <= urgent message <= SeqNo + urgent pointer
  – ACK: Acknowledgement Number is valid
  – PSH: PUSH flag
    » Notification from the sender to the receiver that the receiver should pass all data that it has to the application.
    » Normally set by the sender when the sender’s buffer is empty
21
TCP header fields
• Flag bits:
  – RST: Reset the connection
    » The flag causes the receiver to reset the connection
    » The receiver of a RST terminates the connection and notifies the higher layer application about the reset
  – SYN: Synchronize sequence numbers
    » Sent in the first packet when initiating a connection
  – FIN: Sender is finished with sending
    » Used for closing a connection
    » Both sides of a connection must send a FIN
22
TCP header fields
• Window Size:
  – Each side of the connection advertises its window size
  – Window size is the maximum number of bytes that a receiver can accept.
  – Maximum window size is 2^16 - 1 = 65,535 bytes
• TCP Checksum:
  – The TCP checksum covers both the TCP header and the TCP data (it also covers some parts of the IP header)
• Urgent Pointer:
  – Only valid if URG flag is set
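The fixed 20-byte header can be decoded with a single `struct.unpack` call. This is a hedged sketch: `parse_tcp_header` and the dictionary keys are hypothetical names, options are not decoded, and the checksum is not verified. The flag bit positions (FIN = bit 0 up to URG = bit 5) follow the standard header layout.

```python
import struct

FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]   # bit 0 .. bit 5

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header."""
    (src, dst, seq, ack, offset_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4    # header length field counts 32-bit words
    flags = [name for bit, name in enumerate(FLAG_NAMES)
             if offset_flags & (1 << bit)]
    return {"src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
            "header_len": header_len, "flags": flags, "window": window,
            "checksum": checksum, "urgent": urgent}

# A SYN segment without options: header length 5 words, only the SYN bit set.
syn = struct.pack("!HHIIHHHH", 1121, 23, 1031880193, 0, (5 << 12) | 0x02, 16384, 0, 0)
info = parse_tcp_header(syn)
```

Here `info["header_len"]` is 20 and `info["flags"]` is `["SYN"]`.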
23
TCP header fields
• Options:
Option                   kind (1 byte)   len (1 byte)   value
End of Options           kind=0          -              -
NOP (no operation)       kind=1          -              -
Maximum Segment Size     kind=2          len=4          maximum segment size (2 bytes)
Window Scale Factor      kind=3          len=3          shift count (1 byte)
Timestamp                kind=8          len=10         timestamp value (4 bytes), timestamp echo reply (4 bytes)
24
TCP header fields
• Options:
  – NOP is used to pad the TCP header to a multiple of 4 bytes
  – Maximum Segment Size
  – Window Scale Option
    » Extends the TCP window beyond 16 bits: the 16-bit window field is shifted left by the shift count, i.e., the window size is interpreted differently
    » This option can only be used in the SYN segment (first segment) during connection establishment time
  – Timestamp Option
    » Can be used for roundtrip measurements
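The kind/length/value encoding of the options can be walked with a short loop. The parser below is a minimal sketch that only knows the five option kinds listed on the slides; the function name and the tuple output format are hypothetical.

```python
def parse_tcp_options(options: bytes) -> list:
    """Walk the option bytes that follow the fixed 20-byte TCP header."""
    parsed, i = [], 0
    while i < len(options):
        kind = options[i]
        if kind == 0:                        # End of Options: stop parsing
            parsed.append(("eol",))
            break
        if kind == 1:                        # NOP: a single padding byte
            parsed.append(("nop",))
            i += 1
            continue
        if i + 1 >= len(options) or options[i + 1] < 2:
            raise ValueError("malformed option")
        length = options[i + 1]              # length covers kind + len + value
        value = options[i + 2:i + length]
        if kind == 2 and length == 4:
            parsed.append(("mss", int.from_bytes(value, "big")))
        elif kind == 3 and length == 3:
            parsed.append(("wscale", value[0]))
        elif kind == 8 and length == 10:
            parsed.append(("timestamp", int.from_bytes(value[:4], "big"),
                           int.from_bytes(value[4:], "big")))
        else:
            parsed.append(("unknown", kind, value))
        i += length
    return parsed
```

Feeding it the bytes `02 04 05 b4 01 03 03 00` yields an MSS of 1460, a NOP, and a window scale shift count of 0.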
25
Connection Management in TCP
• Opening a TCP Connection• Closing a TCP Connection• State Diagram
26
TCP Connection Establishment
• TCP uses a three-way handshake to open a connection:
aida.poly.edu -> mng.poly.edu: SYN (SeqNo = x)
mng.poly.edu -> aida.poly.edu: SYN (SeqNo = y, AckNo = x + 1)
aida.poly.edu -> mng.poly.edu: (SeqNo = x + 1, AckNo = y + 1)
27
A Closer Look with tcpdump
1 aida.poly.edu.1121 > mng.poly.edu.telnet: S 1031880193:1031880193(0) win 16384 <mss 1460,nop,wscale 0,nop,nop,timestamp>
2 mng.poly.edu.telnet > aida.poly.edu.1121: S 172488586:172488586(0) ack 1031880194 win 8760 <mss 1460>
3 aida.poly.edu.1121 > mng.poly.edu.telnet: . ack 172488587 win 17520
4 aida.poly.edu.1121 > mng.poly.edu.telnet: P 1031880194:1031880218(24) ack 172488587 win 17520
5 mng.poly.edu.telnet > aida.poly.edu.1121: P 172488587:172488590(3) ack 1031880218 win 8736
6 aida.poly.edu.1121 > mng.poly.edu.telnet: P 1031880218:1031880221(3) ack 172488590 win 17520
[Figure: aida.poly.edu and mng.poly.edu; aida issues a "telnet mng".]
28
Three-Way Handshake
aida.poly.edu -> mng.poly.edu: S 1031880193:1031880193(0) win 16384 <mss 1460, ...>
mng.poly.edu -> aida.poly.edu: S 172488586:172488586(0) ack 1031880194 win 8760 <mss 1460>
aida.poly.edu -> mng.poly.edu: ack 172488587 win 17520
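The numbers in the trace follow directly from the two initial sequence numbers: each SYN consumes one sequence number, so each side acknowledges the peer's ISN plus one. A minimal sketch, with a hypothetical function name and dictionary layout:

```python
def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the SeqNo/AckNo values of the three handshake segments."""
    mod = 2 ** 32                                        # sequence numbers wrap at 2^32
    syn = {"flags": "SYN", "seq": client_isn}
    syn_ack = {"flags": "SYN+ACK", "seq": server_isn,
               "ack": (client_isn + 1) % mod}            # SYN consumes one sequence number
    ack = {"flags": "ACK", "seq": (client_isn + 1) % mod,
           "ack": (server_isn + 1) % mod}
    return [syn, syn_ack, ack]

# Initial sequence numbers taken from the tcpdump trace.
segments = three_way_handshake(1031880193, 172488586)
```

The second segment carries `ack` 1031880194 and the third carries `ack` 172488587, matching the tcpdump output.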
29
TCP Connection Termination
• Each end of the data flow must be shut down independently (“half-close”)
• If one end is done, it sends a FIN segment. The other end sends an ACK.
• Four messages are needed to completely shut down a connection:
  A -> B: FIN
  B -> A: ACK
  (B can still send to A)
  B -> A: FIN
  A -> B: ACK
30
Connection termination with tcpdump
1 mng.poly.edu.telnet > aida.poly.edu.1121: F 172488734:172488734(0) ack 1031880221 win 8733
2 aida.poly.edu.1121 > mng.poly.edu.telnet: . ack 172488735 win 17484
3 aida.poly.edu.1121 > mng.poly.edu.telnet: F 1031880221:1031880221(0) ack 172488735 win 17520
4 mng.poly.edu.telnet > aida.poly.edu.1121: . ack 1031880222 win 8733
[Figure: aida.poly.edu and mng.poly.edu; aida issues a "telnet mng".]
31
TCP Connection Termination
mng.poly.edu -> aida.poly.edu: F 172488734:172488734(0) ack 1031880221 win 8733
aida.poly.edu -> mng.poly.edu: . ack 172488735 win 17484
aida.poly.edu -> mng.poly.edu: F 1031880221:1031880221(0) ack 172488735 win 17520
mng.poly.edu -> aida.poly.edu: . ack 1031880222 win 8733
32
TCP state diagram
33
TCP States in “Normal” Connection Lifetime
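Client starts in CLOSED, server starts in LISTEN (passive open).
1. Client -> Server: SYN (SeqNo = x). Client enters SYN_SENT (active open); server enters SYN_RCVD.
2. Server -> Client: SYN (SeqNo = y, AckNo = x + 1). Client enters ESTABLISHED.
3. Client -> Server: (AckNo = y + 1). Server enters ESTABLISHED.
4. Client -> Server: FIN (SeqNo = m). Client enters FIN_WAIT_1 (active close); server enters CLOSE_WAIT (passive close).
5. Server -> Client: (AckNo = m + 1). Client enters FIN_WAIT_2.
6. Server -> Client: FIN (SeqNo = n). Server enters LAST_ACK; client enters TIME_WAIT.
7. Client -> Server: (AckNo = n + 1). Server enters CLOSED; client enters CLOSED after TIME_WAIT.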
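The normal connection lifetime can be written down as a small transition table. This sketch covers only the states named on this slide (no simultaneous open or close, no resets), and the event names are hypothetical labels chosen for illustration.

```python
# (current state, event) -> next state, for the "normal" lifetime only.
TRANSITIONS = {
    # side that opens and closes actively (the client in the slides)
    ("CLOSED", "active_open"): "SYN_SENT",        # sends SYN
    ("SYN_SENT", "recv_syn_ack"): "ESTABLISHED",  # sends ACK
    ("ESTABLISHED", "close"): "FIN_WAIT_1",       # sends FIN
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",      # sends the final ACK
    ("TIME_WAIT", "timeout_2msl"): "CLOSED",
    # side that opens and closes passively (the server in the slides)
    ("CLOSED", "passive_open"): "LISTEN",
    ("LISTEN", "recv_syn"): "SYN_RCVD",           # sends SYN + ACK
    ("SYN_RCVD", "recv_ack"): "ESTABLISHED",
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",    # sends ACK
    ("CLOSE_WAIT", "close"): "LAST_ACK",          # sends FIN
    ("LAST_ACK", "recv_ack"): "CLOSED",
}

def run(state: str, events: list) -> list:
    """Follow the transition table and return every state visited."""
    trace = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]      # KeyError means an unexpected event
        trace.append(state)
    return trace
```

Running the client events in order visits SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2 and TIME_WAIT before returning to CLOSED.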
34
2MSL Wait State
2MSL Wait State = TIME_WAIT
• When TCP does an active close and sends the final ACK, the connection must stay in the TIME_WAIT state for twice the maximum segment lifetime.
  2MSL = 2 * Maximum Segment Lifetime
• Why?
  – TCP is given a chance to resend the final ACK. (If the final ACK is lost, the server will time out after sending the FIN segment and resend the FIN)
• The MSL is set to 2 minutes, 1 minute, or 30 seconds, depending on the implementation.
[Figure: A sends FIN, B answers with ACK and then sends its own FIN; A's final ACK is lost (X).]
35
Resetting Connections
• Resetting connections is done by setting the RST flag
• When is the RST flag set?
  – A connection request arrives and no server process is waiting on the destination port
  – Abort (terminate) a connection. This causes the receiver to throw away buffered data. The receiver does not acknowledge the RST segment
36
TCP: Delayed ACKs and Nagle’s algorithm
37
Interactive and bulk data transfer
TCP applications can be put into the following categories:
• bulk data transfer - ftp, mail, http
• interactive data transfer - telnet, rlogin
TCP has heuristics to deal with these application types.
For interactive data transfer:
• Try to reduce the number of packets
For bulk data transfer:
• High throughput
38
Telnet session on a local network
[Figure: Telnet session from Argon.cs.virginia.edu to Neon.cs.virginia.edu]
• This is the output of typing 3 (three) characters:
Time 44.062449: Argon -> Neon: Push, SeqNo 0:1(1), AckNo 1
Time 44.063317: Neon -> Argon: Push, SeqNo 1:2(1), AckNo 1
Time 44.182705: Argon -> Neon: No Data, AckNo 2

Time 48.946471: Argon -> Neon: Push, SeqNo 1:2(1), AckNo 2
Time 48.947326: Neon -> Argon: Push, SeqNo 2:3(1), AckNo 2
Time 48.982786: Argon -> Neon: No Data, AckNo 3

Time 55.116581: Argon -> Neon: Push, SeqNo 2:3(1), AckNo 3
Time 55.117497: Neon -> Argon: Push, SeqNo 3:4(1), AckNo 3
Time 55.183694: Argon -> Neon: No Data, AckNo 4
39
Interactive applications: Telnet
• Remote terminal applications (e.g., Telnet) send characters to a server. The server interprets the character and sends the resulting output back to the client.
• For each character typed, you see three packets:
  1. Client -> Server: Send typed character
  2. Server -> Client: Echo of character (or user output) and acknowledgement for first packet
  3. Client -> Server: Acknowledgement for second packet
[Figure: the host with the Telnet client (1) sends a character; the host with the Telnet server (2) interprets the character and (3) sends an echo of the character and/or output.]
40
Why 3 packets per character?
• We would expect four packets per character:
  1. character
  2. ACK of character
  3. echo of character
  4. ACK of echoed character
• However, tcpdump shows this pattern:
  1. character
  2. ACK and echo of character
  3. ACK of echoed character
• What has happened? TCP has delayed the transmission of an ACK
41
Delayed Acknowledgement
• TCP delays transmission of ACKs for up to 200ms
• The hope is to have data ready in that time frame. Then, the ACK can be piggybacked with a data segment.
• Delayed ACKs explain why the ACK and the “echo of character” are sent in the same segment.
42
Telnet session to a distant host
[Figure: Telnet session between argon.cs.virginia.edu and tenet.cs.berkeley.edu]
• This is the output of typing nine characters:
Time 16.401963: Argon -> Tenet: Push, SeqNo 1:2(1), AckNo 2
Time 16.481929: Tenet -> Argon: Push, SeqNo 2:3(1), AckNo 2
Time 16.482154: Argon -> Tenet: Push, SeqNo 2:3(1), AckNo 3
Time 16.559447: Tenet -> Argon: Push, SeqNo 3:4(1), AckNo 3
Time 16.559684: Argon -> Tenet: Push, SeqNo 3:4(1), AckNo 4
Time 16.640508: Tenet -> Argon: Push, SeqNo 4:5(1), AckNo 4
Time 16.640761: Argon -> Tenet: Push, SeqNo 4:8(4), AckNo 5
Time 16.728402: Tenet -> Argon: Push, SeqNo 5:9(4), AckNo 8
43
Delayed Acks do not kick in if there are data to send
• Observation: Transmission of segments follows a different pattern, i.e., there are only two packets per character typed
• The delayed acknowledgement does not kick in
• The reason is that there is always data at Argon ready to send when the ACK arrives.
  Argon -> Tenet: char1
  Tenet -> Argon: ACK of char1 + echo of char1
  Argon -> Tenet: ACK + char2
  Tenet -> Argon: ACK + echo of char2
44
Nagle’s Algorithm
• Observation:
  – Argon never has multiple unacknowledged segments outstanding
  – There are fewer transmissions than there are characters.
• Sending one byte per packet is inefficient.
• Solution: Nagle’s Algorithm
  – Small segments cannot be sent until outstanding data is acked.
• The algorithm can be disabled, because it can be a problem for interactive applications such as the X Window System.
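Nagle's rule boils down to one decision per send attempt. The function below is a simplified sketch: its name and parameters are hypothetical, and it ignores the receiver's advertised window and the congestion window.

```python
def nagle_can_send(pending_bytes: int, mss: int, unacked_bytes: int,
                   nagle_enabled: bool = True) -> bool:
    """Decide whether the sender may transmit a segment right now."""
    if pending_bytes == 0:
        return False                 # nothing to send
    if not nagle_enabled:
        return True                  # algorithm disabled: send immediately
    if pending_bytes >= mss:
        return True                  # a full-sized segment may always be sent
    return unacked_bytes == 0        # a small segment must wait for outstanding ACKs
```

While one typed character is still unacknowledged, further characters accumulate in the send buffer and leave together once the ACK arrives, which is how a single segment in the trace can carry four characters. On real sockets the algorithm is usually disabled per connection with the `TCP_NODELAY` socket option.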
45
TCP: Flow Control and Congestion Control
46
What is Flow/Congestion Control ?
• Flow Control: Algorithms that prevent the sender from overrunning the receiver's buffer
• Congestion Control: Algorithms that prevent the sender from overloading the network
The sliding window implements both control mechanisms.
47
TCP Flow Control
48
TCP Flow Control
• TCP implements sliding window flow control
• Sending acknowledgements is separated from setting the window size at sender.
•Acknowledgements do not automatically increase the window size
• Acknowledgements are cumulative
49
Sliding Window Flow Control
• Sliding Window Protocol is performed at the byte level:
[Figure: sequence numbers 1 to 11. Left of the advertised window: sent and acknowledged. Inside the advertised window: sent but not acknowledged, followed by the usable window (can be sent). Right of the advertised window: can't send.]
• Here: Sender can transmit sequence numbers 6, 7, 8.
50
Sliding Window: “Window Opens”
• An acknowledgement is received that enlarges the window to the right (AckNo = 5, Win = 6):
[Figure: sequence numbers 1 to 11, shown before and after the segment with AckNo = 5, Win = 6 is received.]
• A receiver opens a window when the TCP buffer empties (meaning that data is delivered to the application).
51
Window Management in TCP
• The receiver returns two parameters to the sender: AckNo (32 bits) and window size (win, 16 bits)
• The interpretation is:
  – I am ready to receive new data with SeqNo = AckNo, AckNo+1, ..., AckNo+Win-1
• Receiver can acknowledge data without opening the window
• Receiver can change the window size without acknowledging data
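The interpretation of (AckNo, Win) translates into two small calculations: the range of sequence numbers the receiver accepts, and how much of that range the sender has not used yet. A minimal sketch with hypothetical function names; `next_seq` (the next sequence number the sender would transmit) is an assumed extra input, and wraparound is ignored.

```python
def receive_range(ack_no: int, win: int) -> range:
    """Sequence numbers the receiver accepts: AckNo .. AckNo + Win - 1."""
    return range(ack_no, ack_no + win)

def usable_window(ack_no: int, win: int, next_seq: int) -> int:
    """Bytes the sender may still transmit before it must wait for a new ACK."""
    return max(0, ack_no + win - next_seq)
```

With AckNo = 5 and Win = 6 the receiver accepts sequence numbers 5 through 10. As an illustrative case, a sender with AckNo = 4, Win = 5 that has already sent bytes 4 and 5 may still send 3 bytes (6, 7, 8).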
52
Sliding Window: Example
[Figure: sliding window example with a 4K receiver buffer (0 to 4K). The sender sends 2K of data, then another 2K of data; the buffer is then full and the sender is blocked.]
53
TCP Congestion Control
54
TCP Congestion Control
• Keep a sender from congesting the network.
• The sender has two internal parameters:
  – Congestion Window (cwnd)
  – Slow-start threshold value (ssthresh)
• Sliding window size is set to the minimum of (cwnd, receiver advertised win)
• Congestion control works in two modes:
  – slow start (cwnd < ssthresh)
    » Probe the available bandwidth
  – congestion avoidance (cwnd >= ssthresh)
    » Try not to overload the network.
55
Slow Start
• Initial value: Set cwnd = 1
  – Note: The unit is a segment size. TCP is actually based on bytes and increments by 1 MSS (maximum segment size)
  – Modern TCP implementations may set the initial cwnd to 2
• Each time an ACK is received by the sender, the congestion window is increased by 1 segment:
  cwnd = cwnd + 1
  – If an ACK acknowledges two segments, cwnd is still increased by only 1 segment.
  – Even if an ACK acknowledges a segment that is smaller than MSS bytes long, cwnd is increased by 1.
• Question: how can you accelerate your TCP download?
56
Slow Start Example
• The congestion window size grows very rapidly
  – For every ACK, we increase cwnd by 1 irrespective of the number of segments ACK’ed
• TCP slows down the increase of cwnd when cwnd >= ssthresh
[Figure: exchange of segments and ACKs with successive values cwnd = 1, cwnd = 2, cwnd = 4, cwnd = 7.]
57
Congestion Avoidance
• The congestion avoidance phase is started when cwnd has reached the slow-start threshold value
• If cwnd >= ssthresh then each time an ACK is received, increment cwnd as follows:
  cwnd = cwnd + 1 / cwnd
• So cwnd is increased by one only if all cwnd segments have been acknowledged.
58
Example of Slow Start/Congestion Avoidance
Assume that ssthresh = 8
[Figure: plot of cwnd (in segments, axis from 0 to 14) against roundtrip times, with ssthresh marked. Successive values: cwnd = 1, cwnd = 2, cwnd = 4, cwnd = 8, cwnd = 9, cwnd = 10.]
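The sequence 1, 2, 4, 8, 9, 10 can be reproduced with a per-round-trip approximation: cwnd doubles each round trip during slow start and grows by one segment per round trip during congestion avoidance. This sketch assumes one ACK per segment and no losses; the function name is hypothetical.

```python
def cwnd_per_rtt(ssthresh: int, rounds: int, cwnd: int = 1) -> list:
    """Approximate cwnd (in segments) at the start of each round trip."""
    history = [cwnd]
    for _ in range(rounds):
        if cwnd < ssthresh:
            # slow start: +1 per ACK, i.e. roughly doubling per round trip
            cwnd = min(2 * cwnd, ssthresh)
        else:
            # congestion avoidance: +1/cwnd per ACK, i.e. about +1 per round trip
            cwnd += 1
        history.append(cwnd)
    return history
```

`cwnd_per_rtt(8, 5)` returns `[1, 2, 4, 8, 9, 10]`, the values in the example.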
59
Responses to Congestion
• TCP uses packet loss as a congestion signal
• A TCP sender can detect lost packets via:
  – Receipt of a duplicate ACK
  – Timeout of a retransmission timer
60
Response to Timeout
• TCP interprets a timeout as a severe congestion signal. When a timeout occurs, the sender performs:
  – ssthresh is set to half of the current size of the congestion window:
    ssthresh = cwnd / 2
  – cwnd is reset to one:
    cwnd = 1
  – and slow start is entered
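The order of the two assignments matters: ssthresh has to be computed from the old cwnd before cwnd is reset. A minimal sketch in units of segments; the function name is hypothetical, and the lower bound of 2 segments on ssthresh is taken from RFC 2581 rather than from the slide.

```python
def on_timeout(cwnd: int) -> tuple:
    """Return (new_cwnd, new_ssthresh) after a retransmission timeout."""
    ssthresh = max(cwnd // 2, 2)     # half the old window, at least 2 segments (RFC 2581)
    cwnd = 1                         # restart from one segment: slow start is entered
    return cwnd, ssthresh
```

A sender with cwnd = 16 at the time of the timeout restarts with cwnd = 1 and ssthresh = 8.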
61
Reaction to Duplicate ACKs
• Fast retransmit
  – Three duplicate ACKs indicate a packet loss
  – Retransmit without waiting for a timeout
• Fast recovery
  – Avoid slow start
  – Retransmit the “lost packet”
  – ssthresh = cwnd/2
  – cwnd = ssthresh+3
  – Increment cwnd by one for each additional duplicate ACK
• When an ACK arrives that acknowledges “new data”, set:
  cwnd = ssthresh
  and enter congestion avoidance
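Fast retransmit and fast recovery can be sketched as a small event-driven class. This is a simplified model in units of segments, following the RFC 2581 rule that cwnd is set to ssthresh plus three on the third duplicate ACK; the class and method names are hypothetical, and window growth outside recovery is not modeled.

```python
class RenoSender:
    """Toy model of duplicate-ACK handling (no timers, no real segments)."""

    def __init__(self, cwnd: int, ssthresh: int):
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.dup_acks = 0
        self.in_recovery = False

    def on_duplicate_ack(self) -> bool:
        """Process one duplicate ACK; return True if the lost segment is retransmitted."""
        self.dup_acks += 1
        if self.in_recovery:
            self.cwnd += 1                        # each further duplicate inflates cwnd
            return False
        if self.dup_acks == 3:
            self.ssthresh = max(self.cwnd // 2, 2)
            self.cwnd = self.ssthresh + 3         # three segments have left the network
            self.in_recovery = True
            return True                           # fast retransmit, no timeout needed
        return False

    def on_new_ack(self) -> None:
        """An ACK for new data ends fast recovery."""
        if self.in_recovery:
            self.cwnd = self.ssthresh             # deflate and continue in congestion avoidance
            self.in_recovery = False
        self.dup_acks = 0
```

Starting from cwnd = 10, the third duplicate ACK triggers the retransmission with ssthresh = 5 and cwnd = 8; a later ACK for new data sets cwnd back to 5.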
62
Duplicate ACK example
[Figure: segment exchange in which a lost segment causes three duplicate ACKs (1. duplicate, 2. duplicate, 3. duplicate).]
63
Flavors of TCP Congestion Control
• TCP Tahoe (1988, 4.3BSD Tahoe)
  – Slow Start
  – Congestion Avoidance
  – Fast Retransmit
• TCP Reno (1990, 4.3BSD Reno)
  – Fast Recovery
  – Modern TCP implementations
• New Reno (1996)
• SACK (1996)
64
TCP Tahoe
65
TCP Reno (Jacobson 1990)
Fast retransmission/fast recovery
66
TCP III – Retransmission and Timeout
67
Retransmissions in TCP
• A TCP sender retransmits a segment when it assumes that the segment has been lost:
  1. No ACK has been received and a timeout occurs
  2. Multiple duplicate ACKs have been received for the same segment
68
Retransmission Timer
• TCP sender maintains one retransmission timer for each connection
• When the timer reaches the retransmission timeout (RTO) value, the sender retransmits the first segment that has not been acknowledged
• The timer is started when
  1. A packet with payload is transmitted and the timer is not running
  2. An ACK arrives that acknowledges new data
  3. A segment is retransmitted
• The timer is stopped when
  – All segments are acknowledged
69
How to set the timer
• Retransmission Timer:
  – The setting of the retransmission timer is crucial for good performance of TCP
  – Timeout value too small: results in unnecessary retransmissions
  – Timeout value too large: long waiting time before a retransmission can be issued
  – A problem is that the delays in the network are not fixed
  – Therefore, the retransmission timers must be adaptive
70
Setting the value of RTO:
• The RTO value is set based on round-trip time (RTT) measurements that each TCP performs
• Each TCP connection measures the time difference between the transmission of a segment and the receipt of the corresponding ACK
• There is only one measurement ongoing at any time (i.e., measurements do not overlap)
• The figure shows three RTT measurements (RTT #1, RTT #2, RTT #3)
71
Setting the RTO value
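• RTO is calculated based on the RTT measurements
  – Uses an exponential moving average to estimate the RTT (srtt) and the variance of the RTT (rttvar) from the measurements
  – The influence of past samples decreases exponentially
• The RTT measurements are smoothed by the following estimators srtt and rttvar:
  $\mathit{srtt}_{n+1} = \alpha \, \mathit{RTT} + (1 - \alpha) \, \mathit{srtt}_n$
  $\mathit{rttvar}_{n+1} = \beta \, |\mathit{RTT} - \mathit{srtt}_n| + (1 - \beta) \, \mathit{rttvar}_n$
  $\mathit{RTO}_{n+1} = \mathit{srtt}_{n+1} + 4 \, \mathit{rttvar}_{n+1}$
  – The gains are set to $\alpha = 1/8$ for srtt and $\beta = 1/4$ for rttvar, the values given in RFC 2988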
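One update step of the estimator can be written as a pure function. This is a sketch under the assumption that the gains are those of RFC 2988 (1/8 for srtt, 1/4 for rttvar); the function and constant names are hypothetical, and the RFC's minimum RTO and clock-granularity term are left out.

```python
SRTT_GAIN = 1 / 8     # weight of the new sample in srtt (RFC 2988)
RTTVAR_GAIN = 1 / 4   # weight of the new deviation in rttvar (RFC 2988)

def update_rto(srtt: float, rttvar: float, rtt: float) -> tuple:
    """Return (srtt, rttvar, rto) after one new RTT measurement, in seconds."""
    # rttvar is updated first, using the srtt from before this measurement
    new_rttvar = RTTVAR_GAIN * abs(rtt - srtt) + (1 - RTTVAR_GAIN) * rttvar
    new_srtt = SRTT_GAIN * rtt + (1 - SRTT_GAIN) * srtt
    return new_srtt, new_rttvar, new_srtt + 4 * new_rttvar
```

If the sample equals the current estimate, for example `update_rto(1.0, 0.5, 1.0)`, srtt stays at 1.0 while rttvar decays to 0.375, giving an RTO of 2.5 seconds.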
72
Setting the RTO value (cont’d)
• Initial value for RTO:
  – Sender should set the initial value of RTO to
    $\mathit{RTO}_0 = 3$ seconds
• RTO calculation after the first RTT measurement has arrived:
  $\mathit{srtt}_1 = \mathit{RTT}$
  $\mathit{rttvar}_1 = \mathit{RTT} / 2$
  $\mathit{RTO}_1 = \mathit{srtt}_1 + 4 \, \mathit{rttvar}_1$
• When a timeout occurs, the RTO value is doubled, up to a maximum of 64 seconds:
  $\mathit{RTO}_{n+1} = \min(2 \, \mathit{RTO}_n, 64)$ seconds
  This is called an exponential backoff
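The initialization and the backoff rule fit in a few lines. A minimal sketch with hypothetical names; the cap uses `min` so that repeated doubling stops at 64 seconds.

```python
INITIAL_RTO = 3.0     # seconds, used before any RTT measurement exists
MAX_RTO = 64.0        # seconds, upper bound for the backoff

def first_measurement(rtt: float) -> tuple:
    """Return (srtt, rttvar, rto) after the very first RTT measurement."""
    srtt, rttvar = rtt, rtt / 2
    return srtt, rttvar, srtt + 4 * rttvar

def backoff(rto: float) -> float:
    """Double the RTO after a timeout, capped at MAX_RTO."""
    return min(2 * rto, MAX_RTO)
```

Starting from the initial 3 seconds, successive timeouts give 6, 12, 24, 48 and then 64 seconds, where the value stays.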
73
Karn’s Algorithm
[Figure: a segment times out and is retransmitted; when the ACK arrives, it is unclear from which transmission the RTT should be measured (RTT ?).]
Karn’s Algorithm:
• Don’t update RTT on any segments that have been retransmitted
If an ACK for a retransmitted segment is received, the sender cannot tell if the ACK belongs to the original or the retransmission.
The RTT measurement is ambiguous in this case
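In code, Karn's rule is a guard in front of the RTT estimator: a sample is produced only for segments that were sent exactly once. A minimal sketch; the function name and the boolean flag are hypothetical.

```python
def rtt_sample(send_time: float, ack_time: float, was_retransmitted: bool):
    """Return an RTT sample in seconds, or None if the sample must be discarded."""
    if was_retransmitted:
        return None                  # ambiguous: the ACK may belong to either transmission
    return ack_time - send_time
```

A caller would feed only non-`None` samples into the srtt/rttvar update and leave the estimates untouched otherwise.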
74
Summary
• UDP: connectionless, unreliable, datagram service
• TCP: reliable, connection-oriented, byte stream service
  – TCP header
  – Connection management
  – Delayed ACKs and Nagle’s algorithm
  – TCP flow control
  – TCP congestion control
  – TCP retransmission and timeout
• References
  – TCP/IP Illustrated, Vol. 1, chapters 11, 17-24
  – RFC 793 (Transmission Control Protocol)
  – RFC 768 (User Datagram Protocol)
  – RFC 2581 (TCP Congestion Control)
  – RFC 2988 (Computing TCP’s Retransmission Timer)
  – RFC 3390 (Increasing TCP’s Initial Window)