fabiana-fullam
Transport Layer - TCP
1
BB
TCP Flow Control, Congestion Control, Connection Management, etc.
Part 2
Encapsulation in TCP/IP
[Figure: a TCP segment encapsulated in an IP datagram]
TCP Overview
- point-to-point: one sender, one receiver
- reliable, in-order byte stream: no message boundaries
- pipelined: TCP congestion and flow control set the window size
- send & receive buffers
- full duplex data: bi-directional data flow in the same connection; MSS: maximum segment size
- connection-oriented: handshaking (exchange of control messages) initializes sender and receiver state before data exchange
- flow controlled: the sender will not flood the receiver with data
- error detection, retransmission, cumulative ACKs, timers, header fields for sequence and ACK numbers
[Figure: the application writes data through the socket door into the TCP send buffer; segments travel to the TCP receive buffer, from which the other application reads data through its socket door]
Recall: Reliable Data Transfer Mechanisms
- Checksum: verification of the integrity of a packet
- Timer: signals that a retransmission is required
- Sequence number: keeps track of which packets have been sent and received
- ACK/NAK: indicates receipt of a packet in good or bad form
- Window, pipelining: allows the sending of multiple yet-to-be-acknowledged packets
[Figure: packets travel from the sender's TCP send buffer to the receiver's TCP receive buffer between the two socket doors]
Internet Checksum Example
Note: when adding numbers, a carry-out from the most significant bit needs to be added to the result.
Example: add two 16-bit integers

              1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
              1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
wraparound  1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
sum           1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum      0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

To check the data, the receiver adds all the 16-bit words, including the checksum; if no errors are detected, the result is 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1.
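The wraparound addition above can be sketched in C as follows. This is a minimal sketch, assuming the data has already been split into 16-bit words (real TCP also covers a pseudo-header, the TCP header and the payload); the function names are hypothetical. The words 0xE666 and 0xD555 are the two integers of the example written in hex.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of 16-bit words: any carry out of the most
   significant bit is wrapped around and added back into the result. */
uint16_t ones_complement_sum(const uint16_t *words, size_t n) {
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += words[i];
        sum = (sum & 0xFFFFu) + (sum >> 16); /* wraparound */
    }
    return (uint16_t)sum;
}

/* The checksum is the one's complement (bitwise NOT) of the sum. */
uint16_t internet_checksum(const uint16_t *words, size_t n) {
    return (uint16_t)~ones_complement_sum(words, n);
}
```

With the example words, ones_complement_sum returns 0xBBBC and internet_checksum returns 0x4443; adding 0x4443 to the two data words gives 0xFFFF, the all-ones result the receiver checks for.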
Connection Oriented Transport: TCP
- TCP Segment Structure
- SEQ and ACK numbers
- Calculating the Timeout Interval
- The Simplified TCP Sender
- ACK Generation Recommendation (RFC 1122, RFC 2581)
- Interesting Transmission Scenarios
- Flow Control
- TCP Connection Management
TCP segment structure (header shown 32 bits wide)

  |      source port #          |        dest port #          |
  |                     sequence number                       |
  |                 acknowledgement number                    |
  | head len | not used |U|A|P|R|S|F|    rcvr window size     |
  |        checksum             |     URG data pointer        |
  |                 Options (variable length)                 |
  |            application data (variable length)             |

- sequence number, acknowledgement number: counting by bytes of data (not segments)
- rcvr window size: # bytes the receiver is willing to accept
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection establishment (setup, teardown commands)
- checksum: Internet checksum (as in UDP)
In practice, PSH, URG and the Urgent Data Pointer are not used.
We can view these teeny-weeny header details using Ethereal.
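To make the field layout concrete, here is a sketch that decodes the fixed 20-byte part of a TCP header from a buffer in network (big-endian) byte order. The struct, the flag constants and parse_tcp_header are hypothetical names for illustration; options and data that follow the fixed header are not parsed.

```c
#include <stdint.h>

/* Hypothetical struct for the fixed 20-byte part of a TCP header. */
struct tcp_header {
    uint16_t src_port, dst_port;
    uint32_t seq, ack;
    uint8_t  header_len;   /* in bytes (the field itself counts 32-bit words) */
    uint8_t  flags;        /* low 6 bits: URG ACK PSH RST SYN FIN */
    uint16_t window, checksum, urg_ptr;
};

enum { TCP_FIN = 0x01, TCP_SYN = 0x02, TCP_RST = 0x04,
       TCP_PSH = 0x08, TCP_ACK = 0x10, TCP_URG = 0x20 };

/* Read big-endian 16- and 32-bit values. */
static uint16_t get16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }
static uint32_t get32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode the header fields from buf (at least 20 bytes). */
void parse_tcp_header(const uint8_t *buf, struct tcp_header *h) {
    h->src_port   = get16(buf);
    h->dst_port   = get16(buf + 2);
    h->seq        = get32(buf + 4);
    h->ack        = get32(buf + 8);
    h->header_len = (uint8_t)((buf[12] >> 4) * 4); /* head len field */
    h->flags      = buf[13] & 0x3F;                /* U A P R S F */
    h->window     = get16(buf + 14);
    h->checksum   = get16(buf + 16);
    h->urg_ptr    = get16(buf + 18);
}
```

For example, a SYNACK from port 8080 to port 80 with Seq=42 and ACK=79 decodes to flags equal to TCP_SYN | TCP_ACK and a header length of 20 bytes.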
Example
Suppose that a process in Host A wants to send a stream of data to a process in Host B over a TCP connection.
Assume: the data stream is a file consisting of 500,000 bytes; MSS = 1,000 bytes; the first byte of the data stream is numbered 0.
TCP constructs 500 segments out of the data stream: 500,000 bytes / 1,000 bytes = 500 segments.
TCP sequence numbers and ACKs
Sequence numbers: the byte-stream number of the first byte in the segment's data. They do not necessarily start from 0; a random initial number R is used:
- Segment 1: 0 + R
- Segment 2: 1000 + R, etc.
ACKs (acknowledgments): the sequence number of the next byte expected from the other side (last byte received + 1). ACKs are cumulative: if the receiver has received segment 1, it waits for segment 2 and sends ACK = 1000 + R (received up to byte 999).
[Figure: bytes 0, 1, 2, ..., 999 form Segment 1; bytes 1000, 1001, ..., 1999 form Segment 2]
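The segment arithmetic of the example can be written out directly. This is a minimal sketch with hypothetical helper names: isn plays the role of the random initial number R, and unsigned 32-bit arithmetic wraps modulo 2^32, as TCP sequence numbers do.

```c
#include <stdint.h>

/* Number of segments for a stream of file_bytes bytes, rounded up
   because the last segment may be shorter than the MSS. */
uint32_t segment_count(uint32_t file_bytes, uint32_t mss) {
    return (file_bytes + mss - 1) / mss;
}

/* Sequence number of segment i (i = 0 for the first segment): the
   stream number of its first data byte, offset by the initial number. */
uint32_t segment_seq(uint32_t isn, uint32_t mss, uint32_t i) {
    return isn + i * mss; /* wraps modulo 2^32 */
}
```

With isn = 0 and mss = 1000, segment_count(500000, 1000) is 500 and the last segment starts at byte 499000.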
TCP sequence numbers and ACKs
Q: How does the receiver handle out-of-order segments? A: The TCP specification does not say; it is decided by the implementer.
Simple telnet scenario (with echo on), assuming that the starting sequence numbers for Host A (client) and Host B (server) are 42 and 79 respectively:
- User types 'C'. Host A sends Seq=42, ACK=79, data = 'C' ("I'm sending data starting at seq num = 42").
- Host B ACKs receipt of 'C' and echoes back 'C': Seq=79, ACK=43, data = 'C' ("Send me the bytes from 43 onward"; the ACK is piggy-backed on server-to-client data).
- Host A ACKs receipt of the echoed 'C': Seq=43, ACK=80.
Yet another server echo example
- User types 'Hello'. Host A sends Seq=42, ACK=79, data = 'Hello'.
- Host B ACKs receipt of 'Hello' and echoes it back: Seq=79, ACK=47, data = 'Hello'.
- Host A ACKs receipt of the echoed 'Hello' and sends something else: Seq=47, ACK=84, data = '200'.
- Host B ACKs and echoes back '200': Seq=84, ACK=50, data = '200'.
[Figure: over time, Host A goes from seq=42, ack=79 to seq=47, ack=84; Host B goes from seq=79, ack=47 to seq=84, ack=50]
The ACK tells up to which byte data has been received and which byte the host expects to receive next.
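The ACK rule (ACK = sequence number of the next byte expected) can be sketched as a tiny receiver. This is an illustrative model, not a TCP implementation: the struct and function are hypothetical, and since the specification leaves out-of-order handling to the implementer, this sketch simply does not advance its ACK on out-of-order data (buffering is omitted).

```c
#include <stdint.h>

/* Hypothetical receiver state: the next byte expected from the peer. */
struct receiver {
    uint32_t rcv_next;
};

/* Process a segment starting at seq with len data bytes and return the
   (cumulative) ACK number to send back. */
uint32_t on_segment(struct receiver *r, uint32_t seq, uint32_t len) {
    if (seq == r->rcv_next)
        r->rcv_next += len;  /* in order: everything up to seq + len - 1 received */
    return r->rcv_next;      /* out of order: re-ACK the same next expected byte */
}
```

Replaying the 'Hello' example, Host B (expecting 42) answers ACK=47, Host A (expecting 79) answers ACK=84, and Host B answers ACK=50 after the 3-byte '200'.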
TCP Round Trip Time and Timeout
Main issue: how long is the sender willing to wait before retransmitting the packet?
Q: How to set the TCP timeout value?
- Longer than the RTT (RTT = round trip time), but note that the RTT will vary.
- Too short: premature timeout and unnecessary retransmissions.
- Too long: slow reaction to segment loss.
Q: How to estimate the RTT?
- SampleRTT: measured time from segment transmission until ACK receipt; ignore retransmissions and cumulatively ACKed segments.
- SampleRTT will vary; we want the estimated RTT to be smoother, so use several recent measurements, not just the current SampleRTT.
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - x) * EstimatedRTT + x * SampleRTT
- Exponential weighted moving average: the influence of a given sample decreases exponentially fast.
- Typical value of x: 0.125 (RFC 2988).
Setting the timeout: EstimatedRTT plus a safety margin; a large variation in EstimatedRTT calls for a larger safety margin.
Timeout = EstimatedRTT + 4 * Deviation
Deviation = (1 - y) * Deviation + y * |SampleRTT - EstimatedRTT|, with recommended value y = 0.25.
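A sketch of the estimator and timeout in C, assuming the weights 0.125 and 0.25 given above. Two details are assumptions taken from RFC 2988 rather than from the slide: the first sample initialises EstimatedRTT to SampleRTT and Deviation to half of it, and Deviation is updated with the EstimatedRTT from before the current sample. The RFC's clock-granularity term and 1-second minimum are omitted. Names are hypothetical; times are in seconds.

```c
/* Hypothetical RTT estimator state; all times in seconds. */
struct rtt_estimator {
    double estimated_rtt;  /* EstimatedRTT */
    double dev_rtt;        /* Deviation */
    int    has_sample;     /* 0 until the first SampleRTT arrives */
};

#define RTT_X 0.125  /* weight for EstimatedRTT */
#define RTT_Y 0.25   /* weight for Deviation */

static double abs_diff(double a, double b) { return a > b ? a - b : b - a; }

/* Feed one SampleRTT (never from a retransmitted segment) and return
   the new Timeout = EstimatedRTT + 4 * Deviation. */
double rtt_update(struct rtt_estimator *e, double sample_rtt) {
    if (!e->has_sample) {
        /* first measurement (RFC 2988 initialisation) */
        e->estimated_rtt = sample_rtt;
        e->dev_rtt = sample_rtt / 2.0;
        e->has_sample = 1;
    } else {
        /* Deviation uses the old EstimatedRTT, then EstimatedRTT is updated */
        e->dev_rtt = (1.0 - RTT_Y) * e->dev_rtt
                   + RTT_Y * abs_diff(sample_rtt, e->estimated_rtt);
        e->estimated_rtt = (1.0 - RTT_X) * e->estimated_rtt
                         + RTT_X * sample_rtt;
    }
    return e->estimated_rtt + 4.0 * e->dev_rtt;
}
```

Two identical samples of 0.1 s give timeouts of about 0.3 s and then 0.25 s: the safety margin shrinks as the measurements turn out to be stable.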
Sample Calculations
EstimatedRTT = 0.875 * EstimatedRTT + 0.125 * SampleRTT
- After the receipt of the ACK of segment 1: EstimatedRTT = RTT for segment 1 = 0.02746 second
- After the receipt of the ACK of segment 2: EstimatedRTT = 0.875 * 0.02746 + 0.125 * 0.035557 = 0.0285
- After the receipt of the ACK of segment 3: EstimatedRTT = 0.875 * 0.0285 + 0.125 * 0.070059 = 0.0337
- After the receipt of the ACK of segment 4: EstimatedRTT = 0.875 * 0.0337 + 0.125 * 0.11443 = 0.0438
- After the receipt of the ACK of segment 5: EstimatedRTT = 0.875 * 0.0438 + 0.125 * 0.13989 = 0.0558
- After the receipt of the ACK of segment 6: EstimatedRTT = 0.875 * 0.0558 + 0.125 * 0.18964 = 0.0725
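The sample calculations can be reproduced with a short loop. This is a minimal sketch (the function name is hypothetical) that seeds the estimate with the first sample, as the slide does, and records EstimatedRTT after each ACK; SampleRTT values are in seconds.

```c
#include <stddef.h>

/* EstimatedRTT = 0.875 * EstimatedRTT + 0.125 * SampleRTT, with the first
   SampleRTT as the initial estimate. out[i] receives the estimate after
   the ACK of segment i + 1. */
void ewma_rtt(const double *samples, size_t n, double *out) {
    double est = 0.0;
    for (size_t i = 0; i < n; i++) {
        est = (i == 0) ? samples[0] : 0.875 * est + 0.125 * samples[i];
        out[i] = est;
    }
}
```

Fed the six SampleRTT values above, the computed estimates round to 0.0285, 0.0337, 0.0438, 0.0558 and 0.0725 for segments 2 to 6, matching the hand calculation.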
RTT Samples and RTT Estimates
[Figure: RTT (msec) versus time, plotting Sample RTT and Estimated RTT]
The variations in the SampleRTT are smoothed out in the computation of the EstimatedRTT.
An Actual RTT Estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds) versus time (seconds), plotting SampleRTT and Estimated RTT]
FSM of TCP for Reliable Data Transfer
Simplified TCP sender, assuming one-way data transfer and no flow or congestion control. The sender waits for an event:
- event: data received from application above -> create and send segment
- event: timer timeout for segment with seq number y -> retransmit segment
- event: ACK received, with ACK number y -> process ACK
SIMPLIFIED TCP SENDER
Assumptions:
- the sender is not constrained by TCP flow or congestion control
- data from above is less than MSS in size
- data transfer is in one direction only

sendbase = initial_sequence_number
nextseqnum = initial_sequence_number

loop (forever) {
  switch (event)
    event: data received from application above
      create TCP segment with sequence number nextseqnum
      if (timer is currently not running)
        start timer for segment nextseqnum  /* timer is associated with the oldest unACKed segment */
      pass segment to IP
      nextseqnum = nextseqnum + length(data)
    event: timer timeout
      retransmit not-yet-ACKed segment with smallest sequence number
      start timer
    event: ACK received, with ACK field value of y
      if (y > sendbase) {  /* cumulative ACK of all data up to y */
        sendbase = y
        if (there are currently any not-yet-ACKed segments)
          start timer
      }
} /* end of loop forever */
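The event loop above maps onto three event handlers. This sketch models only the sequence-number bookkeeping and whether the single timer is running; segment creation, IP and real timers are left out, and all names are hypothetical. The comparison y > send_base ignores sequence-number wraparound for simplicity.

```c
#include <stdint.h>

/* Hypothetical state of the simplified TCP sender. */
struct simple_sender {
    uint32_t send_base;     /* oldest unACKed byte */
    uint32_t next_seq;      /* next byte to be sent */
    int      timer_running; /* single timer for the oldest unACKed segment */
};

void sender_init(struct simple_sender *s, uint32_t isn) {
    s->send_base = isn;
    s->next_seq = isn;
    s->timer_running = 0;
}

/* Event: data received from application above. Returns the sequence
   number of the new segment, which the caller passes to IP. */
uint32_t on_app_data(struct simple_sender *s, uint32_t len) {
    uint32_t seq = s->next_seq;
    if (!s->timer_running)
        s->timer_running = 1;
    s->next_seq += len;
    return seq;
}

/* Event: timer timeout. Returns the sequence number of the not-yet-ACKed
   segment with the smallest sequence number and restarts the timer. */
uint32_t on_timeout(struct simple_sender *s) {
    s->timer_running = 1;
    return s->send_base;
}

/* Event: ACK received with ACK field value y (cumulative). */
void on_ack(struct simple_sender *s, uint32_t y) {
    if (y > s->send_base) {
        s->send_base = y;
        /* keep the timer only while unACKed data remains */
        s->timer_running = (s->send_base != s->next_seq);
    }
}
```

In the test, an 8-byte and a 20-byte segment are sent from sequence number 92; a timeout retransmits 92, and the cumulative ACK=120 stops the timer.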
TCP SENDER with MODIFICATIONS (with Fast Retransmit)
Why wait for the timeout to expire when consecutive duplicate ACKs can be used to indicate a lost segment?

sendbase = initial_sequence_number
nextseqnum = initial_sequence_number

loop (forever) {
  switch (event)
    event: data received from application above
      create TCP segment with sequence number nextseqnum
      start timer for segment nextseqnum
      pass segment to IP
      nextseqnum = nextseqnum + length(data)
    event: timer timeout for segment with sequence number y
      retransmit segment with sequence number y
      compute new timeout interval for segment y
      restart timer for sequence number y
    event: ACK received, with ACK field value of y
      if (y > sendbase) {  /* cumulative ACK of all data up to y */
        cancel all timers for segments with sequence numbers < y
        sendbase = y
      }
      else {  /* a duplicate ACK for an already ACKed segment */
        increment number of duplicate ACKs received for y
        if (number of duplicate ACKs received for y == 3) {
          /* perform TCP fast retransmission */
          resend segment with sequence number y
          restart timer for segment y
        }
      }
} /* end of loop forever */
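The duplicate-ACK branch of the modified sender can be isolated into one handler. A minimal sketch with hypothetical names: it returns 1 exactly when the third duplicate ACK arrives (the fourth ACK carrying the same number), which is when the pseudocode performs a fast retransmission; timers are not modelled.

```c
#include <stdint.h>

/* Hypothetical sender state for duplicate-ACK counting. */
struct fr_sender {
    uint32_t send_base; /* oldest unACKed byte */
    uint32_t dup_acks;  /* duplicate ACKs received for send_base */
};

/* Process an ACK with ACK field value y. Returns 1 when the segment
   starting at send_base should be fast-retransmitted now. */
int on_ack_fr(struct fr_sender *s, uint32_t y) {
    if (y > s->send_base) {      /* new cumulative ACK */
        s->send_base = y;
        s->dup_acks = 0;
        return 0;
    }
    s->dup_acks++;               /* duplicate ACK for already ACKed data */
    return s->dup_acks == 3;     /* third duplicate: fast retransmit */
}
```

Later duplicates beyond the third do not trigger further retransmissions; the count resets when a new cumulative ACK arrives.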
TCP ACK generation [RFC 1122, RFC 2581]
Event -> TCP receiver action
1. In-order segment arrival, no gaps, everything else already ACKed -> Delayed ACK: wait up to 500 ms for the next segment; if the next segment does not arrive in this interval, send the ACK.
2. In-order segment arrival, no gaps, one delayed ACK pending (due to action 1) -> Immediately send a single cumulative ACK.
3. Out-of-order segment arrival with higher-than-expected seq number (a gap is detected) -> Send a duplicate ACK, indicating the seq number of the next expected byte.
4. Arrival of a segment that partially or completely fills the gap -> Immediately send an ACK if the segment starts at the lower end of the gap.
The receiver does not discard out-of-order segments.
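The four rows of the table can be expressed as a classification function. This is an illustrative sketch, not an interface from RFC 1122 or RFC 2581: the enum, the function and its inputs are hypothetical, and it assumes the arriving segment does not start below rcv_next (old duplicates are outside the table).

```c
#include <stdint.h>

/* Receiver actions, numbered as in the table. */
enum ack_action {
    ACK_DELAY,          /* 1: wait up to 500 ms for the next segment */
    ACK_CUMULATIVE_NOW, /* 2: immediately send one cumulative ACK */
    ACK_DUPLICATE_NOW,  /* 3: duplicate ACK for the next expected byte */
    ACK_IMMEDIATE       /* 4: segment fills the lower end of a gap */
};

/* seg_seq: first byte of the arriving segment; rcv_next: next expected
   byte; delayed_ack_pending: an in-order segment still awaits its ACK;
   gap_exists: out-of-order data is buffered above rcv_next. */
enum ack_action receiver_action(uint32_t seg_seq, uint32_t rcv_next,
                                int delayed_ack_pending, int gap_exists) {
    if (seg_seq > rcv_next)
        return ACK_DUPLICATE_NOW;   /* gap detected */
    if (gap_exists)
        return ACK_IMMEDIATE;       /* starts at lower end of the gap */
    if (delayed_ack_pending)
        return ACK_CUMULATIVE_NOW;  /* second in-order segment */
    return ACK_DELAY;               /* in order, everything else ACKed */
}
```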
TCP Interesting Scenarios (simplified TCP version)
Lost ACK scenario: Host A sends Seq=92 with 8 bytes of data. Host B replies ACK=100, but the ACK is lost (X). Host A times out and retransmits Seq=92 with 8 bytes of data; Host B replies ACK=100 again. This is a retransmission due to a lost ACK.
Premature timeout, cumulative ACKs: Host A sends Seq=92 (8 bytes of data) and Seq=100 (20 bytes of data). Host B replies ACK=100 and ACK=120, but the timer for Seq=92 expires before these ACKs arrive, so Host A retransmits Seq=92 (8 bytes of data); the timer is restarted here for Seq=92. Host B replies ACK=120. The segment with Seq=100 is not retransmitted.
TCP Retransmission Scenario
Host A sends Seq=92 (8 bytes of data) and Seq=100 (20 bytes of data). Host B's ACK=100 is lost (X), but ACK=120 arrives before the timeout for Seq=92 expires. The cumulative ACK avoids retransmission of the first segment.
TCP Modifications: Doubling the Timeout Interval
- Provides a limited form of congestion control: timer expiration is more likely caused by congestion in the network.
- TimeoutInterval = 2 * TimeoutInterval(previous)
- Congestion may get worse if sources continue to retransmit packets persistently. By increasing the TimeoutInterval, TCP acts more politely, causing the sender to retransmit after longer and longer intervals.
- After an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT.
Others: check RFC 2018 (selective ACK).
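The doubling rule and its reset can be sketched in a few lines. The names are hypothetical; the recomputation after an ACK uses the Timeout = EstimatedRTT + 4 * Deviation formula from the earlier slide, and no upper bound on the interval is modelled.

```c
/* Hypothetical timer state; times in seconds. */
struct backoff {
    double timeout_interval;
};

/* Timer expired: retransmit and double the interval. */
void on_timer_expiry(struct backoff *b) {
    b->timeout_interval *= 2.0;
}

/* ACK received: derive the interval from the latest estimates again. */
void on_ack_received(struct backoff *b, double estimated_rtt, double dev_rtt) {
    b->timeout_interval = estimated_rtt + 4.0 * dev_rtt;
}
```

Starting from 0.75 s, two consecutive expiries give 1.5 s and 3 s; an ACK with EstimatedRTT = 0.5 s and DevRTT = 0.125 s brings the interval back to 1 s.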
TCP Flow Control
Flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.
- Receiver buffering: RcvBuffer = size of the TCP receive buffer; RcvWindow = amount of spare room in the buffer.
- Receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space via the RcvWindow field in the TCP segment.
- Sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow.
FLOW CONTROL: Receiver
EXAMPLE: HOST A sends a large file to HOST B.
RECEIVER: HOST B uses RcvWindow, LastByteRcvd and LastByteRead.
[Figure: HOST B's RcvBuffer, filled with data from IP up to LastByteRcvd and drained by the application process up to LastByteRead]
RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
Initially RcvWindow = RcvBuffer, and the application reads from the buffer. HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.
FLOW CONTROL: Sender
EXAMPLE: HOST A sends a large file to HOST B.
SENDER: HOST A uses the RcvWindow of HOST B, LastByteSent and LastByteACKed.
[Figure: HOST A's data, sent up to LastByteSent, with ACKs from HOST B covering data up to LastByteACKed]
To ensure that HOST B's buffer does not overflow, HOST A maintains throughout the connection's life that [LastByteSent - LastByteACKed] <= RcvWindow.
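Both sides of the flow-control bookkeeping reduce to simple arithmetic on byte counters. A sketch with hypothetical function names, assuming byte counters that do not wrap; the sender function returns how many more bytes it may transmit without violating [LastByteSent - LastByteACKed] <= RcvWindow.

```c
#include <stdint.h>

/* Receiver (HOST B): spare room advertised in the receive window field. */
uint32_t rcv_window(uint32_t rcv_buffer, uint32_t last_byte_rcvd,
                    uint32_t last_byte_read) {
    return rcv_buffer - (last_byte_rcvd - last_byte_read);
}

/* Sender (HOST A): bytes it may still send under the advertised window. */
uint32_t sendable_bytes(uint32_t rcv_win, uint32_t last_byte_sent,
                        uint32_t last_byte_acked) {
    uint32_t unacked = last_byte_sent - last_byte_acked;
    return unacked >= rcv_win ? 0 : rcv_win - unacked;
}
```

For example, a 4096-byte buffer holding 2000 unread bytes advertises 2096 bytes; a sender with 2000 bytes already in flight may then send only 96 more.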
FLOW CONTROL: Some issues to consider
RcvWindow is used by the connection to provide the flow control service.
What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?
TCP sends a segment only when there is data or an ACK to send. Therefore the sender must keep the connection 'alive': TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space, and the ACKs will contain RcvWindow > 0.
TCP Connection Management
Recall: the TCP sender and receiver establish a "connection" before exchanging data segments, initializing TCP variables: sequence numbers, buffers, flow control info (e.g. RcvWindow).
Client: the connection initiator.
  In Java:
    Socket clientSocket = new Socket(hostname, portNumber);
  In C (Winsock):
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
        printf("connect failed\n");
        WSACleanup();
        exit(1);
    }
Server: contacted by the client.
  In Java: Socket accept() (a method of ServerSocket)
  In C:
    ns = accept(s, (struct sockaddr *)&remoteaddr, &addrlen);
TCP Connection Management
Three-way handshake:
Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself); it specifies the initial sequence number (isn).
Step 2: the server end system receives the SYN and replies with a SYNACK control segment: it ACKs the received SYN, allocates buffers, and specifies the server's initial sequence number.
Step 3: the client ACKs the connection with ACK = server_isn + 1, allocates buffers, and sends SYN = 0.
Connection established.
[Figure: establishing a connection between client and server over time:
  client -> server: Connect (SYN=1, seq=client_isn)
  server -> client: Accept (SYN=1, seq=server_isn, ack=client_isn+1)
  client -> server: ACK (SYN=0, seq=client_isn+1, ack=server_isn+1)]
This is what happens when we create a socket for connection to a server.
After establishing the connection, the client can receive segments with app-generated data (SYN=0).
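The sequence and ACK fields of the three handshake segments follow directly from the two initial sequence numbers. A sketch with a hypothetical struct and function; the values 42 and 79 in the test reuse the telnet example's starting numbers.

```c
#include <stdint.h>

/* Hypothetical view of the handshake-relevant header fields. */
struct seg {
    int      syn;      /* SYN flag */
    int      ack_flag; /* ACK flag */
    uint32_t seq;
    uint32_t ack;
};

/* Fill in the three handshake segments for the given initial numbers. */
void three_way_handshake(uint32_t client_isn, uint32_t server_isn,
                         struct seg out[3]) {
    /* Step 1: client -> server, SYN */
    out[0] = (struct seg){1, 0, client_isn, 0};
    /* Step 2: server -> client, SYNACK; the SYN consumed one sequence number */
    out[1] = (struct seg){1, 1, server_isn, client_isn + 1};
    /* Step 3: client -> server, ACK with SYN=0 (may already carry data) */
    out[2] = (struct seg){0, 1, client_isn + 1, server_isn + 1};
}
```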
TCP Connection Management (cont.)
Closing a connection: the client closes the socket:
  closesocket(s);
  Java: clientSocket.close();
Step 1: the client end system sends a TCP FIN control segment to the server.
Step 2: the server receives the FIN and replies with an ACK; it closes the connection and sends a FIN.
[Figure: how a TCP connection is torn down: the client sends FIN (close); the server replies ACK, then sends FIN (close); the client replies ACK, stays in timed wait, then is closed]
TCP Connection Management (cont.)
Step 3: the client receives the FIN and replies with an ACK. It enters timed wait, during which it will respond with an ACK to received FINs.
Step 4: the server receives the ACK. Connection closed.
Note: with a small modification, this can handle simultaneous FINs.
[Figure: client and server both pass through closing, then closed; the client spends a timed wait before closed]
TCP Connection Management (cont.)
[Figure: TCP client lifecycle and TCP server lifecycle]
- The timed wait is used in case the ACK gets lost. Its length is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).
- The connection then formally closes: all resources (e.g. port numbers) are released.
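The client side of the teardown can be sketched as a small state machine. The state names are the RFC 793 ones (FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT), which the slides do not use; the event names and the function are hypothetical, and simultaneous close (the CLOSING state) is not modelled.

```c
/* Client-side teardown states (RFC 793 names) and hypothetical events. */
enum tcp_state { ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT, CLOSED };
enum tcp_event { APP_CLOSE, RCV_ACK, RCV_FIN, TIMED_WAIT_EXPIRED };

/* Returns the next state; events not expected in a state leave it unchanged. */
enum tcp_state client_close_step(enum tcp_state s, enum tcp_event ev) {
    switch (s) {
    case ESTABLISHED: return ev == APP_CLOSE ? FIN_WAIT_1 : s; /* Step 1: send FIN */
    case FIN_WAIT_1:  return ev == RCV_ACK   ? FIN_WAIT_2 : s; /* server ACKed FIN */
    case FIN_WAIT_2:  return ev == RCV_FIN   ? TIME_WAIT  : s; /* Step 3: send ACK */
    case TIME_WAIT:   return ev == TIMED_WAIT_EXPIRED ? CLOSED : s;
    default:          return s;
    }
}
```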
End of Flow Control and Error Control
Flow Control vs Congestion Control
Similar actions are taken, but for very different reasons.
Flow control:
- point-to-point traffic between sender and receiver
- speed matching service: matching the rate at which the sender is sending against the rate at which the receiving application is reading
- prevents the receiver buffer from overflowing
Congestion happens when there are too many sources attempting to send data at too high a rate for the routers along the path.
Congestion control:
- a service that makes sure that the routers between end systems are able to carry the offered traffic
- prevents routers from overflowing
Same course of action: throttling of the sender.
Principles of Congestion Control
Congestion: informally, "too many sources sending too much data too fast for the network to handle".
- different from flow control!
- manifestations: lost packets (buffer overflow at routers); long delays (queuing in router buffers)
- a top-10 problem!
Approaches towards congestion control
Two broad approaches towards congestion control:
1. End-to-end congestion control:
  - no explicit feedback from the network
  - congestion inferred by end systems from observed packet loss and delay
  - approach taken by TCP
2. Network-assisted congestion control:
  - routers provide feedback to end systems, either as a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR) or as an explicit transmission rate the sender should send at
TCP Congestion Control
How the TCP sender limits the rate at which it sends traffic into its connection, using a new variable, the Congestion Window (CongWin):
Amount of unACKed data at the SENDER = LastByteSent - LastByteACKed <= min(CongWin, RcvWindow)
This indirectly limits the sender's send rate.
Assumptions:
- the TCP receive buffer is very large, so there is no RcvWindow constraint: the amount of unACKed data at the sender is solely limited by CongWin
- packet loss delay and packet transmission delay are negligible
Sending rate (approx.) = CongWin / RTT
By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
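The window limit and the approximate sending rate can be written as two one-line functions. A sketch under the slide's assumptions (negligible loss and transmission delays); the names are hypothetical, and the rate is in bytes per second when CongWin is in bytes and RTT in seconds.

```c
#include <stdint.h>

/* Upper bound on unACKed data: the smaller of the two windows. */
uint32_t effective_window(uint32_t cong_win, uint32_t rcv_window) {
    return cong_win < rcv_window ? cong_win : rcv_window;
}

/* Approximate sending rate CongWin / RTT (RcvWindow assumed not binding). */
double send_rate(uint32_t cong_win_bytes, double rtt_seconds) {
    return cong_win_bytes / rtt_seconds;
}
```

With CongWin = 16,000 bytes and an RTT of 100 ms, the sender pushes roughly 160,000 bytes per second; halving CongWin halves that rate.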
TCP Congestion Control
TCP uses ACKs to trigger ("clock") its increase in congestion window size: "self-clocking". The arrival of ACKs is an indication to the sender that all is well.
1. ACKs arrive at a slow rate: the congestion window will be increased at a relatively slow rate.
2. ACKs arrive at a high rate: the congestion window will be increased more quickly.
TCP Congestion Control
How TCP perceives that there is congestion on the path:
"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender:
1. Timeout: no ACK is received after segment loss.
2. Receipt of three duplicate ACKs: segment loss is followed by three duplicate ACKs received at the sender.
TCP Congestion Control details
- The sender limits transmission: LastByteSent - LastByteAcked <= cwnd
- Roughly, rate = cwnd / RTT bytes/sec
- cwnd is a dynamic function of perceived network congestion
How does the sender perceive congestion?
- loss event = timeout or 3 duplicate ACKs
- the TCP sender reduces its rate (cwnd) after a loss event
Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events
TCP congestion avoidance: additive increase, multiplicative decrease (AIMD)
Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
- additive increase: increase cwnd by 1 MSS every RTT until loss is detected
- multiplicative decrease: cut cwnd in half after loss
[Figure: congestion window size (8, 16, 24 Kbytes) versus time, showing the saw-tooth behavior of probing for bandwidth]
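The saw-tooth can be reproduced with a per-RTT simulation of AIMD. This is a simplified sketch in units of MSS, assuming the caller says in which RTTs a loss is detected (the hypothetical loss array); the window never drops below 1 MSS.

```c
/* Per-RTT AIMD trace. loss[i] != 0 means a loss is detected in RTT i;
   out[i] receives cwnd (in MSS) at the end of RTT i. */
void aimd_trace(double cwnd0, const int *loss, int n, double *out) {
    double cwnd = cwnd0;
    for (int i = 0; i < n; i++) {
        if (loss[i])
            cwnd = cwnd / 2.0 < 1.0 ? 1.0 : cwnd / 2.0; /* multiplicative decrease */
        else
            cwnd += 1.0;                               /* additive increase */
        out[i] = cwnd;
    }
}
```

Starting at 8 MSS with a loss in the fifth RTT, the trace is 9, 10, 11, 12, 6, 7, 8: one tooth of the saw.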
TCP Slow Start
When the connection begins, increase the rate exponentially until the first loss event:
- initially cwnd = 1 MSS
- double cwnd every RTT
- done by incrementing cwnd by 1 MSS for every ACK received
Summary: the initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT).
[Figure: Host A sends one segment to Host B, then two segments one RTT later, then four segments]
Refinement: inferring loss
- After 3 dup ACKs: cwnd is cut in half; the window then grows linearly.
- But after a timeout event: cwnd is set to 1 MSS; the window then grows exponentially up to a threshold, then grows linearly.
Philosophy:
- 3 dup ACKs indicate the network is capable of delivering some segments.
- A timeout indicates a "more alarming" congestion scenario.
Refinement
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before timeout.
Implementation:
- variable ssthresh (slow-start threshold)
- on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event
TCP Sender Congestion Control
State | Event | TCP sender congestion-control action | Commentary
- SLOW START (SS) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Results in a doubling of CongWin every RTT
- Congestion Avoidance (CA) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS * (MSS / CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
- SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin / 2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
- SS or CA | Timeout | Threshold = CongWin / 2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
- SS or CA | Duplicate ACK | Increment the duplicate ACK count for the segment being ACKed | CongWin and Threshold not changed
Summary: TCP Congestion Control
[Figure: TCP congestion control FSM with three states: slow start, congestion avoidance, fast recovery]
Initially: cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; start in slow start.
Slow start:
- new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- cwnd > ssthresh: go to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Congestion avoidance:
- new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Fast recovery:
- duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
- new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
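The FSM above can be sketched as three event handlers in C. Units are bytes, the initial values are those of the FSM (cwnd = 1 MSS, ssthresh = 64 KB), and all names are hypothetical; actual transmission is left to the caller, which learns when to retransmit from the return value of reno_dup_ack (and always retransmits after reno_timeout). The sketch follows the FSM (cwnd = ssthresh + 3 MSS on entering fast recovery) rather than the simplified table, which sets CongWin = Threshold.

```c
/* Hypothetical sender state following the congestion control FSM. */
enum cc_state { SLOW_START, CONGESTION_AVOIDANCE, FAST_RECOVERY };

struct reno {
    enum cc_state state;
    double cwnd;     /* congestion window, bytes */
    double ssthresh; /* slow-start threshold, bytes */
    int dup_acks;    /* dupACKcount */
    double mss;      /* maximum segment size, bytes */
};

void reno_init(struct reno *r, double mss) {
    r->state = SLOW_START;
    r->cwnd = mss;               /* cwnd = 1 MSS */
    r->ssthresh = 64.0 * 1024.0; /* ssthresh = 64 KB */
    r->dup_acks = 0;
    r->mss = mss;
}

/* Event: new ACK (for previously unACKed data). */
void reno_new_ack(struct reno *r) {
    switch (r->state) {
    case SLOW_START:
        r->cwnd += r->mss;                      /* doubles cwnd every RTT */
        if (r->cwnd > r->ssthresh)
            r->state = CONGESTION_AVOIDANCE;
        break;
    case CONGESTION_AVOIDANCE:
        r->cwnd += r->mss * (r->mss / r->cwnd); /* about +1 MSS per RTT */
        break;
    case FAST_RECOVERY:
        r->cwnd = r->ssthresh;                  /* deflate the window */
        r->state = CONGESTION_AVOIDANCE;
        break;
    }
    r->dup_acks = 0;
}

/* Event: duplicate ACK. Returns 1 if the missing segment must be
   retransmitted now (third duplicate ACK). */
int reno_dup_ack(struct reno *r) {
    if (r->state == FAST_RECOVERY) {
        r->cwnd += r->mss;                      /* inflate per extra dup ACK */
        return 0;
    }
    if (++r->dup_acks == 3) {
        r->ssthresh = r->cwnd / 2.0;
        r->cwnd = r->ssthresh + 3.0 * r->mss;
        r->state = FAST_RECOVERY;
        return 1;
    }
    return 0;
}

/* Event: timeout, in any state. */
void reno_timeout(struct reno *r) {
    r->ssthresh = r->cwnd / 2.0;
    r->cwnd = r->mss;
    r->dup_acks = 0;
    r->state = SLOW_START;
}
```

With MSS = 1000 bytes, seven new ACKs grow cwnd to 8000 bytes; three duplicate ACKs then set ssthresh to 4000 and cwnd to 7000, and the next new ACK deflates cwnd to 4000 in congestion avoidance.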
Congestion control
TCP's Congestion Control Service
[Figure: CLIENT and SERVER connected through congested routers]
Problem: gridlock sets in when there is packet loss due to router congestion.
TCP's congestion control service forces the end systems to decrease the rate at which packets are sent during periods of congestion.
When the sending system's packet is lost due to congestion, the sender is alerted because it stops receiving ACKs for the packets sent.
Macroscopic Description of TCP throughput
What's the average throughput of TCP as a function of window size and RTT? Ignore slow start (typically very short phases).
- Let W be the window size when loss occurs.
- When the window is W, the throughput is W/RTT.
- Just after loss, the window drops to W/2 and the throughput to W/(2 RTT).
- The throughput then increases linearly (by MSS/RTT every RTT).
Average throughput: 0.75 W/RTT
(Based on an idealised model for the steady-state dynamics of TCP.)
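The 0.75 factor is the average of a window that climbs linearly from W/2 to W. A sketch with hypothetical names that checks this by averaging the per-RTT window over one idealised cycle (window in MSS, W even) and then applies the 0.75 W/RTT formula.

```c
/* Mean window over one cycle in which the window (in MSS) grows by 1
   each RTT from W/2 up to W. */
double mean_window_over_cycle(int w) {
    long sum = 0;
    int n = 0;
    for (int x = w / 2; x <= w; x++) {
        sum += x;
        n++;
    }
    return (double)sum / n;
}

/* Average throughput 0.75 * W / RTT (W in bytes, RTT in seconds). */
double avg_throughput(double w_bytes, double rtt) {
    return 0.75 * w_bytes / rtt;
}
```

For W = 20 MSS the cycle runs through windows 10 to 20, whose mean is 15 = 0.75 * 20; with 1500-byte segments and a 100 ms RTT that is about 225,000 bytes per second.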
TCP Futures
TCP over "long, fat pipes". Example: a GRID computing application
- 1500-byte segments, 100 ms RTT, desired throughput of 10 Gbps
- requires a window size of W = 83,333 in-flight segments
- throughput in terms of the loss rate L: $\text{Throughput} = \dfrac{1.22 \cdot MSS}{RTT \sqrt{L}}$
- so L = $2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments)
New versions of TCP are needed for high-speed environments.
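The two numbers on this slide can be checked with a few lines of arithmetic. A sketch with hypothetical names; the loss rate comes from solving the throughput formula above for L, i.e. L = (1.22 * MSS / (RTT * Throughput))^2, with the MSS converted to bits.

```c
/* Segments in flight needed to sustain a throughput: throughput * RTT / segment size. */
double window_segments(double throughput_bps, double rtt_s, double seg_bytes) {
    return throughput_bps * rtt_s / (seg_bytes * 8.0);
}

/* Loss rate allowed by Throughput = 1.22 * MSS / (RTT * sqrt(L)). */
double required_loss_rate(double throughput_bps, double rtt_s, double mss_bytes) {
    double x = 1.22 * mss_bytes * 8.0 / (rtt_s * throughput_bps);
    return x * x;
}
```

For 10 Gbps, a 100 ms RTT and 1500-byte segments, window_segments gives about 83,333 and required_loss_rate about 2.1e-10, i.e. roughly one loss per 4.7 billion segments.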
TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link of capacity R, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Analysis of 2 connections sharing a link
Assumptions:
- a link with transmission rate R
- each connection has the same MSS and RTT
- no other TCP connections or UDP datagrams traverse the shared link
- ignore the slow start phase of TCP
- both operate in congestion-avoidance mode (linear increase phase)
Goal: adjust the sending rate of the two connections to allow for equal bandwidth sharing.
Why is TCP fair?
Two competing sessions:
- additive increase gives a slope of 1 as throughput increases
- multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput versus Connection 1 throughput, both axes up to R, showing the equal bandwidth share line and the full bandwidth utilisation line; the trajectory alternates congestion avoidance (additive increase) with loss (decrease window by a factor of 2)]
A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.
We can view a simulation on this.
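The convergence argument can be simulated for two flows. A simplified sketch under the stated assumptions (same MSS and RTT, congestion avoidance only), with the additional hypothetical assumption that both flows detect a loss whenever their combined rate exceeds R; rates are in arbitrary units such as MSS per RTT, and the function name is hypothetical.

```c
/* Two AIMD flows on a link of capacity r. Returns |x1 - x2| after the
   given number of RTT rounds. */
double aimd_two_flows(double x1, double x2, double r, int rounds) {
    for (int i = 0; i < rounds; i++) {
        if (x1 + x2 > r) {  /* both see a loss: multiplicative decrease */
            x1 /= 2.0;
            x2 /= 2.0;
        } else {            /* additive increase along a slope of 1 */
            x1 += 1.0;
            x2 += 1.0;
        }
    }
    return x1 > x2 ? x1 - x2 : x2 - x1;
}
```

Each additive-increase step leaves the gap |x1 - x2| unchanged, while each halving cuts it in half, so the two rates approach the equal-share line: starting from rates 10 and 70 on a link of capacity 100, the gap stays at 60 until the first loss, drops to 30 after it, and is below 1 after 200 rounds.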
The End
The following slides are for additional reading only.
TCP Latency Modeling
Loopholes in TCP:
- In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free; therefore they have higher throughputs.
- Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (multithreading implementation).
[Figure: multiple end systems sharing a link of R bps (the link's transmission rate); three applications use 1 TCP connection each, while one uses 3 TCP connections]
TCP latency modeling
Q: How long does it take to receive an object from a Web server?
Latency: the time from when the client initiates a TCP connection until when the client receives the requested object in its entirety. It comprises the TCP connection establishment time, the data transfer delay, and the actual data transmission time.
Two cases to consider:
- WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent (no data transfer delay).
- WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent (there is data transfer delay).
TCP Latency Modeling
Assumptions:
- the network is uncongested, with one link of rate R between the end systems (R bps: the link's transmission rate)
- CongWin (fixed) determines the amount of data that can be sent
- no packet loss, no packet corruption, no retransmissions required
- header overheads are negligible
- the file to send (from SERVER to CLIENT) is an integer number of segments of size MSS
- connection-establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times
- the initial threshold of the TCP congestion mechanism is very big
Notation: O = size of the object in bits; S = number of bits of the MSS (max segment size)
TCP latency Modeling
Case analysis: STATIC CONGESTION WINDOW
Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.
$\text{latency} = 2\,RTT + O/R$
K = number of windows of data that cover the object (number of segments rounded up to the nearest integer): $K = \lceil O/(WS) \rceil$
Example: assume W = 4 segments, e.g. O = 256 bits, S = 32 bits, W = 4, so K = 256/(4 * 32) = 2.
TCP latency Modeling
Case analysis: STATIC CONGESTION WINDOW
Case 2: WS/R < RTT + S/R
The sender has to wait for an ACK after a window's worth of data is sent (STALLED PERIOD).
If there are K windows, the sender will be stalled (K - 1) times.
$\text{latency} = 2\,RTT + O/R + (K-1)\left[S/R + RTT - WS/R\right]$, where $K = \lceil O/(WS) \rceil$ is the number of windows of data that cover the object.
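The two static-window cases can be combined in one function. A sketch with a hypothetical name: O and S in bits, R in bits per second, RTT in seconds, W in segments; K is computed with integer ceiling division, and the stall term is only added when it is positive (case 2).

```c
/* Latency with a static congestion window of w segments. */
double static_window_latency(long o_bits, long s_bits, long w,
                             double r_bps, double rtt) {
    long window_bits = w * s_bits;
    long k = (o_bits + window_bits - 1) / window_bits; /* K = ceil(O / WS) */
    double latency = 2.0 * rtt + (double)o_bits / r_bps;
    /* idle time after each window: S/R + RTT - WS/R */
    double stall = (double)s_bits / r_bps + rtt - (double)window_bits / r_bps;
    if (stall > 0.0)
        latency += (double)(k - 1) * stall; /* case 2: K - 1 stalls */
    return latency;                         /* case 1: no stalls */
}
```

For the example O = 256 bits, S = 32 bits, W = 4 (K = 2) and an assumed R = 32 bps, an RTT of 1 s falls in case 1 (10 s), while an RTT of 5 s falls in case 2 (20 s).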
Case Analysis: DYNAMIC CONGESTION WINDOW
[Figure: slow start delivering an object of O/S = 15 segments in 4 windows, with stalled periods between windows]
Case Analysis: DYNAMIC CONGESTION WINDOW
- Let K be the number of windows that cover the object.
- We can express K in terms of the number of segments in the object as follows:
$$K = \min\{k : 2^0 S + 2^1 S + \cdots + 2^{k-1} S \ge O\} = \min\{k : 2^k - 1 \ge O/S\} = \min\{k : k \ge \log_2(O/S + 1)\} = \left\lceil \log_2\!\left(\frac{O}{S} + 1\right) \right\rceil$$
Case Analysis: DYNAMIC CONGESTION WINDOW
• From the time the server begins to transmit the kth window until the time the server receives an ACK for the first segment in the window, $S/R + RTT$ elapses
• Transmission time of the kth window $= 2^{k-1}\,S/R$
• Stall time after the kth window $= \left[S/R + RTT - 2^{k-1}\,S/R\right]^+$, where $[x]^+ = \max(x, 0)$
• Latency $= 2\,RTT + O/R + \sum_{k=1}^{K-1}\left[S/R + RTT - 2^{k-1}\,S/R\right]^+$
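A direct transcription of the latency sum, as a sketch (the function name and link parameters are assumptions): `max(0.0, x)` plays the role of [x]^+, and the loop runs over the K-1 windows that can be followed by a stall.

```python
def dynamic_latency_sum(O: int, S: int, R: float, RTT: float) -> float:
    """2RTT + O/R + sum over k = 1..K-1 of [S/R + RTT - 2^(k-1) S/R]^+ under slow start."""
    K = ((O + S - 1) // S).bit_length()       # K = ceil(log2(O/S + 1))
    latency = 2 * RTT + O / R
    for k in range(1, K):                     # no stall after the last (Kth) window
        wait = S / R + RTT                    # from start of window k until its first ACK returns
        send = 2 ** (k - 1) * S / R           # transmission time of window k
        latency += max(0.0, wait - send)      # [x]^+: the server idles only if wait > send
    return latency


# Hypothetical link: O/S = 15 segments, S = 32 bits, R = 32 bit/s (S/R = 1 s), RTT = 2 s.
# Stalls after windows 1, 2 and 3 are 2 s, 1 s and 0 s.
print(dynamic_latency_sum(15 * 32, 32, 32, 2.0))  # 22.0
```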
Case Analysis: DYNAMIC CONGESTION WINDOW
• Let Q be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$
• The actual number of times that the server stalls is $P = \min\{Q, K-1\}$
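Q and P can be computed in the same style. In this sketch (names and values are assumptions), Q follows the slide's formula, and a brute-force count of the windows whose wait S/R + RTT is at least their transmission time 2^(k-1) S/R, for an unbounded object, serves as a cross-check. A window whose stall term is exactly zero is counted, which does not change the latency.

```python
import math


def stall_counts(O: int, S: int, R: float, RTT: float):
    """Return (Q, P) for slow start with a very large threshold.

    Q = floor(log2(1 + RTT/(S/R))) + 1 stalls for an infinitely long object;
    P = min(Q, K - 1), because the server never stalls after the Kth (last) window.
    """
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1
    K = ((O + S - 1) // S).bit_length()       # K = ceil(log2(O/S + 1))
    P = min(Q, K - 1)
    return Q, P


def stalls_by_counting(S: int, R: float, RTT: float) -> int:
    """Brute-force Q: count windows k = 1, 2, ... whose wait S/R + RTT
    is at least their transmission time 2^(k-1) S/R."""
    k = 1
    while 2 ** (k - 1) * S / R <= S / R + RTT:
        k += 1
    return k - 1


# Hypothetical link: S/R = 1 s, RTT = 2 s, so Q = floor(log2 3) + 1 = 2.
print(stall_counts(15 * 32, 32, 32, 2.0))   # (2, 2): K = 4, so P = min(2, 3)
print(stall_counts(2 * 32, 32, 32, 2.0))    # (2, 1): K = 2, so P = min(2, 1)
print(stalls_by_counting(32, 32, 2.0))      # 2
```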
Case Analysis: DYNAMIC CONGESTION WINDOW
• Stall terms can be non-zero only for $k = 1, \ldots, P$, so the total stall time is $\sum_{k=1}^{P}\left[S/R + RTT - 2^{k-1}\,S/R\right] = P\left[RTT + S/R\right] - \left(2^P - 1\right)S/R$
• Closed-form expression for the latency:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - \left(2^P - 1\right)\frac{S}{R}$$
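The closed form can be checked against the term-by-term sum. The sketch below (function names, parameter grid and tolerance are assumptions) evaluates both expressions over a range of object sizes and RTTs; they agree because only the stalls after windows k = 1, ..., P can be non-zero.

```python
import math


def latency_closed_form(O: int, S: int, R: float, RTT: float) -> float:
    """Latency = 2RTT + O/R + P[RTT + S/R] - (2^P - 1) S/R."""
    K = ((O + S - 1) // S).bit_length()
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1
    P = min(Q, K - 1)
    return 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R


def latency_by_sum(O: int, S: int, R: float, RTT: float) -> float:
    """Latency = 2RTT + O/R + sum_{k=1}^{K-1} [S/R + RTT - 2^(k-1) S/R]^+."""
    K = ((O + S - 1) // S).bit_length()
    stalls = sum(max(0.0, S / R + RTT - 2 ** (k - 1) * S / R) for k in range(1, K))
    return 2 * RTT + O / R + stalls


# Hypothetical grid: 1000-bit segments on a 1 Mbit/s link, objects of 1 to 300 segments.
for RTT in (0.001, 0.01, 0.1):
    for n in range(1, 301):
        O = n * 1000
        assert math.isclose(latency_closed_form(O, 1000, 1e6, RTT),
                            latency_by_sum(O, 1000, 1e6, RTT))
print("closed form matches the sum")
```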
Case Analysis: DYNAMIC CONGESTION WINDOW
• Compared with the minimum latency $2\,RTT + O/R$ (the Case 1 latency, with no stalls at all):
$$\frac{\text{Latency}}{\text{Minimum latency}} \le 1 + \frac{P}{\frac{O/R}{RTT} + 2}$$
• Slow start will not significantly increase latency if $RTT \ll O/R$
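To see why RTT << O/R matters, this sketch computes the ratio Latency / (2RTT + O/R) and the bound 1 + P/((O/R)/RTT + 2) for a short and a long RTT. The function name and the link values (100 segments of 8000 bits on a 1 Mbit/s link, so O/R = 0.8 s) are assumptions, not figures from the slides.

```python
import math


def latency_ratio_and_bound(O: int, S: int, R: float, RTT: float):
    """Return (Latency / minimum latency, 1 + P / ((O/R)/RTT + 2))."""
    K = ((O + S - 1) // S).bit_length()
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1
    P = min(Q, K - 1)
    latency = 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R
    minimum = 2 * RTT + O / R                 # Case 1 latency: no stalls at all
    return latency / minimum, 1 + P / ((O / R) / RTT + 2)


# Hypothetical: 100 segments of 8000 bits on a 1 Mbit/s link, so O/R = 0.8 s.
for RTT in (0.001, 0.1):
    ratio, bound = latency_ratio_and_bound(100 * 8000, 8000, 1e6, RTT)
    print(f"RTT = {RTT} s: ratio = {ratio:.4f}, bound = {bound:.4f}")
```

With these assumed numbers the ratio is about 1.001 for RTT = 1 ms (RTT << O/R) but about 1.312 for RTT = 100 ms, in line with the slide's conclusion.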
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times
CLIENT SERVER
FILEFILE
Initial ThresholdThreshold of TCP congestion mechanism is very big
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits
SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)
Transport Layer ndash TCP
58
BB
TCP Latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 1: $WS/R > RTT + S/R$ (W = window size in segments)
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.
Case 1 latency $= 2RTT + O/R$
K = number of windows of data that cover the object:
$$K = \left\lceil \frac{O}{WS} \right\rceil$$
Here $O/S$ is the number of segments, and K is rounded up to the nearest integer.
Example (assume W = 4 segments): O = 256 bits, S = 32 bits, W = 4 gives $O/S = 8$ segments and $K = 2$ windows.
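The static-window bookkeeping above can be sketched in a few lines. This is a minimal illustration. It assumes O and S are given in bits, R in bits per second and RTT in seconds. The helper names are hypothetical. The R and RTT values in the example are invented; only O, S and W come from the slide.

```python
import math


def num_windows(O: float, S: float, W: int) -> int:
    """K = O/(WS) rounded up: windows of W segments (S bits each) needed for O bits."""
    return math.ceil(O / (W * S))


def is_case1(S: float, W: int, R: float, RTT: float) -> bool:
    """Case 1 condition WS/R > RTT + S/R: the first ACK returns before the window is sent."""
    return W * S / R > RTT + S / R


def case1_latency(O: float, R: float, RTT: float) -> float:
    """2RTT (handshake plus request/reply round trip) + O/R (transmit the whole object)."""
    return 2 * RTT + O / R


# Slide example: O = 256 bits, S = 32 bits, W = 4 segments -> 8 segments, K = 2
print(num_windows(256, 32, 4))                      # 2
# Hypothetical link: R = 1000 bps, RTT = 0.01 s
print(is_case1(32, 4, 1000, 0.01))                  # True
print(round(case1_latency(256, 1000, 0.01), 3))     # 0.276
```

With these invented link parameters, WS/R = 0.128 s exceeds RTT + S/R = 0.042 s, so the case 1 formula applies.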
TCP Latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 2: $WS/R < RTT + S/R$
The sender has to wait for an ACK after a window's worth of data is sent (STALLED PERIOD). Each stall lasts $S/R + RTT - WS/R$, which is the time between finishing transmission of a window and receiving the ACK for that window's first segment.
Case 2 latency $= 2RTT + O/R + (K-1)\left[S/R + RTT - WS/R\right]$
where $K = \lceil O/(WS) \rceil$ is the number of windows of data that cover the object.
If there are K windows, the sender will be stalled $K-1$ times.
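This sketch picks between the two static-window cases by comparing WS/R with RTT + S/R. It uses the same assumptions as the slides: no loss, a fixed window of W segments, and negligible header and ACK transmission times. The function name and the 1000 bps link with RTTs of 0.01 s and 0.2 s are hypothetical.

```python
import math


def static_window_latency(O: float, S: float, W: int, R: float, RTT: float) -> float:
    """Latency with a fixed window of W segments (no loss, negligible headers)."""
    K = math.ceil(O / (W * S))          # windows needed to cover the object
    window_time = W * S / R             # time to transmit one full window
    first_ack = S / R + RTT             # time until the ACK for a window's first segment arrives
    if window_time >= first_ack:
        return 2 * RTT + O / R          # case 1: the sender never stalls
    stall = first_ack - window_time     # case 2: idle time after each of the first K-1 windows
    return 2 * RTT + O / R + (K - 1) * stall


# Slide example object (O = 256, S = 32, W = 4) over a hypothetical 1000 bps link
print(round(static_window_latency(256, 32, 4, 1000, 0.01), 3))  # 0.276 (case 1)
print(round(static_window_latency(256, 32, 4, 1000, 0.2), 3))   # 0.76  (case 2)
```

At equality the stall time is zero, so both formulas give the same result. The sketch simply treats that boundary as case 1.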
Case Analysis: DYNAMIC CONGESTION WINDOW
(Figure: an object of O/S = 15 segments sent with slow start is covered by 4 windows of 1, 2, 4 and 8 segments; the STALLED PERIOD is marked.)
Case Analysis: DYNAMIC CONGESTION WINDOW
• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object as follows:
$$\begin{aligned}
K &= \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} \\
  &= \min\{k : 2^k - 1 \ge O/S\} \\
  &= \min\{k : k \ge \log_2(O/S + 1)\} \\
  &= \left\lceil \log_2(O/S + 1) \right\rceil
\end{aligned}$$
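The derivation of K can be checked numerically. This sketch assumes O/S is a whole number of segments, as in the slide assumptions, and that window k carries 2^(k-1) segments. The function names are hypothetical.

```python
import math


def windows_by_doubling(num_segments: int) -> int:
    """Smallest k with 2^0 + 2^1 + ... + 2^(k-1) >= O/S (first line of the derivation)."""
    k, covered = 0, 0
    while covered < num_segments:
        covered += 2 ** k   # window number k+1 holds 2^k segments
        k += 1
    return k


def windows_closed_form(num_segments: int) -> int:
    """K = ceil(log2(O/S + 1)), the last line of the derivation.

    For very large counts, num_segments.bit_length() gives the same value
    without floating point.
    """
    return math.ceil(math.log2(num_segments + 1))


print(windows_by_doubling(15), windows_closed_form(15))  # 4 4  (1 + 2 + 4 + 8 = 15)
print(windows_by_doubling(16), windows_closed_form(16))  # 5 5
```

For 15 segments both functions return 4. This matches the four windows of 1, 2, 4 and 8 segments in the figure.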
Case Analysis: DYNAMIC CONGESTION WINDOW
• Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window $= S/R + RTT$
• Transmission time of the kth window $= 2^{k-1}\,S/R$
• Stall time after the kth window $= \left[S/R + RTT - 2^{k-1}\,S/R\right]^+$, where $[x]^+ = \max(x, 0)$
• Latency:
$$\text{Latency} = 2RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$$
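Summing the per-window stall times directly gives the latency. This sketch works under the slide assumptions: O is a whole number of segments, there is no loss, and slow start never reaches the threshold. The function name is hypothetical. The example uses invented values in abstract time units, with S/R = 1 and RTT = 2.

```python
def dynamic_window_latency(O: int, S: int, R: float, RTT: float) -> float:
    """Latency = 2RTT + O/R + sum over k = 1..K-1 of [S/R + RTT - 2^(k-1) S/R]^+."""
    num_segments = O // S              # slide assumption: O is a whole number of segments
    K = num_segments.bit_length()      # = ceil(log2(O/S + 1)), windows that cover the object
    stall_total = 0.0
    for k in range(1, K):              # no stall after the last (Kth) window
        stall = S / R + RTT - 2 ** (k - 1) * S / R
        stall_total += max(stall, 0.0)  # [x]^+ = max(x, 0)
    return 2 * RTT + O / R + stall_total


# Hypothetical numbers in abstract time units: S/R = 1, RTT = 2, O/S = 15 segments
print(dynamic_window_latency(O=15, S=1, R=1, RTT=2))  # 22.0
```

Here K = 4. The stall terms for k = 1, 2, 3 are 2, 1 and 0 (the third is clipped by [x]^+), so the total stall time is 3.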
Case Analysis: DYNAMIC CONGESTION WINDOW
• Let Q be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$
• The actual number of times that the server stalls is $P = \min\{Q, K-1\}$.
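This is a sketch of the two stall counts. It assumes O is a whole multiple of S with O >= S and RTT > 0. The function name is hypothetical, and the example uses invented abstract-unit values (S/R = 1, O/S = 15).

```python
import math


def stall_counts(O: int, S: int, R: float, RTT: float) -> tuple[int, int]:
    """Return (Q, P): stalls for an unbounded object, and actual stalls P = min(Q, K-1)."""
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1
    K = (O // S).bit_length()          # windows covering the object, ceil(log2(O/S + 1))
    P = min(Q, K - 1)
    return Q, P


print(stall_counts(O=15, S=1, R=1, RTT=2))    # (2, 2)
print(stall_counts(O=15, S=1, R=1, RTT=100))  # (7, 3)
```

With RTT = 100 the server would stall 7 times for an unbounded object. A 15-segment object only spans K = 4 windows, however, so it stalls K - 1 = 3 times.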
• Closed-form expression for the latency:
$$\text{Latency} = 2RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$
This follows from summing the stall terms for $k = 1, \dots, P$ (all later terms are zero), using $\sum_{k=1}^{P} 2^{k-1} = 2^P - 1$.
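The closed-form expression can be evaluated the same way. This sketch assumes O >= S, O a whole multiple of S, and RTT > 0. The function name and the abstract-unit example (S/R = 1, RTT = 2, O/S = 15) are hypothetical.

```python
import math


def closed_form_latency(O: int, S: int, R: float, RTT: float) -> float:
    """Latency = 2RTT + O/R + P[RTT + S/R] - (2^P - 1) S/R (assumes O >= S)."""
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1
    K = (O // S).bit_length()          # ceil(log2(O/S + 1))
    P = min(Q, K - 1)
    return 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R


print(closed_form_latency(O=15, S=1, R=1, RTT=2))  # 22.0
```

For these numbers P = 2, so the stall contribution is 2 x (2 + 1) - (2^2 - 1) x 1 = 3. This equals the sum of the individual stall terms 2, 1 and 0.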
Case Analysis: DYNAMIC CONGESTION WINDOW
$$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{(O/R)/RTT + 2}$$
The minimum latency, $2RTT + O/R$, is the latency with no stalls. Because $2^P - 1 \ge P$, slow start adds at most $P \cdot RTT$ to it, which gives the bound.
Slow start will not significantly increase latency if $RTT \ll O/R$.
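This sketch compares the actual slow-start latency with the minimum latency 2RTT + O/R and with the bound above. It assumes RTT > 0 and O a whole multiple of S with O >= S. The function name and the (O, RTT) pairs, in abstract time units with S/R = 1, are invented for illustration.

```python
import math


def slow_start_overhead(O: int, S: int, R: float, RTT: float) -> tuple[float, float]:
    """Return (Latency / Minimum Latency, bound 1 + P / ((O/R)/RTT + 2))."""
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1
    P = min(Q, (O // S).bit_length() - 1)
    latency = 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R
    minimum = 2 * RTT + O / R          # latency with no stalls
    return latency / minimum, 1 + P / ((O / R) / RTT + 2)


# Hypothetical numbers, abstract time units with S/R = 1
for O, RTT in [(15, 2), (1000, 2), (15, 50)]:
    ratio, bound = slow_start_overhead(O, 1, 1, RTT)
    print(O, RTT, round(ratio, 3), round(bound, 3))
```

The ratio stays close to 1 for the large object (O/R = 1000 against RTT = 2). It grows well above 1 when RTT dominates O/R, which is the behavior the slide describes.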
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
Transport Layer ndash TCP
55
BB
TCP Latency ModelingTCP Latency Modeling
In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs
Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth
Multiple End Systems sharing a link
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
Loop holes in TCPLoop holes in TCP
1 TCP connection
1 TCP connection
1 TCP connection
3 TCP connections
Multithreading implementationMultithreading implementation
Transport Layer ndash TCP
56
BB
TCP latency modelingTCP latency modeling
TCP connection establishment time data transfer delay Actual data transmission time
Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent
QQ How long How long does it take to does it take to
receive an object receive an object from a Web from a Web
serverserver
No data transfer delayNo data transfer delay
Therersquos data transfer delayTherersquos data transfer delay
the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety
TCP Latency Modeling
Assumptions:
- The network is uncongested, with one link of rate R between the end systems (R bps: the link's transmission rate).
- CongWin (fixed) determines the amount of data that can be sent.
- No packet loss, no packet corruption, no retransmissions required.
- Header overheads are negligible.
- The file to send is an integer number of segments of size MSS.
- Connection-establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times.
- The initial Threshold of the TCP congestion mechanism is very big.
Notation:
- O: size of the object, in bits
- S: number of bits of the MSS (maximum segment size)
- W: window size, in segments
[Figure: a CLIENT requesting a FILE from a SERVER over a single link]
TCP Latency Modeling
Case analysis: STATIC congestion window
Case 1: $WS/R > RTT + S/R$
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent, so:
$$\text{latency} = 2\,RTT + \frac{O}{R}$$
K = number of windows of data that cover the object (O/S is the number of segments; K is rounded up to the nearest integer):
$$K = \left\lceil \frac{O}{WS} \right\rceil$$
Example (assume W = 4 segments): O = 256 bits, S = 32 bits, W = 4 gives O/S = 8 segments and $K = 256/(4 \cdot 32) = 2$ windows.
TCP Latency Modeling
Case analysis: STATIC congestion window
Case 2: $WS/R < RTT + S/R$
The sender has to wait for an ACK after a window's worth of data is sent. After each window except the last, it sits in a STALLED PERIOD of $S/R + RTT - WS/R$. If there are K windows, the sender will be stalled K-1 times:
$$\text{latency} = 2\,RTT + \frac{O}{R} + (K-1)\left[\frac{S}{R} + RTT - \frac{WS}{R}\right], \qquad K = \left\lceil \frac{O}{WS} \right\rceil$$
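The two static-window cases can be folded into a single expression. The per-window stall $S/R + RTT - WS/R$ is negative exactly in Case 1, where the sender never waits. The Python sketch below assumes the slide's idealised model (no loss, negligible header and handshake transmission times). `static_window_latency` is a hypothetical helper name. O and S are in bits, R in bits/s and RTT in seconds. The rate R = 32 bps in the usage lines is an assumed value chosen to keep the numbers small.

```python
import math

def static_window_latency(O, S, W, R, RTT):
    """Latency (s) to fetch an O-bit object with a fixed window of W segments of S bits."""
    K = math.ceil(O / (W * S))          # number of windows that cover the object
    stall = S / R + RTT - W * S / R     # idle time after each window (Case 2 only)
    # Case 1 (W*S/R > RTT + S/R): stall is negative, so the sender never waits.
    return 2 * RTT + O / R + (K - 1) * max(stall, 0.0)

# Slide example: O = 256 bits, S = 32 bits, W = 4 (so K = 2); R = 32 bps is assumed.
print(static_window_latency(256, 32, 4, R=32, RTT=1))    # 10.0  (Case 1)
print(static_window_latency(256, 32, 4, R=32, RTT=10))   # 35.0  (Case 2)
```

Using `max(stall, 0.0)` keeps a single code path. The Case 1 formula $2\,RTT + O/R$ falls out automatically.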
Case analysis: DYNAMIC congestion window
[Figure: slow start with O/S = 15 segments sent in 4 windows of 1, 2, 4 and 8 segments, with STALLED PERIODs between windows]
Case analysis: DYNAMIC congestion window
Let K be the number of windows that cover the object. We can express K in terms of the number of segments in the object as follows (windows of $2^0, 2^1, 2^2, \dots$ segments):
$$\begin{aligned}
K &= \min\left\{k : 2^0 + 2^1 + \dots + 2^{k-1} \ge \frac{O}{S}\right\} \\
  &= \min\left\{k : 2^k - 1 \ge \frac{O}{S}\right\} \\
  &= \min\left\{k : k \ge \log_2\left(\frac{O}{S} + 1\right)\right\} \\
  &= \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil
\end{aligned}$$
Note: $2^0 + 2^1 + \dots + 2^{k-1} = 2^k - 1$.
Case analysis: DYNAMIC congestion window
- Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window: $S/R + RTT$
- Transmission time of the kth window: $2^{k-1}\,S/R$
- Stall time after the kth window: $\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$, where $[x]^+ = \max(x, 0)$
- Latency:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$$
Case analysis: DYNAMIC congestion window
- Let Q be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$
- The actual number of times that the server stalls is $P = \min\{Q, K-1\}$.
Case analysis: DYNAMIC congestion window
Closed-form expression for the latency (the first P windows stall, and no later window does):
$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$
Case analysis: DYNAMIC congestion window
Compared with the minimum possible latency, $2\,RTT + O/R$ (no stalls):
$$\frac{\text{Latency}}{\text{Minimum latency}} \le 1 + \frac{P}{\frac{O/R}{RTT} + 2}$$
Slow start will not significantly increase latency if $RTT \ll O/R$.
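The slow-start results above (K, Q, P and the closed form) can be cross-checked numerically against the direct sum over stall times. This is a minimal Python sketch and the helper names are hypothetical. O/S = 15 matches the figure. S = 1000 bits, R = 1000 bps and RTT = 2 s are assumed values, chosen so that S/R = 1 s.

```python
import math

def windows_to_cover(O, S):
    """K: smallest k with 2^0 + ... + 2^(k-1) = 2^k - 1 >= O/S."""
    k = 0
    while 2 ** k - 1 < O / S:
        k += 1
    return k

def slow_start_latency(O, S, R, RTT):
    """Return (K, Q, P, closed-form latency, latency from the direct sum)."""
    K = windows_to_cover(O, S)
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1   # stalls for an infinite object
    P = min(Q, K - 1)                                   # actual number of stalls
    closed = 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R
    summed = 2 * RTT + O / R + sum(
        max(S / R + RTT - 2 ** (k - 1) * S / R, 0.0)    # [x]^+ stall after window k
        for k in range(1, K))
    return K, Q, P, closed, summed

print(slow_start_latency(O=15_000, S=1_000, R=1_000, RTT=2))   # (4, 2, 2, 22.0, 22.0)
```

With RTT = 2 s only the first two windows stall (Q = 2 < K-1 = 3), which is why P = 2 here.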
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
Internet Checksum Example
Note: when adding numbers, a carryout from the most significant bit needs to be added to the result.
Example: add two 16-bit integers.
data:          1110 0110 0110 0110
               1101 0101 0101 0101
wraparound:  1 1011 1011 1011 1011
sum:           1011 1011 1011 1100
checksum:      0100 0100 0100 0011
To check: the receiver adds all the data words and the checksum. A result of 1111 1111 1111 1111 (all 1s) means no error has been detected.
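The wraparound arithmetic of the example can be reproduced in a few lines of Python. This sketch covers only the 16-bit 1s-complement sum. It leaves out what a real TCP checksum also covers (the pseudo-header and padding of odd-length data). The function names are hypothetical.

```python
def ones_complement_sum16(words):
    """Sum 16-bit words, adding any carry out of the MSB back in (wraparound)."""
    total = 0
    for w in words:
        total += w & 0xFFFF
        total = (total & 0xFFFF) + (total >> 16)   # wraparound
    return total

def internet_checksum(words):
    """Checksum = 1s complement of the 1s-complement sum."""
    return ~ones_complement_sum16(words) & 0xFFFF

def checks_out(words, checksum):
    """Receiver side: data words plus checksum must sum to all 1s."""
    return ones_complement_sum16(list(words) + [checksum]) == 0xFFFF

words = [0b1110011001100110, 0b1101010101010101]   # the two words of the example
csum = internet_checksum(words)
print(f"{csum:016b}")            # 0100010001000011
print(checks_out(words, csum))   # True
```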
Connection Oriented Transport: TCP
- TCP Segment Structure
- SEQ and ACK numbers
- Calculating the Timeout Interval
- The Simplified TCP Sender
- ACK Generation Recommendation (RFC 1122, RFC 2581)
- Interesting Transmission Scenarios
- Flow Control
- TCP Connection Management
TCP segment structure
Header (32 bits wide):
source port # | dest port #
sequence number
acknowledgement number
head len | not used | U A P R S F | rcvr window size
checksum | URG data pointer
Options (variable length)
application data (variable length)
Fields:
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection establishment (setup, teardown commands)
- rcvr window size: # bytes the receiver is willing to accept
- sequence and acknowledgement numbers: counting by bytes of data (not segments)
- checksum: Internet checksum (as in UDP)
In practice, PSH, URG and the urgent data pointer are not used.
We can view these teeny-weeny details using Ethereal.
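The fixed 20-byte part of the header laid out above can be decoded with Python's standard `struct` module in network byte order ("!"). This is a sketch only. `parse_tcp_header` is a hypothetical helper. Option bytes are returned raw rather than parsed. The two high bits of the flags byte (part of "not used" above) are ignored. The sample segment in the usage lines is made up.

```python
import struct

FLAGS = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]   # bit 0 .. bit 5 of the flags byte

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header (network byte order)."""
    (src, dst, seq, ack, off_byte, flag_byte,
     window, checksum, urg_ptr) = struct.unpack("!HHIIBBHHH", segment[:20])
    header_len = (off_byte >> 4) * 4     # head len is counted in 32-bit words
    return {
        "src_port": src, "dst_port": dst,
        "seq": seq, "ack": ack,           # byte numbers, not segment numbers
        "header_len": header_len,
        "flags": [name for bit, name in enumerate(FLAGS) if flag_byte & (1 << bit)],
        "rcv_window": window, "checksum": checksum, "urg_ptr": urg_ptr,
        "options": segment[20:header_len],
        "data": segment[header_len:],
    }

# Hypothetical segment: port 1234 -> 80, seq 42, ack 79, PSH+ACK set, 3 data bytes.
hdr = struct.pack("!HHIIBBHHH", 1234, 80, 42, 79, 5 << 4, 0x18, 65535, 0, 0)
print(parse_tcp_header(hdr + b"abc")["flags"])   # ['PSH', 'ACK']
```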
Example
Suppose that a process in Host A wants to send a stream of data to a process in Host B over a TCP connection.
Assume:
- Data stream: a file consisting of 500,000 bytes
- MSS: 1,000 bytes
- The first byte of the data stream is numbered 0
TCP constructs 500 segments out of the data stream: 500,000 bytes / 1,000 bytes = 500 segments.
TCP sequence #s and ACKs
Sequence numbers (#s): the byte-stream number of the first byte in the segment's data.
- They do not necessarily start from 0; a random initial number R is used:
  - Segment 1: 0 + R
  - Segment 2: 1000 + R, etc.
ACKs (acknowledgments): the seq # of the next byte expected from the other side (last byte + 1).
- Cumulative ACK: having received segment 1, the receiver waits for segment 2, e.g. ACK = 1000 + R (received up to the 999th byte).
[Figure: bytes 0, 1, 2, ..., 999 form Segment 1; bytes 1000, 1001, ..., 1999 form Segment 2]
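For the 500,000-byte example, the sequence number of each segment and the cumulative ACK can be computed directly. This Python sketch assumes every segment is a full MSS, which holds here because 500,000 is a multiple of 1,000. The helper names are hypothetical. Segment indices start at 0 for the slide's Segment 1, and `isn` stands for the random initial number R.

```python
def segment_seq_numbers(file_size, mss, isn=0):
    """Sequence number of each segment = byte-stream number of its first byte."""
    return [isn + offset for offset in range(0, file_size, mss)]

def cumulative_ack(isn, mss, received):
    """ACK = seq # of the next byte expected; stops at the first missing segment."""
    i = 0
    while i in received:      # received: set of 0-based segment indices
        i += 1
    return isn + i * mss

seqs = segment_seq_numbers(500_000, 1_000)
print(len(seqs), seqs[:2], seqs[-1])   # 500 [0, 1000] 499000
```

Segment 3 arriving without segment 2 leaves the ACK unchanged, which is the cumulative behaviour described above.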
TCP sequence #s and ACKs
Q: How does the receiver handle out-of-order segments?
A: The TCP specs do not say; it is decided when implementing.
Simple telnet scenario (with echo on), assuming that the starting sequence numbers for Host A (client) and Host B (server) are 42 and 79 respectively:
1. The user types 'C'. Host A sends Seq=42, ACK=79, data = 'C' ("I'm sending data starting at seq num = 42").
2. Host B ACKs receipt of 'C' and echoes back 'C': Seq=79, ACK=43, data = 'C' ("Send me the bytes from 43 onward"; the ACK is piggy-backed on server-to-client data).
3. Host A ACKs receipt of the echoed 'C': Seq=43, ACK=80.
Yet another server echo example
1. The user types 'Hello'. Host A sends Seq=42, ACK=79, data = 'Hello'.
2. Host B ACKs receipt of 'Hello' and echoes back 'Hello': Seq=79, ACK=47, data = 'Hello'.
3. Host A ACKs receipt of the echoed 'Hello' and sends something else: Seq=47, ACK=84, data = '200'.
4. Host B replies: Seq=84, ACK=50, data = '200'.
[Figure: timeline; Host A uses seq=42, ack=79, then seq=47, ack=84; Host B uses seq=79, ack=47, then seq=84, ack=50]
The ACK tells up to which byte has been received, and which is the next starting byte the host is expecting to receive.
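Both echo examples follow one rule:
- Each side's next Seq is its previous Seq plus the bytes it sent.
- Its ACK is the other side's Seq plus the bytes received from it.

The hypothetical Python helper below reproduces the Hello trace. The final pure ACK from Host A is left out.

```python
def echo_exchange(isn_a, isn_b, messages):
    """(sender, seq, ack, data) for each segment when B echoes every message from A."""
    seq_a, seq_b = isn_a, isn_b
    trace = []
    for msg in messages:
        trace.append(("A", seq_a, seq_b, msg))   # ACK = next byte A expects from B
        seq_a += len(msg)
        trace.append(("B", seq_b, seq_a, msg))   # echo with piggy-backed ACK
        seq_b += len(msg)
    return trace

for segment in echo_exchange(42, 79, ["Hello", "200"]):
    print(segment)
# ('A', 42, 79, 'Hello')
# ('B', 79, 47, 'Hello')
# ('A', 47, 84, '200')
# ('B', 84, 50, '200')
```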
TCP Round Trip Time and Timeout
Main issue: how long is the sender willing to wait before retransmitting the packet?
Q: How to set the TCP timeout value?
- Longer than RTT (RTT = round trip time), but note that RTT will vary.
- Too short: premature timeout and unnecessary retransmissions.
- Too long: slow reaction to segment loss.
Q: How to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt; ignore retransmissions and cumulatively ACKed segments.
- SampleRTT will vary, and we want the estimated RTT to be smoother: use several recent measurements, not just the current SampleRTT.
TCP Round Trip Time and Timeout
EstimatedRTT = (1-x) * EstimatedRTT + x * SampleRTT
- Exponential weighted moving average: the influence of a given sample decreases exponentially fast.
- Typical value of x: 0.125 (RFC 2988).
Setting the timeout: EstimatedRTT plus a safety margin; a large variation in EstimatedRTT calls for a larger safety margin.
Timeout = EstimatedRTT + 4 * Deviation
Deviation = (1-x) * Deviation + x * |SampleRTT - EstimatedRTT|, where the recommended value of x here is 0.25
Sample Calculations (all values in seconds)
EstimatedRTT = 0.875 * EstimatedRTT + 0.125 * SampleRTT
- After the receipt of the ACK of segment 1: EstimatedRTT = RTT for segment 1 = 0.02746
- After the receipt of the ACK of segment 2: EstimatedRTT = 0.875 * 0.02746 + 0.125 * 0.035557 = 0.0285
- After the receipt of the ACK of segment 3: EstimatedRTT = 0.875 * 0.0285 + 0.125 * 0.070059 = 0.0337
- After the receipt of the ACK of segment 4: EstimatedRTT = 0.875 * 0.0337 + 0.125 * 0.11443 = 0.0438
- After the receipt of the ACK of segment 5: EstimatedRTT = 0.875 * 0.0438 + 0.125 * 0.13989 = 0.0558
- After the receipt of the ACK of segment 6: EstimatedRTT = 0.875 * 0.0558 + 0.125 * 0.18964 = 0.0725
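The EWMA and timeout formulas can be run over the six SampleRTT values of the sample calculations. This is a Python sketch, not an RFC-complete timer. The weights x = 0.125 and 0.25 are the values given above. The following are assumptions taken from RFC 6298, as is the helper name `rtt_estimator`:
- the initial Deviation is SampleRTT/2;
- Deviation is updated using the estimate from before the new sample.

```python
def rtt_estimator(samples, x=0.125, x_dev=0.25):
    """Return (EstimatedRTT, Deviation, Timeout) after each SampleRTT, in seconds."""
    estimated = deviation = None
    results = []
    for sample in samples:
        if estimated is None:
            estimated = sample        # first ACK: EstimatedRTT = first SampleRTT
            deviation = sample / 2    # assumption: RFC 6298-style initial value
        else:
            # assumption: Deviation uses the estimate from before this sample
            deviation = (1 - x_dev) * deviation + x_dev * abs(sample - estimated)
            estimated = (1 - x) * estimated + x * sample
        results.append((estimated, deviation, estimated + 4 * deviation))
    return results

samples = [0.02746, 0.035557, 0.070059, 0.11443, 0.13989, 0.18964]
print([round(est, 4) for est, _, _ in rtt_estimator(samples)])
# [0.0275, 0.0285, 0.0337, 0.0438, 0.0558, 0.0725]
```

The slide rounds its intermediate values. The unrounded estimates here differ from it only in the fifth decimal digit.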
RTT Samples and RTT Estimates
[Figure: SampleRTT and EstimatedRTT, in msec (100 to 300), plotted against time]
The variations in the SampleRTT are smoothed out in the computation of the EstimatedRTT.
An Actual RTT Estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; SampleRTT and EstimatedRTT in milliseconds (100 to 350) against time in seconds]
FSM of TCP for Reliable Data Transfer
Simplified TCP sender, assuming:
- one-way data transfer
- no flow or congestion control
In the single state "wait for event", the sender reacts to three events:
- data received from the application above: create and send segment
- timer timeout for the segment with seq number y: retransmit segment
- ACK received, with ACK number y: process ACK
SIMPLIFIED TCP SENDER
Assumptions:
- the sender is not constrained by TCP flow or congestion control
- data from above is less than MSS in size
- data transfer is in one direction only

sendbase = initial_sequence_number
nextseqnum = initial_sequence_number
loop (forever) {
    switch (event)
    event: data received from application above
        create TCP segment with sequence number nextseqnum
        if (timer is currently not running)
            start timer for segment nextseqnum   /* timer is associated with the oldest unACKed segment */
        pass segment to IP
        nextseqnum = nextseqnum + length(data)
    event: timer timeout
        retransmit not-yet-ACKed segment with smallest seq #
        start timer
    event: ACK received, with ACK field value of y
        if (y > sendbase) {   /* cumulative ACK of all data up to y */
            sendbase = y
            if (there are currently any not-yet-ACKed segments)
                start timer
        }
} /* end of loop forever */
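The event handlers of the simplified sender map naturally onto a small class. This Python sketch keeps the same assumptions (one-way transfer, no flow or congestion control). `SimplifiedTcpSender` is a hypothetical name. There are no real timers: a boolean stands in for the single timer. "Pass segment to IP" just appends the segment to a list.

```python
class SimplifiedTcpSender:
    """Event handlers mirroring the simplified sender pseudocode."""

    def __init__(self, isn):
        self.sendbase = isn          # oldest unACKed byte
        self.nextseqnum = isn        # next byte to be sent
        self.unacked = {}            # seq -> data of segments not yet ACKed
        self.timer_running = False   # single timer, tied to the oldest unACKed segment
        self.to_ip = []              # (seq, data) handed to IP, kept for inspection

    def on_data(self, data):
        seq = self.nextseqnum
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.to_ip.append((seq, data))
        self.nextseqnum += len(data)

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)  # not-yet-ACKed segment with smallest seq #
            self.to_ip.append((seq, self.unacked[seq]))
        self.timer_running = True    # start timer

    def on_ack(self, y):
        if y > self.sendbase:        # cumulative ACK of all data up to y
            self.sendbase = y
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)

sender = SimplifiedTcpSender(92)
sender.on_data(b"x" * 8)      # Seq=92, 8 bytes
sender.on_data(b"y" * 20)     # Seq=100, 20 bytes
sender.on_timeout()           # premature timeout: resend Seq=92
sender.on_ack(100)
sender.on_ack(120)            # cumulative ACK: Seq=100 is never resent
print([seq for seq, _ in sender.to_ip])   # [92, 100, 92]
```

The usage lines replay the premature-timeout numbers from the scenarios later in this deck.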
TCP SENDER with MODIFICATIONS: with Fast Retransmit
sendbase = initial_sequence_number
nextseqnum = initial_sequence_number
loop (forever) {
    switch (event)
    event: data received from application above
        create TCP segment with sequence number nextseqnum
        start timer for segment nextseqnum
        pass segment to IP
        nextseqnum = nextseqnum + length(data)
    event: timer timeout for segment with sequence number y
        retransmit segment with sequence number y
        compute new timeout interval for segment y
        restart timer for sequence number y
    event: ACK received, with ACK field value of y
        if (y > sendbase) {   /* cumulative ACK of all data up to y */
            cancel all timers for segments with sequence numbers < y
            sendbase = y
        }
        else {   /* a duplicate ACK for an already ACKed segment */
            increment number of duplicate ACKs received for y
            if (number of duplicate ACKs received for y == 3) {
                /* perform TCP fast retransmission */
                resend segment with sequence number y
                restart timer for segment y
            }
        }
} /* end of loop forever */
Why wait for the timeout to expire when consecutive duplicate ACKs can be used to indicate a lost segment?
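The duplicate-ACK branch of the modified sender can be isolated as a pure function over a stream of ACK values. The Python sketch below is a hypothetical `fast_retransmits` helper. It ignores timers and only reports which sequence numbers would be resent.

```python
def fast_retransmits(sendbase, acks):
    """Seq #s resent by fast retransmit for a stream of ACK values."""
    dup_count = {}
    resent = []
    for y in acks:
        if y > sendbase:                 # new cumulative ACK
            sendbase = y
            dup_count.clear()            # timer cancelling is omitted in this sketch
        else:                            # duplicate ACK for already ACKed data
            dup_count[y] = dup_count.get(y, 0) + 1
            if dup_count[y] == 3:        # third duplicate: resend without waiting
                resent.append(y)
    return resent

print(fast_retransmits(92, [100, 100, 100, 100]))   # [100]
```

The retransmission fires exactly at the third duplicate. Further duplicates of the same ACK do not trigger it again.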
TCP ACK generation [RFC 1122, RFC 2581]
Event -> TCP receiver action:
1. In-order segment arrival, no gaps, everything else already ACKed -> delay sending the ACK: wait up to 500 ms for the next segment; if the next segment does not arrive in this interval, send the ACK.
2. In-order segment arrival, no gaps, one delayed ACK pending (due to action 1) -> immediately send a single cumulative ACK.
3. Out-of-order segment arrival with a higher-than-expected seq #, so a gap is detected -> send a duplicate ACK, indicating the seq # of the next expected byte.
4. Arrival of a segment that partially or completely fills the gap -> immediately send an ACK if the segment starts at the lower end of the gap.
The receiver does not discard out-of-order segments.
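The four event/action rows can be sketched as a receiver object. For each arriving segment it decides whether to hold back the ACK, send it, or send a duplicate. This is a Python sketch with hypothetical names. The 500 ms delayed-ACK timer itself is not modelled. Re-ACKing already-received data immediately is an assumption not covered by the table.

```python
class TcpReceiver:
    """ACK decisions following the four event/action rows above."""

    def __init__(self, rcv_next):
        self.rcv_next = rcv_next     # seq # of the next in-order byte expected
        self.buffered = {}           # out-of-order segments kept, not discarded: seq -> length
        self.delayed_ack = False     # an ACK is being held back (row 1)

    def on_segment(self, seq, length):
        """Return (action, ACK number that is sent or held back)."""
        if seq > self.rcv_next:                      # row 3: gap detected
            self.buffered[seq] = length
            self.delayed_ack = False
            return ("duplicate ACK", self.rcv_next)
        if seq < self.rcv_next:                      # old data: re-ACK (assumption)
            return ("ACK", self.rcv_next)
        had_gap = bool(self.buffered)
        self.rcv_next += length
        while self.rcv_next in self.buffered:        # absorb buffered segments now in order
            self.rcv_next += self.buffered.pop(self.rcv_next)
        if had_gap or self.delayed_ack:              # rows 4 and 2: ACK immediately
            self.delayed_ack = False
            return ("ACK", self.rcv_next)
        self.delayed_ack = True                      # row 1: wait up to 500 ms
        return ("delayed ACK", self.rcv_next)

r = TcpReceiver(0)
for seq in (0, 1000, 3000, 2000):
    print(r.on_segment(seq, 1000))
# ('delayed ACK', 1000)
# ('ACK', 2000)
# ('duplicate ACK', 2000)
# ('ACK', 4000)
```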
TCP Interesting Scenarios (simplified TCP version)
Lost ACK scenario:
- Host A sends Seq=92, 8 bytes data; Host B replies with ACK=100, but the ACK is lost.
- A's timer expires, and A retransmits Seq=92, 8 bytes data (retransmission due to lost ACK); Host B again replies with ACK=100.
Premature timeout, cumulative ACKs:
- Host A sends Seq=92, 8 bytes data, and Seq=100, 20 bytes data; Host B replies with ACK=100 and ACK=120.
- A's timer for Seq=92 expires before these ACKs arrive, so A retransmits Seq=92, 8 bytes data, and the timer is restarted for Seq=92.
- Host B replies with ACK=120; the segment with Seq=100 is not retransmitted.
TCP Retransmission Scenario
- Host A sends Seq=92, 8 bytes data, and Seq=100, 20 bytes data.
- Host B's ACK=100 is lost, but its ACK=120 reaches Host A before the timeout for Seq=92.
- The cumulative ACK avoids retransmission of the first segment.
TCP Modifications: Doubling the Timeout Interval
- Provides a limited form of congestion control: a timer expiration is more likely caused by congestion in the network.
- On each timeout: TimeoutInterval = 2 * TimeoutInterval(previous)
- TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals; congestion may get worse if sources continue to retransmit packets persistently.
- After an ACK is received, TimeoutInterval is again derived from the most recent EstimatedRTT and DevRTT (the Deviation).
Others: check RFC 2018 (selective ACK).
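The backoff rule and the reset after an ACK fit in a few lines. This is a Python sketch. `next_timeout` and its event strings are hypothetical, and the usage assumes a starting TimeoutInterval of 1 s.

```python
def next_timeout(previous_timeout, event, estimated_rtt=None, dev_rtt=None):
    """TimeoutInterval after a timer event, per the doubling modification."""
    if event == "timeout":
        return 2 * previous_timeout              # back off: double the interval
    if event == "ack":
        return estimated_rtt + 4 * dev_rtt       # recompute from the RTT estimates
    raise ValueError(f"unknown event: {event}")

t = 1.0
for _ in range(3):
    t = next_timeout(t, "timeout")
    print(t)          # 2.0, then 4.0, then 8.0
```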
TCP Flow Control
Flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.
- Receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space, through the RcvWindow field in the TCP segment.
- Sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow.
Receiver buffering:
- RcvBuffer = size of the TCP receive buffer
- RcvWindow = amount of spare room in the buffer
FLOW CONTROL: Receiver
Example: HOST A sends a large file to HOST B.
RECEIVER HOST B uses RcvWindow, LastByteRcvd and LastByteRead:
RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
[Figure: HOST B's receive buffer (RcvBuffer), filled with data from IP up to LastByteRcvd and read by the application process up to LastByteRead]
Initially RcvWindow = RcvBuffer, and the application reads from the buffer. HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.
FLOW CONTROL: Sender
Example: HOST A sends a large file to HOST B.
SENDER HOST A uses the RcvWindow of HOST B, LastByteSent and LastByteACKed.
To ensure that HOST B does not overflow, HOST A maintains throughout the connection's life that
[LastByteSent - LastByteACKed] <= RcvWindow
[Figure: HOST A's data, marked at LastByteACKed and LastByteSent; ACKs arrive from HOST B]
FLOW CONTROL: Some issues to consider
RcvWindow is used by the connection to provide the flow control service.
What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?
- TCP sends a segment only when there is data or an ACK to send. Therefore the sender must keep the connection 'alive': otherwise, once HOST B has emptied its buffer it would have nothing to send, and HOST A would never learn that space is available.
- TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space, and the ACKs will contain RcvWindow > 0.
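The receiver's RcvWindow formula, the sender's inequality and the one-byte probe for a zero window can be sketched together. This Python sketch uses hypothetical helper names and byte counts. Sending the probe only when nothing is in flight is an assumption made for this illustration.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver (HOST B): spare room advertised in every segment it sends."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def sendable_bytes(last_byte_sent, last_byte_acked, rcv_window_value):
    """Sender (HOST A): bytes that keep LastByteSent - LastByteACKed <= RcvWindow.

    With a zero window and nothing in flight, allow a 1-byte probe segment so that
    HOST B's ACK can eventually report RcvWindow > 0 (assumed trigger condition).
    """
    in_flight = last_byte_sent - last_byte_acked
    if rcv_window_value == 0 and in_flight == 0:
        return 1
    return max(rcv_window_value - in_flight, 0)

window = rcv_window(4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window, sendable_bytes(5000, 4000, window))   # 2096 1096
```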
TCP Connection Management
Recall: the TCP sender and receiver establish a "connection" before exchanging data segments, to initialize the TCP variables: sequence numbers, buffers, flow control info (e.g. RcvWindow).
- Client: the connection initiator.
  In Java: Socket clientSocket = new Socket(hostname, portNumber);   /* connects */
- Server: contacted by the client.
  In Java: accept() on a ServerSocket returns the connected Socket.
In C (Winsock):
  /* client: connect() returns 0 on success */
  if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
      printf("connect failed\n");
      WSACleanup();
      exit(1);
  }
  /* server: accept() returns a new socket for the accepted connection */
  ns = accept(s, (struct sockaddr *)&remoteaddr, &addrlen);
TCP Connection Management: establishing a connection
Three-way handshake:
Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself), specifying its initial seq number (isn): Connect (SYN=1, seq=client_isn).
Step 2: the server end system receives the SYN and replies with a SYNACK control segment: Accept (SYN=1, seq=server_isn, ack=client_isn+1). The server ACKs the received SYN, allocates buffers and specifies its own initial seq number.
Step 3: the client ACKs the connection with ACK=server_isn+1, allocates buffers and sends SYN=0: ACK (SYN=0, seq=client_isn+1, ack=server_isn+1). The connection is established.
This is what happens when we create a socket for connection to a server.
After establishing the connection, the client can receive segments with app-generated data (SYN=0).
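The sequence and acknowledgement numbers of the three handshake segments follow from the two ISNs alone. The Python sketch below is a hypothetical `three_way_handshake`. It draws random ISNs, as the sequence-number slide suggests. It wraps arithmetic modulo $2^{32}$ because the sequence number field is 32 bits wide.

```python
import random

SEQ_SPACE = 2 ** 32   # 32-bit sequence number field, so arithmetic wraps around

def three_way_handshake(client_isn, server_isn):
    """(name, SYN bit, seq, ack) of the SYN, SYNACK and ACK segments."""
    syn = ("SYN", 1, client_isn, None)                                  # Step 1: Connect
    synack = ("SYNACK", 1, server_isn, (client_isn + 1) % SEQ_SPACE)    # Step 2: Accept
    ack = ("ACK", 0, (client_isn + 1) % SEQ_SPACE,
           (server_isn + 1) % SEQ_SPACE)                                # Step 3
    return [syn, synack, ack]

client_isn, server_isn = random.randrange(SEQ_SPACE), random.randrange(SEQ_SPACE)
for segment in three_way_handshake(client_isn, server_isn):
    print(segment)
```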
TCP Connection Management (cont.)
Closing a connection: the client closes the socket, closesocket(s) (in Java: clientSocket.close()).
Step 1: the client end system sends a TCP FIN control segment to the server.
Step 2: the server receives the FIN and replies with an ACK; it then closes the connection and sends a FIN.
[Figure: the client sends FIN; the server replies with ACK, then sends its own FIN; the client replies with ACK, waits in timed wait, then is closed]
How a TCP connection is established and torn down
TCP Connection Management (cont.)
Step 3: the client receives the FIN and replies with an ACK. It enters a timed wait, during which it will respond with an ACK to any received FINs.
Step 4: the server receives the ACK. The connection is closed.
Note: with a small modification, this can handle simultaneous FINs.
[Figure: FIN/ACK exchange in which both sides pass through closing, the client through timed wait, and both end up closed]
TCP Connection Management (cont.)
[Figure: TCP client lifecycle and TCP server lifecycle (state diagrams)]
- Timed wait: used in case the ACK gets lost. Its duration is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).
- The connection then formally closes: all resources (e.g. port numbers) are released.
End of Flow Control and Error Control
Flow Control vs Congestion Control: similar actions are taken, but for very different reasons.
Flow control:
- point-to-point traffic between sender and receiver
- a speed matching service: matching the rate at which the sender is sending against the rate at which the receiving application is reading
- prevents the receiver buffer from overflowing
Congestion happens when there are too many sources attempting to send data at too high a rate for the routers along the path.
Congestion control:
- a service that makes sure that the routers between end systems are able to carry the offered traffic
- prevents routers from overflowing
Same course of action: throttling of the sender.
Principles of Congestion Control
Congestion, informally: too many sources sending too much data too fast for the network to handle.
- Different from flow control.
- Manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queuing in router buffers)
- A top-10 problem.
Approaches towards congestion control
Two broad approaches towards congestion control:
1. End-to-end congestion control:
   - no explicit feedback from the network
   - congestion inferred by end systems from observed packet loss & delay
   - approach taken by TCP
2. Network-assisted congestion control: routers provide feedback to end systems in the form of
   - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR), or
   - an explicit transmission rate the sender should send at
TCP Congestion Control: how the TCP sender limits the rate at which it sends traffic into its connection
New variable: the congestion window, CongWin.
At the SENDER: (amount of unACKed data) = LastByteSent - LastByteACKed <= min(CongWin, RcvWindow)
This indirectly limits the sender's send rate. Assumptions:
- The TCP receive buffer is very large, so there is no RcvWindow constraint: the amount of unACKed data at the sender is solely limited by CongWin.
- Packet loss delay & packet transmission delay are negligible.
Sending rate (approx.) = CongWin / RTT
By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
TCP Congestion Control: TCP uses ACKs to trigger ("clock") its increase in congestion window size, which makes it "self-clocking".
The arrival of ACKs is an indication to the sender that all is well:
1. ACKs arrive at a slow rate: the congestion window will be increased at a relatively slow rate.
2. ACKs arrive at a high rate: the congestion window will be increased more quickly.
TCP Congestion Control: how TCP perceives that there is congestion on the path
"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender. A loss event is either:
1. Timeout: no ACK is received after segment loss.
2. Receipt of three duplicate ACKs: the segment loss is followed by three duplicate ACKs received at the sender.
TCP Congestion Control details
- The sender limits transmission: LastByteSent - LastByteAcked <= cwnd
- Roughly, rate = cwnd / RTT bytes/sec
- cwnd is a dynamic function of perceived network congestion
How does the sender perceive congestion?
- loss event = timeout or 3 duplicate ACKs
- the TCP sender reduces its rate (cwnd) after a loss event
Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events
TCP congestion avoidance: additive increase, multiplicative decrease (AIMD)
Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
- Additive increase: increase cwnd by 1 MSS every RTT until loss is detected.
- Multiplicative decrease: cut cwnd in half after loss.
[Figure: congestion window size (8, 16, 24 Kbytes) over time, showing the saw-tooth behavior of probing for bandwidth]
TCP Slow Start
When the connection begins, increase the rate exponentially until the first loss event:
- initially cwnd = 1 MSS
- double cwnd every RTT
- done by incrementing cwnd by 1 MSS for every ACK received
Summary: the initial rate is slow, but it ramps up exponentially fast (doubling of the sending rate every RTT).
[Figure: Host A sends one segment to Host B, then two segments one RTT later, then four segments]
Refinement: inferring loss
- After 3 dup ACKs: cwnd is cut in half, and the window then grows linearly.
- But after a timeout event: cwnd is set to 1 MSS; the window then grows exponentially up to a threshold, then grows linearly.
Philosophy:
- 3 dup ACKs indicate that the network is capable of delivering some segments.
- A timeout indicates a "more alarming" congestion scenario.
Refinement
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.
Implementation: a variable ssthresh (slow-start threshold); on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event.
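The window evolution described by the refinement slides can be traced once per RTT. This is a simplified Python sketch with cwnd counted in MSS. The following are assumptions for illustration, not part of any TCP specification:
- the initial ssthresh of 8 MSS;
- capping the slow-start doubling at ssthresh;
- the event labels.

```python
def cwnd_trace(events, cwnd=1, ssthresh=8):
    """cwnd (in MSS) at the start of each RTT; events[i] says how round i ended."""
    trace = []
    for ev in events:
        trace.append(cwnd)
        if ev == "timeout":              # more alarming: back to 1 MSS, slow start again
            ssthresh = max(cwnd // 2, 1)
            cwnd = 1
        elif ev == "3dup":               # cut in half, then grow linearly
            ssthresh = max(cwnd // 2, 1)
            cwnd = ssthresh
        elif cwnd < ssthresh:            # slow start: double every RTT (capped at ssthresh)
            cwnd = min(2 * cwnd, ssthresh)
        else:                            # congestion avoidance: +1 MSS every RTT
            cwnd += 1
    return trace

print(cwnd_trace(["ok"] * 7 + ["3dup"] + ["ok"] * 2))
# [1, 2, 4, 8, 9, 10, 11, 12, 6, 7]
print(cwnd_trace(["ok"] * 7 + ["timeout"] + ["ok"] * 4))
# [1, 2, 4, 8, 9, 10, 11, 12, 1, 2, 4, 6]
```

The two traces show the "philosophy" above. After 3 dup ACKs the window resumes linear growth from half its size. After a timeout it restarts from 1 MSS and grows exponentially up to the new threshold.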
TCP Sender Congestion Control (state, event: TCP sender congestion-control action; commentary)
1. Slow Start (SS), ACK receipt for previously unACKed data: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance". This results in a doubling of CongWin every RTT.
2. Congestion Avoidance (CA), ACK receipt for previously unACKed data: CongWin = CongWin + MSS * (MSS/CongWin). Additive increase, resulting in an increase of CongWin by 1 MSS every RTT.
3. SS or CA, loss event detected by triple duplicate ACK: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance". Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS.
4. SS or CA, timeout: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start". Enter slow start.
5. SS or CA, duplicate ACK: increment the duplicate ACK count for the segment being ACKed. CongWin and Threshold are not changed.
Summary: TCP Congestion Control
Initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; the sender starts in slow start.
Slow start:
- new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- cwnd > ssthresh: move to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment (stay in slow start)
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
Congestion avoidance:
- new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
Fast recovery:
- duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
- new ACK: cwnd = ssthresh; dupACKcount = 0; move to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start
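The summary FSM translates into three event handlers. This Python sketch covers that FSM only. The class name is hypothetical, cwnd and ssthresh are in bytes, and MSS = 1000 bytes is assumed. It tracks the window but does not actually send or retransmit segments.

```python
class RenoCongestionControl:
    """Per-event sketch of slow start, congestion avoidance and fast recovery."""

    def __init__(self, mss=1000, ssthresh=64 * 1024):
        self.mss = mss
        self.cwnd = float(mss)            # cwnd = 1 MSS
        self.ssthresh = float(ssthresh)   # ssthresh = 64 KB
        self.dup_acks = 0
        self.state = "slow start"

    def on_new_ack(self):
        if self.state == "fast recovery":
            self.cwnd = self.ssthresh                      # deflate the window
            self.state = "congestion avoidance"
        elif self.state == "slow start":
            self.cwnd += self.mss
            if self.cwnd > self.ssthresh:
                self.state = "congestion avoidance"
        else:
            self.cwnd += self.mss * self.mss / self.cwnd   # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast recovery":
            self.cwnd += self.mss                          # inflate per extra dup ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:                             # (retransmit missing segment)
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast recovery"

    def on_timeout(self):                                  # (retransmit missing segment)
        self.ssthresh = self.cwnd / 2
        self.cwnd = float(self.mss)
        self.dup_acks = 0
        self.state = "slow start"

cc = RenoCongestionControl()
for _ in range(3):
    cc.on_new_ack()
print(cc.cwnd, cc.state)   # 4000.0 slow start
```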
Congestion control: TCP's Congestion Control Service
[Figure: CLIENT and SERVER communicating through congested routers]
Problem: gridlock sets in when there is packet loss due to router congestion.
TCP's congestion control service forces the end systems to decrease the rate at which packets are sent during periods of congestion.
The sending system's packet is lost due to congestion, and the sender is alerted when it stops receiving ACKs of packets sent.
Macroscopic Description of TCP Throughput
What's the average throughput of TCP as a function of window size and RTT? (Ignore slow start, which typically consists of very short phases.)
- Let W be the window size when loss occurs.
- When the window is W, throughput is W/RTT.
- Just after loss, the window drops to W/2, and throughput to W/(2·RTT).
- Throughput then increases linearly (by MSS/RTT every RTT).
Average throughput: 0.75 W/RTT, the mean of W/(2·RTT) and W/RTT for a linear increase.
(Based on an idealised model for the steady-state dynamics of TCP.)
TCP Futures: TCP over "long, fat pipes"
- Example: a GRID computing application with 1500-byte segments, 100 ms RTT, and a desired throughput of 10 Gbps requires a window size of W = 83,333 in-flight segments.
- Throughput in terms of the loss rate L:
$$\text{Throughput} = \frac{1.22 \cdot MSS}{RTT\,\sqrt{L}}$$
- For 10 Gbps this requires $L = 2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments).
- New versions of TCP are needed for high-speed environments.
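The numbers on the two throughput slides can be reproduced from their formulas. This is a Python sketch with hypothetical helper names. Window and MSS are in bytes, and throughput is in bits/s.

```python
import math

def average_throughput(w, rtt):
    """Idealised AIMD average: halfway between w/(2*rtt) and w/rtt."""
    return 0.75 * w / rtt

def window_in_segments(throughput_bps, rtt, mss_bytes):
    """In-flight segments needed to sustain throughput_bps over one RTT."""
    return throughput_bps * rtt / (mss_bytes * 8)

def throughput_from_loss(mss_bytes, rtt, loss):
    """Throughput (bits/s) = 1.22 * MSS / (RTT * sqrt(L))."""
    return 1.22 * mss_bytes * 8 / (rtt * math.sqrt(loss))

def loss_for_throughput(throughput_bps, rtt, mss_bytes):
    """The same formula solved for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt * throughput_bps)) ** 2

print(round(window_in_segments(10e9, 0.1, 1500)))       # 83333
print(f"{loss_for_throughput(10e9, 0.1, 1500):.2e}")   # 2.14e-10
```

The computed loss rate, about 2.14·10^-10, is the value the slide rounds to 2·10^-10.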
TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link of rate R, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Analysis of 2 connections sharing a link
Assumptions:
- A link with transmission rate R
- Each connection has the same MSS and RTT
- No other TCP connections or UDP datagrams traverse the shared link
- Ignore the slow start phase of TCP
- Both operate in congestion-avoidance mode (linear increase phase)
Goal: adjust the sending rate of the two connections to allow for equal bandwidth sharing.
Why is TCP fair?
Two competing sessions:
- Additive increase gives a slope of 1 as throughput increases.
- Multiplicative decrease decreases throughput proportionally.
[Figure: connection 1 throughput (x-axis, up to R) against connection 2 throughput (y-axis, up to R), with the equal bandwidth share line and the full bandwidth utilisation line; the operating point alternates between congestion avoidance (additive increase) and loss (decrease window by a factor of 2)]
A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.
Additive increase moves the point parallel to the equal-share line, leaving the gap between the two throughputs unchanged. Each loss halves both throughputs and therefore halves the gap. Assuming both connections see each loss, the point converges toward the equal bandwidth share.
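The convergence argument can be checked with a small simulation of two congestion-avoidance flows. This Python sketch uses the slide's assumptions plus one of our own: both flows see every loss at the same time (synchronised losses). Throughputs are in arbitrary units with R = 100.

```python
def aimd_two_flows(x1, x2, R, rounds):
    """Two AIMD flows on a link of rate R with the same MSS and RTT.

    Each round both add 1 unit (additive increase); when their sum exceeds R,
    both see the loss and halve (multiplicative decrease).
    """
    history = [(x1, x2)]
    for _ in range(rounds):
        x1, x2 = x1 + 1, x2 + 1
        if x1 + x2 > R:
            x1, x2 = x1 / 2, x2 / 2
        history.append((x1, x2))
    return history

hist = aimd_two_flows(10, 70, R=100, rounds=400)
print(hist[0], hist[-1])   # the gap between the flows shrinks from 60 toward 0
```

Each halving cuts the gap in half while additive increase leaves it unchanged. After the roughly sixteen losses in 400 rounds, the two flows are within a fraction of a unit of each other.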
The End
The following slides are for additional reading only.
Transport Layer ndash TCP
55
BB
TCP Latency ModelingTCP Latency Modeling
In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs
Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth
Multiple End Systems sharing a link
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
Loop holes in TCPLoop holes in TCP
1 TCP connection
1 TCP connection
1 TCP connection
3 TCP connections
Multithreading implementationMultithreading implementation
Transport Layer ndash TCP
56
BB
TCP latency modelingTCP latency modeling
TCP connection establishment time data transfer delay Actual data transmission time
Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent
QQ How long How long does it take to does it take to
receive an object receive an object from a Web from a Web
serverserver
No data transfer delayNo data transfer delay
Therersquos data transfer delayTherersquos data transfer delay
the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety
TCP Latency Modeling
Assumptions:
- The network is uncongested, with one link of rate R bps (the link's transmission rate) between the end systems.
- The congestion window CongWin (fixed) determines the amount of data that can be sent.
- No packet loss, no packet corruption, no retransmissions required.
- Header overheads are negligible.
- The file to send is an integer number of segments of size MSS.
- Connection-establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times.
- The initial threshold of the TCP congestion-control mechanism is very large.
Notation: O = size of the object in bits; S = number of bits in an MSS (maximum segment size).
(Figure: CLIENT and SERVER joined by a single link; the server sends the FILE.)
TCP Latency Modeling
Case Analysis: STATIC Congestion Window
Case 1: $WS/R > RTT + S/R$. An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.
$$\text{latency} = 2\,RTT + \frac{O}{R}$$
Let K be the number of windows of data that cover the object. With $O/S$ segments in the object, $K = \lceil O/(WS) \rceil$, rounded up to the nearest integer.
(Figure assumes W = 4 segments, e.g. O = 256 bits, S = 32 bits, W = 4.)
TCP Latency Modeling
Case Analysis: STATIC Congestion Window
Case 2: $WS/R < RTT + S/R$. The sender has to wait for an ACK after a window's worth of data is sent (stalled period).
$$\text{latency} = 2\,RTT + \frac{O}{R} + (K-1)\left[\frac{S}{R} + RTT - \frac{WS}{R}\right]$$
where $K = \lceil O/(WS) \rceil$ is the number of windows of data that cover the object. If there are K windows, the sender will be stalled (K - 1) times.
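The two static-window cases fit in a single function, because Case 1 is the situation in which the per-window stall time $S/R + RTT - WS/R$ is not positive. The sketch below is a minimal illustration under the slide's assumptions, with all quantities in bits, bits per second and seconds. The class name and the link parameters in `main` (R = 1000 bps, RTT = 0.1 s) are hypothetical.

```java
/** Latency model for a static congestion window of W segments (hypothetical helper). */
final class StaticWindowLatency {

    /** K = number of windows that cover the object, rounded up. */
    static long windows(double o, double s, int w) {
        return (long) Math.ceil(o / (w * s));
    }

    /**
     * o = object size (bits), s = MSS (bits), r = link rate (bps),
     * rtt = round trip time (s), w = window size (segments).
     */
    static double latency(double o, double s, double r, double rtt, int w) {
        double base = 2 * rtt + o / r;          // 2 RTT (setup + request) plus O/R transmission time
        double stall = s / r + rtt - w * s / r; // idle time after each full window
        if (stall <= 0) {
            return base;                        // Case 1: WS/R >= RTT + S/R, never stalls
        }
        return base + (windows(o, s, w) - 1) * stall; // Case 2: stalls K - 1 times
    }

    public static void main(String[] args) {
        // Slide example: O = 256 bits, S = 32 bits, W = 4 gives K = 2 windows.
        System.out.println(windows(256, 32, 4));
        // Assumed link: R = 1000 bps, RTT = 0.1 s (Case 2, since WS/R = 0.128 < 0.132).
        System.out.println(latency(256, 32, 1000, 0.1, 4));
    }
}
```

Reducing the RTT to 0.05 s in the same call switches the model into Case 1, and the result drops to $2\,RTT + O/R = 0.356$ s.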
Case Analysis: DYNAMIC Congestion Window
(Figure: with slow start, an object of O/S = 15 segments is sent in 4 windows, with stalled periods between windows.)
Case Analysis: DYNAMIC Congestion Window
- Let K be the number of windows that cover the object.
- We can express K in terms of the number of segments in the object as follows:
$$K = \min\left\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge \frac{O}{S}\right\} = \min\left\{k : 2^k - 1 \ge \frac{O}{S}\right\} = \min\left\{k : k \ge \log_2\left(\frac{O}{S} + 1\right)\right\} = \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil$$
Note: $2^0 + 2^1 + \cdots + 2^{k-1} = 2^k - 1$.
Case Analysis: DYNAMIC Congestion Window
Consider the interval from the time the server begins to transmit the kth window until the time the server receives an ACK for the first segment in the window.
- Transmission time of the kth window $= 2^{k-1}\dfrac{S}{R}$
- Stall time $= \left[\dfrac{S}{R} + RTT - 2^{k-1}\dfrac{S}{R}\right]^{+}$, where $[x]^{+} = \max(x, 0)$
- Latency $= 2\,RTT + \dfrac{O}{R} + \displaystyle\sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^{+}$
Case Analysis: DYNAMIC Congestion Window
- Let Q be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$
- The actual number of times that the server stalls is $P = \min\{Q, K - 1\}$.
- Closed-form expression for the latency:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - \left(2^{P} - 1\right)\frac{S}{R}$$
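Comparing with the minimum latency $2\,RTT + O/R$:
$$\frac{\text{Latency}}{\text{Minimum latency}} \le 1 + \frac{P}{(O/R)/RTT + 2}$$
Slow start will not significantly increase latency if $RTT \ll O/R$.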
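To check the derivation, the sketch below computes K, Q and P and then evaluates both the closed-form latency and the original sum of positive stall times. Under the stated assumptions the two results should agree. It is a minimal illustration that assumes O > 0 and consistent units (bits, bps, seconds). `SlowStartLatency` and the sample parameters (S = 1000 bits, R = 10 kbps, RTT = 0.1 s, and O/S = 15 as in the 4-window figure) are hypothetical.

```java
/** Latency under slow start with P = min(Q, K - 1) stalls (hypothetical helper). */
final class SlowStartLatency {

    /** K = smallest k with 2^0 + ... + 2^(k-1) >= O/S, i.e. ceil(log2(O/S + 1)). */
    static int windows(long segments) {
        int k = 0;
        long covered = 0;                  // segments carried by windows 1..k
        while (covered < segments) {
            covered += 1L << k;            // window k+1 carries 2^k segments
            k++;
        }
        return k;
    }

    /** Q = floor(log2(1 + RTT / (S/R))) + 1. */
    static int infiniteStalls(double s, double r, double rtt) {
        double ratio = 1 + rtt / (s / r);
        int floorLog2 = 0;
        while ((double) (1L << (floorLog2 + 1)) <= ratio) {
            floorLog2++;
        }
        return floorLog2 + 1;
    }

    /** Closed form: 2 RTT + O/R + P (RTT + S/R) - (2^P - 1) S/R. */
    static double latency(double o, double s, double r, double rtt) {
        int k = windows((long) Math.ceil(o / s));
        int p = Math.min(infiniteStalls(s, r, rtt), k - 1);
        return 2 * rtt + o / r + p * (rtt + s / r) - ((1L << p) - 1) * s / r;
    }

    /** Same latency from the sum of positive stall times, as a cross-check. */
    static double latencyBySum(double o, double s, double r, double rtt) {
        int k = windows((long) Math.ceil(o / s));
        double total = 2 * rtt + o / r;
        for (int i = 1; i <= k - 1; i++) {
            double stall = s / r + rtt - (1L << (i - 1)) * s / r;
            total += Math.max(stall, 0);   // [x]^+
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(windows(15));   // 4, as in the 4-window figure
        System.out.println(latency(15_000, 1_000, 10_000, 0.1));
        System.out.println(latencyBySum(15_000, 1_000, 10_000, 0.1));
    }
}
```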
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
Internet Checksum Example
Note: when adding numbers, a carry-out from the most significant bit needs to be added to the result (wraparound).
Example: add two 16-bit integers.
    data word 1:       1110 0110 0110 0110
    data word 2:       1101 0101 0101 0101
    raw sum:         1 1011 1011 1011 1011   (17 bits: carry-out from the most significant bit)
    wraparound sum:    1011 1011 1011 1100   (the carry is added back in)
    checksum:          0100 0100 0100 0011   (ones' complement of the sum)
To check: at the receiver, adding the data words and the checksum gives 1111 1111 1111 1111.
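The wraparound rule takes only a few lines of code. The sketch below is a minimal illustration, not a production checksum routine. It assumes the data has already been split into 16-bit words, and `InternetChecksum` is a hypothetical name. It reproduces the two-word example above.

```java
/** 16-bit Internet checksum: ones'-complement sum with end-around carry (hypothetical helper). */
final class InternetChecksum {

    static int onesComplementSum(int... words) {
        int sum = 0;
        for (int w : words) {
            sum += w & 0xFFFF;
            if ((sum & 0x10000) != 0) {          // carry-out from the most significant bit
                sum = (sum & 0xFFFF) + 1;        // wraparound: add it back in
            }
        }
        return sum;
    }

    /** Checksum = ones' complement (bitwise inversion) of the sum. */
    static int checksum(int... words) {
        return ~onesComplementSum(words) & 0xFFFF;
    }

    /** Receiver check: data words plus checksum must sum to all ones. */
    static boolean verify(int checksum, int... words) {
        int[] all = java.util.Arrays.copyOf(words, words.length + 1);
        all[words.length] = checksum;
        return onesComplementSum(all) == 0xFFFF;
    }

    public static void main(String[] args) {
        int a = 0b1110_0110_0110_0110;
        int b = 0b1101_0101_0101_0101;
        System.out.println(Integer.toBinaryString(onesComplementSum(a, b))); // 1011101110111100
        System.out.println(Integer.toBinaryString(checksum(a, b)));          // 100010001000011
        System.out.println(verify(checksum(a, b), a, b));                    // true
    }
}
```

`Integer.toBinaryString` drops leading zeros, so the printed checksum omits the first 0 of `0100 0100 0100 0011`.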
Connection-Oriented Transport: TCP
- TCP segment structure
- SEQ and ACK numbers
- Calculating the timeout interval
- The simplified TCP sender
- ACK generation recommendation (RFC 1122, RFC 2581)
- Interesting transmission scenarios
- Flow control
- TCP connection management
TCP Segment Structure
(Figure: TCP segment, 32 bits wide. Header rows: source port # | dest port #; sequence number; acknowledgement number; head len | not used | U A P R S F | rcvr window size; checksum | URG data pointer; Options (variable length). The header is followed by application data (variable length).)
- URG: urgent data (generally not used)
- ACK: ACK number valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection establishment (setup and tear-down commands)
- Receiver window size: number of bytes the receiver is willing to accept
- Sequence and acknowledgement numbers: counting by bytes of data (not segments)
- Checksum: Internet checksum (as in UDP)
In practice, PSH, URG and the Urgent Data Pointer are not used.
We can view these teeny-weeny details using Ethereal.
Example
Suppose that a process in Host A wants to send a stream of data to a process in Host B over a TCP connection. Assume:
- the data stream is a file consisting of 500,000 bytes;
- MSS = 1,000 bytes;
- the first byte of the data stream is numbered 0.
TCP constructs 500 segments out of the data stream: 500,000 bytes / 1,000 bytes = 500 segments.
TCP Sequence Numbers and ACKs
- Sequence numbers: the byte-stream number of the first byte in the segment's data. They do not necessarily start from 0; a random initial number R is used, so segment 1 is numbered 0 + R, segment 2 is numbered 1000 + R, and so on.
- ACKs (acknowledgments): the sequence number of the next byte expected from the other side (last byte + 1). ACKs are cumulative: if the receiver has received segment 1 and is waiting for segment 2, it sends ACK = 1000 + R (received up to byte 999).
(Figure: bytes 0, 1, 2, ..., 999 form segment 1; bytes 1000, 1001, ..., 1999 form segment 2.)
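These numbering rules are plain arithmetic on the byte stream. The sketch below is illustrative only. `SegmentNumbering` is a hypothetical name, and the sketch assumes equal-sized segments of one MSS and in-order arrival. It reduces sequence numbers modulo $2^{32}$ because the TCP sequence number field is 32 bits wide.

```java
/** Sequence numbers and cumulative ACKs for the 500,000-byte example (hypothetical helper). */
final class SegmentNumbering {

    private static final long MOD_MASK = 0xFFFF_FFFFL; // 32-bit sequence number space

    /** Number of MSS-sized segments needed for the file, rounded up. */
    static long segmentCount(long fileBytes, long mss) {
        return (fileBytes + mss - 1) / mss;
    }

    /** Sequence number of the first byte in segment i (1-based), given initial number isn. */
    static long seqOfSegment(int i, long mss, long isn) {
        return (isn + (i - 1) * mss) & MOD_MASK;
    }

    /** Cumulative ACK after segments 1..n arrived in order: the next byte expected. */
    static long ackAfter(int n, long mss, long isn) {
        return (isn + n * mss) & MOD_MASK;
    }

    public static void main(String[] args) {
        long mss = 1_000, isn = 0;                      // the slide numbers the first byte 0
        System.out.println(segmentCount(500_000, mss)); // 500
        System.out.println(seqOfSegment(2, mss, isn));  // 1000
        System.out.println(ackAfter(1, mss, isn));      // 1000: bytes 0..999 received
    }
}
```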
TCP Sequence Numbers and ACKs (cont.)
Q: How does the receiver handle out-of-order segments?
A: The TCP specification does not say; this is decided when implementing.
Simple telnet scenario (with echo on). Host A is the client and Host B the server; assume the starting sequence numbers for Host A and Host B are 42 and 79 respectively.
1. The user types 'C'. Host A sends Seq=42, ACK=79, data='C' ("I'm sending data starting at seq num 42").
2. Host B ACKs receipt of 'C' and echoes back 'C': Seq=79, ACK=43, data='C' ("Send me the bytes from 43 onward"). The ACK is piggy-backed on server-to-client data.
3. Host A ACKs receipt of the echoed 'C': Seq=43, ACK=80.
Yet Another Server Echo Example
1. The user types 'Hello'. Host A sends Seq=42, ACK=79, data='Hello'.
2. Host B ACKs receipt of 'Hello' and echoes it back: Seq=79, ACK=47, data='Hello'.
3. Host A ACKs receipt of the echoed 'Hello' and sends something else: Seq=47, ACK=84, data='200'.
4. Host B replies: Seq=84, ACK=50, data='200'.
Host A's variables go from seq=42, ack=79 to seq=47, ack=84; Host B's go from seq=79, ack=47 to seq=84, ack=50.
The ACK tells up to which byte has been received and which byte the host is expecting to receive next.
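Each ACK equals the peer's sequence number plus the number of bytes received, so the exchange can be replayed mechanically. The sketch below is a minimal model that assumes one byte per character and in-order delivery. `EchoExchange` and `Endpoint` are hypothetical names. Replaying the Hello exchange reproduces the four segments listed above.

```java
/** Replays the "Hello" echo exchange: ACK = peer's seq + bytes received (hypothetical model). */
final class EchoExchange {

    /** One endpoint's view: next byte to send (seq) and next byte expected (ack). */
    static final class Endpoint {
        long seq;
        long ack;

        Endpoint(long isn, long peerIsn) {
            seq = isn;
            ack = peerIsn;
        }

        /** Describes the outgoing segment, then advances seq by the payload length. */
        String send(String data) {
            String segment = "Seq=" + seq + " ACK=" + ack + " data='" + data + "'";
            seq += data.length();
            return segment;
        }

        /** In-order data moves the cumulative ACK forward. */
        void receive(String data) {
            ack += data.length();
        }
    }

    public static void main(String[] args) {
        Endpoint a = new Endpoint(42, 79);
        Endpoint b = new Endpoint(79, 42);
        System.out.println(a.send("Hello")); // Seq=42 ACK=79 data='Hello'
        b.receive("Hello");
        System.out.println(b.send("Hello")); // Seq=79 ACK=47 data='Hello'
        a.receive("Hello");
        System.out.println(a.send("200"));   // Seq=47 ACK=84 data='200'
        b.receive("200");
        System.out.println(b.send("200"));   // Seq=84 ACK=50 data='200'
    }
}
```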
TCP Round Trip Time and Timeout
Main issue: how long is the sender willing to wait before retransmitting the packet? (RTT = round trip time.)
Q: How to set the TCP timeout value?
- Longer than RTT, but RTT varies.
- Too short: premature timeout and unnecessary retransmissions.
- Too long: slow reaction to segment loss.
Q: How to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt. Retransmissions and cumulatively ACKed segments are ignored.
- SampleRTT will vary, and we want the estimated RTT to be smoother, so we use several recent measurements, not just the current SampleRTT.
TCP Round Trip Time and Timeout (cont.)
$$EstimatedRTT = (1 - x)\cdot EstimatedRTT + x \cdot SampleRTT$$
- Exponentially weighted moving average: the influence of a given sample decreases exponentially fast.
- Typical value of x: 0.125 (RFC 2988).
Setting the timeout: EstimatedRTT plus a safety margin. A large variation in EstimatedRTT calls for a larger safety margin.
$$Deviation = (1 - x)\cdot Deviation + x \cdot |SampleRTT - EstimatedRTT|, \quad \text{recommended } x = 0.25$$
$$Timeout = EstimatedRTT + 4 \cdot Deviation$$
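A minimal estimator following these formulas is sketched below. It is an illustration, not a full RFC-compliant timer. Two details follow RFC 6298 (the successor of RFC 2988): the deviation starts at half the first sample, and the deviation is updated before the estimate. The RFC's clock granularity and 1-second minimum timeout are omitted. `RttEstimator` is a hypothetical name, and the samples in `main` are the ones used in the sample calculations.

```java
/** EWMA RTT estimator and timeout, using the weights from the slide (hypothetical helper). */
final class RttEstimator {
    static final double ALPHA = 0.125; // weight x of a new SampleRTT in EstimatedRTT
    static final double BETA = 0.25;   // weight x of a new deviation sample

    private double estimatedRtt;
    private double deviation;
    private boolean first = true;

    void addSample(double sampleRtt) {
        if (first) {
            estimatedRtt = sampleRtt;  // first sample initialises the estimate
            deviation = sampleRtt / 2; // RFC 6298 initialisation (assumption, not on the slide)
            first = false;
            return;
        }
        // Deviation uses the previous estimate, so update it first (RFC 6298 ordering).
        deviation = (1 - BETA) * deviation + BETA * Math.abs(sampleRtt - estimatedRtt);
        estimatedRtt = (1 - ALPHA) * estimatedRtt + ALPHA * sampleRtt;
    }

    double estimatedRtt() { return estimatedRtt; }

    double timeout() { return estimatedRtt + 4 * deviation; }

    public static void main(String[] args) {
        RttEstimator est = new RttEstimator();
        double[] samples = {0.02746, 0.035557, 0.070059, 0.11443, 0.13989, 0.18964};
        for (double s : samples) {
            est.addSample(s);
            System.out.printf("EstimatedRTT=%.4f Timeout=%.4f%n", est.estimatedRtt(), est.timeout());
        }
    }
}
```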
Sample Calculations
$$EstimatedRTT = 0.875 \cdot EstimatedRTT + 0.125 \cdot SampleRTT$$
- After receipt of the ACK of segment 1: EstimatedRTT = RTT for segment 1 = 0.02746 s
- After receipt of the ACK of segment 2: EstimatedRTT = 0.875 × 0.02746 + 0.125 × 0.035557 = 0.0285
- After receipt of the ACK of segment 3: EstimatedRTT = 0.875 × 0.0285 + 0.125 × 0.070059 = 0.0337
- After receipt of the ACK of segment 4: EstimatedRTT = 0.875 × 0.0337 + 0.125 × 0.11443 = 0.0438
- After receipt of the ACK of segment 5: EstimatedRTT = 0.875 × 0.0438 + 0.125 × 0.13989 = 0.0558
- After receipt of the ACK of segment 6: EstimatedRTT = 0.875 × 0.0558 + 0.125 × 0.18964 = 0.0725
RTT Samples and RTT Estimates
(Figure: RTT (msec), 100 to 300, plotted against time, showing Sample RTT and Estimated RTT.)
The variations in the SampleRTT are smoothed out in the computation of the EstimatedRTT.
An Actual RTT Estimation
(Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds), 100 to 350, plotted against time (seconds), showing SampleRTT and Estimated RTT.)
FSM of TCP for Reliable Data Transfer
Simplified TCP sender, assuming one-way data transfer and no flow or congestion control.
(Figure: a single "wait for event" state with three transitions:
- event: data received from application above → create and send segment
- event: timer timeout for segment with seq number y → retransmit segment
- event: ACK received, with ACK number y → process ACK)
SIMPLIFIED TCP SENDER
Assumptions:
- the sender is not constrained by TCP flow or congestion control;
- data from above is less than MSS in size;
- data transfer is in one direction only.

    sendbase = initial_sequence_number
    nextseqnum = initial_sequence_number
    loop (forever) {
      switch (event)
        event: data received from application above
          create TCP segment with sequence number nextseqnum
          if (timer is currently not running) start timer for segment nextseqnum
          pass segment to IP
          nextseqnum = nextseqnum + length(data)
        event: timer timeout
          retransmit not-yet-ACKed segment with smallest sequence number
          start timer
        event: ACK received, with ACK field value of y
          if (y > sendbase) {   /* cumulative ACK of all data up to y */
            sendbase = y
            if (there are currently any not-yet-ACKed segments) start timer
          }
    } /* end of loop forever */

The single timer is associated with the oldest unACKed segment.
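The three events of the simplified sender map onto three handlers. The sketch below is a minimal model that keeps the slide's simplifications and also assumes ACKs fall on segment boundaries. `SimplifiedSender` is a hypothetical name, and its "timer" is only a flag, so no real clock is involved. The `passedToIp` list records every segment handed to IP, including retransmissions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** The simplified sender's three event handlers: one timer, cumulative ACKs (hypothetical model). */
final class SimplifiedSender {
    private long sendBase;
    private long nextSeqNum;
    private boolean timerRunning;
    private final TreeMap<Long, Integer> unacked = new TreeMap<>(); // seq -> length of unACKed segment
    final List<Long> passedToIp = new ArrayList<>();               // log of segments handed to IP

    SimplifiedSender(long initialSequenceNumber) {
        sendBase = initialSequenceNumber;
        nextSeqNum = initialSequenceNumber;
    }

    /** Event: data received from application above (assumed smaller than one MSS). */
    void onData(int length) {
        unacked.put(nextSeqNum, length);
        timerRunning = true;            // start the timer if it is not already running
        passedToIp.add(nextSeqNum);
        nextSeqNum += length;
    }

    /** Event: timer timeout, so retransmit the not-yet-ACKed segment with the smallest seq. */
    void onTimeout() {
        if (unacked.isEmpty()) return;
        passedToIp.add(unacked.firstKey());
        timerRunning = true;            // restart the timer
    }

    /** Event: ACK received with ACK field value y. */
    void onAck(long y) {
        if (y > sendBase) {             // cumulative ACK of all data up to y
            sendBase = y;
            unacked.headMap(y).clear(); // every segment starting below y is now ACKed
            timerRunning = !unacked.isEmpty();
        }
    }

    long sendBase() { return sendBase; }

    boolean timerRunning() { return timerRunning; }
}
```

For instance, sending an 8-byte segment at Seq=92 and a 20-byte segment at Seq=100 and then firing a timeout leaves `passedToIp` as [92, 100, 92]. This matches the premature-timeout scenario: only the oldest unACKed segment is retransmitted.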
TCP SENDER with MODIFICATIONS (with fast retransmit)
Why wait for the timeout to expire when consecutive duplicate ACKs can be used to indicate a lost segment?

    sendbase = initial_sequence_number
    nextseqnum = initial_sequence_number
    loop (forever) {
      switch (event)
        event: data received from application above
          create TCP segment with sequence number nextseqnum
          start timer for segment nextseqnum
          pass segment to IP
          nextseqnum = nextseqnum + length(data)
        event: timer timeout for segment with sequence number y
          retransmit segment with sequence number y
          compute new timeout interval for segment y
          restart timer for sequence number y
        event: ACK received, with ACK field value of y
          if (y > sendbase) {   /* cumulative ACK of all data up to y */
            cancel all timers for segments with sequence numbers < y
            sendbase = y
          }
          else {                /* a duplicate ACK for an already ACKed segment */
            increment number of duplicate ACKs received for y
            if (number of duplicate ACKs received for y == 3) {
              /* perform TCP fast retransmission */
              resend segment with sequence number y
              restart timer for segment y
            }
          }
    } /* end of loop forever */
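The duplicate-ACK branch is the only new logic in fast retransmit. The sketch below isolates it. It is a minimal model in which `FastRetransmit` (a hypothetical name) tracks only `sendbase` and the duplicate count, and it records which sequence numbers would be resent.

```java
import java.util.ArrayList;
import java.util.List;

/** Duplicate-ACK counting that triggers fast retransmit on the third duplicate (hypothetical model). */
final class FastRetransmit {
    private long sendBase;
    private int duplicateAcks;
    final List<Long> retransmitted = new ArrayList<>();

    FastRetransmit(long sendBase) {
        this.sendBase = sendBase;
    }

    void onAck(long y) {
        if (y > sendBase) {               // new cumulative ACK: reset the duplicate count
            sendBase = y;
            duplicateAcks = 0;
        } else {                          // duplicate ACK for already ACKed data
            duplicateAcks++;
            if (duplicateAcks == 3) {     // third duplicate: resend before the timer expires
                retransmitted.add(y);
            }
        }
    }
}
```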
TCP ACK Generation [RFC 1122, RFC 2581]
1. Event: in-order segment arrival, no gaps, everything else already ACKed.
   TCP receiver action: delay sending the ACK. Wait up to 500 ms for the next segment; if the next segment does not arrive in this interval, send the ACK.
2. Event: in-order segment arrival, no gaps, one delayed ACK pending (due to action 1).
   TCP receiver action: immediately send a single cumulative ACK.
3. Event: out-of-order segment arrival with a higher-than-expected sequence number; a gap is detected.
   TCP receiver action: send a duplicate ACK, indicating the sequence number of the next expected byte.
4. Event: arrival of a segment that partially or completely fills the gap.
   TCP receiver action: immediately send an ACK if the segment starts at the lower end of the gap.
The receiver does not discard out-of-order segments.
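The cumulative-ACK behaviour of rules 2 to 4 is sketched below. It is a minimal model that assumes non-overlapping segments. The 500 ms delayed-ACK timer of rule 1 is not modelled, so the method only returns the ACK number that would be sent. `AckGenerator` is a hypothetical name.

```java
import java.util.TreeMap;

/** Receiver-side cumulative ACK numbers with buffering of out-of-order segments (hypothetical model). */
final class AckGenerator {
    private long expected;                                              // next in-order byte
    private final TreeMap<Long, Integer> outOfOrder = new TreeMap<>(); // seq -> length, kept, not discarded

    AckGenerator(long initialSequenceNumber) {
        expected = initialSequenceNumber;
    }

    /** Returns the ACK number to send for an arriving segment. */
    long onSegment(long seq, int length) {
        if (seq == expected) {
            expected += length;                      // in order, or fills the lower end of a gap
            Integer next;
            while ((next = outOfOrder.remove(expected)) != null) {
                expected += next;                    // buffered segments are now contiguous
            }
        } else if (seq > expected) {
            outOfOrder.put(seq, length);             // gap detected: buffer the segment
        }
        return expected;                             // unchanged value = duplicate ACK
    }
}
```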
TCP Interesting Scenarios (simplified TCP version)
Lost ACK scenario: Host A sends Seq=92 with 8 bytes of data. Host B replies ACK=100, but the ACK is lost (X). A's timer for Seq=92 expires, so A retransmits Seq=92 with 8 bytes of data, and B replies ACK=100. This is a retransmission due to a lost ACK.
Premature timeout with cumulative ACKs: Host A sends Seq=92 (8 bytes of data) and Seq=100 (20 bytes of data). B replies ACK=100 and ACK=120, but A's timer for Seq=92 expires before these ACKs arrive. A retransmits Seq=92 (8 bytes of data) and restarts the timer for Seq=92; B replies ACK=120. The segment with Seq=100 is not retransmitted.
TCP Retransmission Scenario
Host A sends Seq=92 (8 bytes of data) and Seq=100 (20 bytes of data). Host B's ACK=100 is lost (X), but ACK=120 reaches A before the Seq=92 timeout expires. The cumulative ACK avoids retransmission of the first segment.
TCP Modifications: Doubling the Timeout Interval
- Provides a limited form of congestion control. Timer expiration is more likely caused by congestion in the network, and congestion may get worse if sources continue to retransmit packets persistently.
- On each timeout: $TimeoutInterval = 2 \times TimeoutInterval_{previous}$
- TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals.
- After an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT.
Others: check RFC 2018 (selective ACK).
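The timer rule reduces to two operations: double the interval on each expiry, and recompute it from the RTT estimate when an ACK arrives. The sketch below is a minimal illustration. `TimeoutBackoff` is a hypothetical name, and it does not model the upper bound that real implementations place on the interval.

```java
/** Timeout interval with doubling on expiry and recomputation on ACK (hypothetical helper). */
final class TimeoutBackoff {
    private double timeoutInterval;

    TimeoutBackoff(double initialSeconds) {
        timeoutInterval = initialSeconds;
    }

    /** Timer expired: the segment is retransmitted and the interval doubles. */
    double onTimeout() {
        timeoutInterval = 2 * timeoutInterval;
        return timeoutInterval;
    }

    /** ACK received: derive the interval again from EstimatedRTT and DevRTT. */
    double onAck(double estimatedRtt, double devRtt) {
        timeoutInterval = estimatedRtt + 4 * devRtt;
        return timeoutInterval;
    }
}
```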
TCP Flow Control
Flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.
- Receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space through the RcvWindow field in the TCP segment.
- Sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow.
Receiver buffering:
- RcvBuffer = size of the TCP receive buffer
- RcvWindow = amount of spare room in the buffer
FLOW CONTROL: Receiver
Example: HOST A sends a large file to HOST B.
(Figure: HOST B's receive buffer of size RcvBuffer. Data from IP fills it up to LastByteRcvd; the application process reads it up to LastByteRead; the remaining free space is RcvWindow.)
$$RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]$$
RECEIVER HOST B uses RcvWindow, LastByteRcvd and LastByteRead.
Initially RcvWindow = RcvBuffer, and the application reads from the buffer. HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.
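The receiver's bookkeeping reduces to three counters. The sketch below is a minimal model that counts bytes rather than storing them. `ReceiveWindow` is a hypothetical name, and its exception simply flags a sender that ignored the advertised window.

```java
/** Receiver-side flow-control bookkeeping, counting bytes only (hypothetical model). */
final class ReceiveWindow {
    private final long rcvBuffer;
    private long lastByteRcvd;
    private long lastByteRead;

    ReceiveWindow(long rcvBuffer) {
        this.rcvBuffer = rcvBuffer;
    }

    /** RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]. */
    long rcvWindow() {
        return rcvBuffer - (lastByteRcvd - lastByteRead);
    }

    /** Data from IP is placed in the buffer. */
    void onDataFromIp(long bytes) {
        if (bytes > rcvWindow()) {
            throw new IllegalStateException("sender overran RcvWindow");
        }
        lastByteRcvd += bytes;
    }

    /** The application process reads, but never past the data received so far. */
    void applicationReads(long bytes) {
        lastByteRead += Math.min(bytes, lastByteRcvd - lastByteRead);
    }
}
```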
FLOW CONTROL: Sender
Example: HOST A sends a large file to HOST B.
(Figure: HOST A's data, sent up to LastByteSent; ACKs from Host B advance LastByteACKed.)
SENDER HOST A uses Host B's RcvWindow, LastByteSent and LastByteACKed. To ensure that HOST B does not overflow, HOST A maintains throughout the connection's life that
$$LastByteSent - LastByteACKed \le RcvWindow$$
FLOW CONTROL: Some issues to consider
RcvWindow is used by the connection to provide the flow control service.
What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?
TCP sends a segment only when there is data or an ACK to send, so the sender must keep the connection "alive". TCP therefore requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space, and the ACKs will contain RcvWindow > 0.
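The sender-side rule and the one-byte probe fit in one check. The sketch below is a minimal model that assumes byte counts relative to the start of the stream. `FlowControlledSender` is a hypothetical name.

```java
/** Sender-side flow control with one-byte probes when RcvWindow = 0 (hypothetical model). */
final class FlowControlledSender {
    private long lastByteSent;
    private long lastByteAcked;
    private long peerRcvWindow;

    FlowControlledSender(long initialRcvWindow) {
        peerRcvWindow = initialRcvWindow;
    }

    /** How many of the pending bytes may be sent now. */
    long sendable(long pending) {
        long inFlight = lastByteSent - lastByteAcked;
        if (peerRcvWindow == 0) {
            // Keep the connection alive: one data byte, if no probe is outstanding.
            return (inFlight == 0 && pending > 0) ? 1 : 0;
        }
        // LastByteSent - LastByteACKed <= RcvWindow
        return Math.max(0, Math.min(pending, peerRcvWindow - inFlight));
    }

    void send(long bytes) {
        lastByteSent += bytes;
    }

    /** Every segment from Host B carries an ACK number and its current RcvWindow. */
    void onAck(long ackedUpTo, long rcvWindow) {
        lastByteAcked = Math.max(lastByteAcked, ackedUpTo);
        peerRcvWindow = rcvWindow;
    }
}
```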
TCP Connection Management
Recall: the TCP sender and receiver establish a "connection" before exchanging data segments, initializing TCP variables: sequence numbers, buffers, and flow control info (e.g. RcvWindow).
- The client is the connection initiator (connect).
  In Java: Socket clientSocket = new Socket(hostname, portNumber);
  In C (Winsock): if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) { printf("connect failed\n"); WSACleanup(); exit(1); }
- The server is contacted by the client (accept).
  In Java: accept() on the server's ServerSocket returns the connected Socket.
  In C: ns = accept(s, (struct sockaddr *)&remoteaddr, &addrlen);
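The Java calls above can be exercised on a single machine. The sketch below is a minimal loopback example, not a complete client/server program. The server listens on an ephemeral port chosen by the operating system, and the class name `HandshakeDemo` is hypothetical. The operating system completes the three-way handshake for a listening socket, so the client constructor can return before `accept()` is called and no second thread is needed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Loopback demo of connect() on the client and accept() on the server (hypothetical example). */
final class HandshakeDemo {

    /** Opens a server on an ephemeral port, connects to it, and sends one byte across. */
    static int roundTrip(int value) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket welcome = new ServerSocket(0, 1, loopback)) { // port 0: OS picks a free port
            // Client side: the constructor returns once the three-way handshake has completed.
            try (Socket client = new Socket(loopback, welcome.getLocalPort());
                 // Server side: accept() hands over the already-established connection.
                 Socket connection = welcome.accept()) {
                OutputStream out = client.getOutputStream();
                out.write(value);
                out.flush();
                InputStream in = connection.getInputStream();
                return in.read();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip(42)); // 42
    }
}
```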
TCP Connection Management: Establishing a Connection
Three-way handshake:
- Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself). It specifies the client's initial sequence number (client_isn).
- Step 2: the server end system receives the SYN and replies with a SYNACK control segment. The server ACKs the received SYN, allocates buffers, and specifies its own initial sequence number (server_isn).
- Step 3: the client ACKs the connection with ACK = server_isn + 1, allocates buffers, and sends SYN = 0. The connection is established.
(Figure: Client → Server: Connect (SYN=1, seq=client_isn). Server → Client: Accept (SYN=1, seq=server_isn, ack=client_isn+1). Client → Server: ACK (SYN=0, seq=client_isn+1, ack=server_isn+1).)
This is what happens when we create a socket for connection to a server. After the connection is established, the client can receive segments carrying application-generated data (SYN=0).
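The sequence and acknowledgement fields of the three segments follow directly from the two initial sequence numbers. The sketch below is a minimal illustration. `ThreeWayHandshake` is a hypothetical name, and the ISNs in `main` are fixed only for readability (real stacks choose them randomly). The `+1` wraps modulo $2^{32}$, as the 32-bit sequence number field does.

```java
/** Header fields of the three handshake segments for given initial sequence numbers (hypothetical helper). */
final class ThreeWayHandshake {

    /** Sequence numbers are 32-bit, so "+1" wraps around modulo 2^32. */
    static long plusOne(long seq) {
        return (seq + 1) & 0xFFFF_FFFFL;
    }

    static String[] segments(long clientIsn, long serverIsn) {
        return new String[] {
            // Step 1: client -> server, the SYN carries the client's ISN.
            "SYN=1 seq=" + clientIsn,
            // Step 2: server -> client SYNACK; the client's SYN consumed one sequence number.
            "SYN=1 seq=" + serverIsn + " ack=" + plusOne(clientIsn),
            // Step 3: client -> server, plain ACK with SYN=0.
            "SYN=0 seq=" + plusOne(clientIsn) + " ack=" + plusOne(serverIsn)
        };
    }

    public static void main(String[] args) {
        for (String segment : segments(42, 79)) { // fixed ISNs for illustration only
            System.out.println(segment);
        }
    }
}
```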
TCP Connection Management (cont.)
Closing a connection: the client closes its socket, with closesocket(s) in C (Winsock) or clientSocket.close() in Java.
- Step 1: the client end system sends a TCP FIN control segment to the server.
- Step 2: the server receives the FIN and replies with an ACK. It then closes the connection and sends a FIN.
(Figure: how a TCP connection is torn down. The client sends FIN; the server replies with ACK and then sends its own FIN; the client replies with ACK, stays in timed wait, and then is closed.)
TCP Connection Management (cont.)
- Step 3: the client receives the FIN and replies with an ACK. It enters timed wait, during which it will respond with an ACK to any received FINs.
- Step 4: the server receives the ACK. The connection is closed.
Note: with a small modification, this procedure can handle simultaneous FINs.
(Figure: client and server both pass through closing to closed; the client spends a timed wait before it is closed.)
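The client's side of the four steps can be written as a small state machine. The sketch below uses the RFC 793 state names (FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT) for the stages the slides call closing and timed wait. It does not handle simultaneous FINs, and `ClientTeardown` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Client side of the four-step close, with RFC 793 state names (hypothetical model). */
final class ClientTeardown {
    enum State { ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT, CLOSED }

    private State state = State.ESTABLISHED;
    final List<String> sent = new ArrayList<>();

    /** Step 1: the application closes the socket, so the client sends FIN. */
    void close() {
        if (state == State.ESTABLISHED) {
            sent.add("FIN");
            state = State.FIN_WAIT_1;
        }
    }

    /** Step 2: the server ACKs the client's FIN. */
    void onAck() {
        if (state == State.FIN_WAIT_1) state = State.FIN_WAIT_2;
    }

    /** Step 3: the server's FIN arrives; ACK it, and ACK any FIN repeated during timed wait. */
    void onFin() {
        if (state == State.FIN_WAIT_2 || state == State.TIME_WAIT) {
            sent.add("ACK");
            state = State.TIME_WAIT;
        }
    }

    /** Timed wait (implementation-dependent length) expires; resources are released. */
    void onTimedWaitExpired() {
        if (state == State.TIME_WAIT) state = State.CLOSED;
    }

    State state() { return state; }
}
```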
TCP Connection Management (cont.)
(Figures: TCP client lifecycle and TCP server lifecycle.)
- Timed wait is used in case the final ACK gets lost. Its length is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).
- The connection then formally closes, and all resources (e.g. port numbers) are released.
End of Flow Control and Error Control
Flow Control vs Congestion Control
Similar actions are taken, but for very different reasons.
Flow control:
- point-to-point traffic between sender and receiver;
- a speed-matching service: it matches the rate at which the sender is sending against the rate at which the receiving application is reading;
- prevents the receiver buffer from overflowing.
Congestion happens when too many sources attempt to send data at too high a rate for the routers along the path.
Congestion control:
- a service that makes sure the routers between end systems are able to carry the offered traffic;
- prevents routers from overflowing.
Same course of action: throttling of the sender.
Principles of Congestion Control
Congestion, informally: too many sources sending too much data too fast for the network to handle. It is different from flow control.
Manifestations:
- lost packets (buffer overflow at routers);
- long delays (queuing in router buffers).
A top-10 problem!
Approaches Towards Congestion Control
There are two broad approaches towards congestion control:
1. End-to-end congestion control: there is no explicit feedback from the network. Congestion is inferred by end systems from observed packet loss and delay. This is the approach taken by TCP.
2. Network-assisted congestion control: routers provide feedback to end systems, either as a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR) or as an explicit transmission rate the sender should send at.
TCP Congestion Control
How the TCP sender limits the rate at which it sends traffic into its connection: a new variable, the congestion window (CongWin), bounds the amount of unACKed data at the sender.
$$LastByteSent - LastByteACKed \le \min(CongWin, RcvWindow)$$
This indirectly limits the sender's send rate. Assume:
- the TCP receive buffer is very large, so there is no RcvWindow constraint and the amount of unACKed data at the sender is limited solely by CongWin;
- packet loss delay and packet transmission delay are negligible.
Then the sending rate is approximately CongWin / RTT. By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
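Both constraints can be checked with one expression. The sketch below is a minimal illustration with byte counts. `CongestionLimit` is a hypothetical name, and the rate formula is the slide's approximation, valid only under the assumptions listed above.

```java
/** Send allowance under min(CongWin, RcvWindow) and the CongWin / RTT rate estimate (hypothetical helper). */
final class CongestionLimit {

    /** Bytes the sender may still put in flight. */
    static long sendAllowance(long lastByteSent, long lastByteAcked, long congWin, long rcvWindow) {
        long unacked = lastByteSent - lastByteAcked;
        return Math.max(0, Math.min(congWin, rcvWindow) - unacked);
    }

    /** Approximate sending rate in bytes per second. */
    static double rate(long congWin, double rttSeconds) {
        return congWin / rttSeconds;
    }

    public static void main(String[] args) {
        System.out.println(sendAllowance(10_000, 4_000, 8_000, 100_000)); // CongWin-limited
        System.out.println(rate(8_000, 0.1));                             // about 80,000 bytes/s
    }
}
```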
TCP Congestion Control
TCP uses ACKs to trigger ("clock") its increase in congestion window size; this is called "self-clocking". The arrival of ACKs indicates to the sender that all is well.
1. If ACKs arrive at a slow rate, the congestion window will be increased at a relatively slow rate.
2. If ACKs arrive at a high rate, the congestion window will be increased more quickly.
TCP Congestion Control
How TCP perceives that there is congestion on the path: when congestion is excessive, router buffers along the path overflow and datagrams are dropped. This in turn results in a "loss event" at the sender, which is either:
1. a timeout: no ACK is received after segment loss; or
2. the receipt of three duplicate ACKs: segment loss is followed by three duplicate ACKs received at the sender.
TCP Congestion Control: Details
- The sender limits transmission: $LastByteSent - LastByteAcked \le cwnd$
- Roughly, $rate \approx \dfrac{cwnd}{RTT}$ bytes/sec
- cwnd is a dynamic function of perceived network congestion.
How does the sender perceive congestion?
- loss event = timeout or 3 duplicate ACKs;
- the TCP sender reduces its rate (cwnd) after a loss event.
Three mechanisms:
1. AIMD
2. slow start
3. conservative behaviour after timeout events
TCP Congestion Avoidance: Additive Increase, Multiplicative Decrease (AIMD)
Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
- Additive increase: increase cwnd by 1 MSS every RTT until loss is detected.
- Multiplicative decrease: cut cwnd in half after loss.
(Figure: congestion window size (8, 16, 24 Kbytes) against time, showing the saw-tooth behaviour of probing for bandwidth.)
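A toy loop reproduces the saw-tooth. The sketch below is a minimal model in units of MSS. It assumes, hypothetically, that a loss is detected exactly when the window reaches a fixed size `lossAt`; real losses depend on the network. `AimdSawtooth` is a hypothetical name.

```java
/** Congestion-avoidance saw-tooth in units of MSS, with loss injected at a fixed window (hypothetical model). */
final class AimdSawtooth {

    /** Window (in MSS) at the start of each RTT; a loss occurs whenever the window reaches lossAt. */
    static int[] trace(int start, int lossAt, int rtts) {
        int[] w = new int[rtts];
        int cwnd = start;
        for (int i = 0; i < rtts; i++) {
            w[i] = cwnd;
            if (cwnd >= lossAt) {
                cwnd = Math.max(1, cwnd / 2); // multiplicative decrease: cut in half
            } else {
                cwnd += 1;                    // additive increase: +1 MSS per RTT
            }
        }
        return w;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(trace(4, 8, 12)));
        // [4, 5, 6, 7, 8, 4, 5, 6, 7, 8, 4, 5]
    }
}
```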
TCP Slow Start
When a connection begins, increase the rate exponentially until the first loss event:
- initially cwnd = 1 MSS;
- double cwnd every RTT;
- this is done by incrementing cwnd by 1 MSS for every ACK received.
Summary: the initial rate is slow but ramps up exponentially fast (the sending rate doubles every RTT).
(Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart.)
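A toy loop checks the claim that adding 1 MSS per ACK doubles the window every RTT. The sketch below assumes every segment is ACKed and no loss occurs. `SlowStart` is a hypothetical name.

```java
/** Slow start: +1 MSS per ACK doubles cwnd every RTT, assuming no loss (hypothetical model). */
final class SlowStart {

    /** cwnd (in MSS) at the start of each of the first rtts round trips. */
    static int[] trace(int rtts) {
        int[] w = new int[rtts];
        int cwnd = 1;                       // initially cwnd = 1 MSS
        for (int i = 0; i < rtts; i++) {
            w[i] = cwnd;
            int acks = cwnd;                // one ACK returns for each segment sent this RTT
            for (int a = 0; a < acks; a++) {
                cwnd += 1;                  // increment cwnd by 1 MSS for every ACK received
            }
        }
        return w;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(trace(4))); // [1, 2, 4, 8]
    }
}
```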
Refinement: Inferring Loss
- After 3 duplicate ACKs: cwnd is cut in half, and the window then grows linearly.
- But after a timeout event: cwnd is set to 1 MSS. The window then grows exponentially up to a threshold, and linearly after that.
Philosophy:
- 3 duplicate ACKs indicate the network is capable of delivering some segments.
- A timeout indicates a "more alarming" congestion scenario.
Refinement
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.
Implementation: a variable ssthresh (slow-start threshold). On a loss event, ssthresh is set to 1/2 of cwnd just before the loss event.
TCP Sender Congestion Control
1. State: Slow Start (SS). Event: ACK receipt for previously unACKed data.
   Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance".
   Commentary: results in a doubling of CongWin every RTT.
2. State: Congestion Avoidance (CA). Event: ACK receipt for previously unACKed data.
   Action: CongWin = CongWin + MSS × (MSS / CongWin).
   Commentary: additive increase, increasing CongWin by 1 MSS every RTT.
3. State: SS or CA. Event: loss event detected by triple duplicate ACK.
   Action: Threshold = CongWin / 2; CongWin = Threshold; set state to "Congestion Avoidance".
   Commentary: fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.
4. State: SS or CA. Event: timeout.
   Action: Threshold = CongWin / 2; CongWin = 1 MSS; set state to "Slow Start".
   Commentary: enter slow start.
5. State: SS or CA. Event: duplicate ACK.
   Action: increment the duplicate ACK count for the segment being ACKed.
   Commentary: CongWin and Threshold are not changed.
Summary: TCP Congestion Control (FSM)
Initial state: slow start, with cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0.
Slow start:
- new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed.
- duplicate ACK: dupACKcount++.
- cwnd > ssthresh: move to congestion avoidance.
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment.
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery.
Congestion avoidance:
- new ACK: cwnd = cwnd + MSS × (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed.
- duplicate ACK: dupACKcount++.
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start.
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery.
Fast recovery:
- duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed.
- new ACK: cwnd = ssthresh; dupACKcount = 0; move to congestion avoidance.
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start.
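The state machine above can be written as a small class with one method per event. The sketch below is a minimal, Reno-style model of the window arithmetic only, with cwnd and ssthresh in bytes. It does not send or retransmit anything, and `RenoSender` is a hypothetical name.

```java
/** Window arithmetic of the congestion-control FSM above, in bytes (hypothetical model). */
final class RenoSender {
    enum State { SLOW_START, CONGESTION_AVOIDANCE, FAST_RECOVERY }

    final int mss;
    double cwnd;
    double ssthresh = 64 * 1024;          // initial ssthresh = 64 KB
    int dupAckCount;
    State state = State.SLOW_START;

    RenoSender(int mss) {
        this.mss = mss;
        cwnd = mss;                       // initial cwnd = 1 MSS
    }

    void onNewAck() {
        switch (state) {
            case SLOW_START:
                cwnd += mss;
                if (cwnd > ssthresh) state = State.CONGESTION_AVOIDANCE;
                break;
            case CONGESTION_AVOIDANCE:
                cwnd += (double) mss * mss / cwnd; // about +1 MSS per RTT
                break;
            case FAST_RECOVERY:
                cwnd = ssthresh;                   // deflate the window
                state = State.CONGESTION_AVOIDANCE;
                break;
        }
        dupAckCount = 0;
    }

    void onDuplicateAck() {
        if (state == State.FAST_RECOVERY) {
            cwnd += mss;                  // each duplicate ACK means a segment has left the network
            return;
        }
        if (++dupAckCount == 3) {         // triple duplicate ACK: retransmit missing segment here
            ssthresh = cwnd / 2;
            cwnd = ssthresh + 3 * mss;
            state = State.FAST_RECOVERY;
        }
    }

    void onTimeout() {                    // retransmit missing segment here
        ssthresh = cwnd / 2;
        cwnd = mss;
        dupAckCount = 0;
        state = State.SLOW_START;
    }
}
```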
Congestion Control: TCP's Congestion Control Service
(Figure: CLIENT and SERVER.)
Problem: gridlock sets in when there is packet loss due to router congestion.
TCP's congestion control service forces the end systems to decrease the rate at which packets are sent during periods of congestion. When the sending system's packets are lost due to congestion, it is alerted because it stops receiving ACKs for the packets sent.
Macroscopic Description of TCP Throughput
What is the average throughput of TCP as a function of window size and RTT? Ignore slow start (typically very short phases).
Let W be the window size when loss occurs:
- when the window is W, throughput is W/RTT;
- just after loss, the window drops to W/2 and throughput drops to W/(2·RTT);
- throughput then increases linearly (by MSS/RTT every RTT).
Average throughput: $0.75\,W/RTT$ (based on an idealised model of the steady-state dynamics of TCP).
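The 0.75 factor is the mean of a window that grows linearly from W/2 to W. The sketch below averages the per-RTT window values over one saw-tooth cycle. It is a minimal illustration that assumes an even W in units of MSS, and `SawtoothAverage` is a hypothetical name.

```java
/** Mean window over one idealised saw-tooth cycle from W/2 up to W (hypothetical helper). */
final class SawtoothAverage {

    /** Mean of the per-RTT windows W/2, W/2 + 1, ..., W (in MSS); W is assumed even. */
    static double averageWindow(int w) {
        long sum = 0;
        int rounds = 0;
        for (int cwnd = w / 2; cwnd <= w; cwnd++) { // additive increase of 1 MSS per RTT
            sum += cwnd;
            rounds++;
        }
        return (double) sum / rounds;
    }

    public static void main(String[] args) {
        int w = 100;
        System.out.println(averageWindow(w) / w); // 0.75
    }
}
```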
TCP Futures: TCP over "Long, Fat Pipes"
- Example: a GRID computing application with 1500-byte segments and a 100 ms RTT, desiring 10 Gbps throughput, requires a window size of W = 83,333 in-flight segments.
- Throughput in terms of the loss rate L:
$$\text{Throughput} = \frac{1.22 \cdot MSS}{RTT\,\sqrt{L}}$$
- Achieving 10 Gbps therefore requires $L = 2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments).
- New versions of TCP are needed for high-speed environments.
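Both numbers on this slide follow from the stated parameters. The sketch below is a minimal calculation. `LongFatPipe` is a hypothetical name, and its loss-rate function simply inverts the throughput formula above.

```java
/** Reproduces the "long, fat pipe" window and loss-rate figures (hypothetical helper). */
final class LongFatPipe {

    /** Segments that must be in flight to sustain bps over the given RTT. */
    static double windowSegments(double bps, double rttSeconds, int segmentBytes) {
        return bps * rttSeconds / (segmentBytes * 8.0);
    }

    /** Loss rate L solving bps = 1.22 * MSS / (RTT * sqrt(L)), with MSS in bits. */
    static double lossRateFor(double bps, double rttSeconds, int segmentBytes) {
        double root = 1.22 * segmentBytes * 8.0 / (rttSeconds * bps);
        return root * root;
    }

    public static void main(String[] args) {
        System.out.printf("W = %.0f segments%n", windowSegments(10e9, 0.1, 1500)); // W = 83333 segments
        System.out.printf("L = %.2e%n", lossRateFor(10e9, 0.1, 1500));            // L = 2.14e-10
    }
}
```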
TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link of capacity R, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.
(Figure: TCP connections 1 and 2 passing through a bottleneck router of capacity R.)
Analysis of 2 Connections Sharing a Link
Assumptions:
- a link with transmission rate R;
- each connection has the same MSS and RTT;
- no other TCP connections or UDP datagrams traverse the shared link;
- the slow-start phase of TCP is ignored;
- both connections operate in congestion-avoidance mode (linear increase phase).
Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.
Why Is TCP Fair?
Two competing sessions: additive increase gives a slope of 1 as throughput increases, while multiplicative decrease reduces throughput proportionally.
(Figure: connection 2 throughput against connection 1 throughput, both axes up to R, with the equal bandwidth share line and the full bandwidth utilisation line. The trajectory alternates congestion-avoidance additive increase with loss, which decreases each window by a factor of 2. A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.)
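A toy simulation illustrates the convergence argument. The sketch below assumes, hypothetically, that both connections detect a loss in the same RTT whenever their combined rate exceeds R, and that rates are in arbitrary units that grow by 1 per RTT. `AimdFairness` is a hypothetical name. Additive increase leaves the gap between the two rates unchanged, while each halving halves the gap, so the rates approach the equal-share line.

```java
/** Two AIMD flows sharing a link of capacity R, with synchronised losses (hypothetical model). */
final class AimdFairness {

    /** Returns the throughputs {x1, x2} after the given number of RTTs. */
    static double[] simulate(double x1, double x2, double capacity, int rounds) {
        for (int i = 0; i < rounds; i++) {
            if (x1 + x2 > capacity) { // joint demand exceeds R: both connections see loss
                x1 /= 2;              // multiplicative decrease
                x2 /= 2;
            } else {
                x1 += 1;              // additive increase, slope 1
                x2 += 1;
            }
        }
        return new double[] {x1, x2};
    }

    public static void main(String[] args) {
        double[] x = simulate(80, 10, 100, 500);
        System.out.printf("connection 1: %.2f, connection 2: %.2f%n", x[0], x[1]);
    }
}
```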
Full bandwidth utilisation line
We can viewWe can view a simulation a simulation
on thison this
View SimulationView Simulation
Transport Layer ndash TCP
54
BB
The End
The next succeeding slides are just for additional reading
Transport Layer ndash TCP
55
BB
TCP Latency ModelingTCP Latency Modeling
In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs
Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth
Multiple End Systems sharing a link
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
Loop holes in TCPLoop holes in TCP
1 TCP connection
1 TCP connection
1 TCP connection
3 TCP connections
Multithreading implementationMultithreading implementation
Transport Layer ndash TCP
56
BB
TCP latency modelingTCP latency modeling
TCP connection establishment time data transfer delay Actual data transmission time
Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent
QQ How long How long does it take to does it take to
receive an object receive an object from a Web from a Web
serverserver
No data transfer delayNo data transfer delay
Therersquos data transfer delayTherersquos data transfer delay
the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety
Transport Layer ndash TCP
57
BB
TCP Latency Modeling
Assumptions:
- The network is uncongested, with one link of rate R between the end systems.
- CongWin (fixed) determines the amount of data that can be sent.
- There is no packet loss and no packet corruption, so no retransmissions are required.
- Header overheads are negligible.
- The file to send is an integer number of segments of size MSS.
- Connection-establishment request messages, ACKs, and TCP connection-establishment segments have negligible transmission times.
- The initial Threshold of the TCP congestion mechanism is very big.
Notation:
- O: size of the object, in bits
- S: number of bits in the MSS (maximum segment size)
- R: the link's transmission rate, in bps
[Figure: a CLIENT and a SERVER joined by a single link; the server sends a FILE.]
TCP Latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.
$$\text{latency} = 2\,RTT + \frac{O}{R}$$
K is the number of windows of data that cover the object. O/S is the number of segments, and the result is rounded up to the nearest integer:
$$K = \left\lceil \frac{O}{WS} \right\rceil$$
Example (W = 4 segments): O = 256 bits and S = 32 bits give O/S = 8 segments, so K = 8/4 = 2 windows.
TCP Latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 2: WS/R < RTT + S/R
The sender has to wait for an ACK after a window's worth of data is sent (STALLED PERIOD). If there are K windows, the sender will be stalled (K - 1) times, each time for S/R + RTT - WS/R.
$$\text{latency} = 2\,RTT + \frac{O}{R} + (K-1)\left[\frac{S}{R} + RTT - \frac{WS}{R}\right]$$
Here $K = \lceil O/(WS) \rceil$ is the number of windows of data that cover the object.
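The two static-window cases can be combined in one function. This is a minimal Python sketch under the slides' assumptions (no loss, negligible header and ACK transmission times). The function name and the sample values (R = 1000 bps, RTT = 0.05 s or 0.1 s) are hypothetical. It reuses the slide's example object: O = 256 bits, S = 32 bits, W = 4.

```python
import math

def static_window_latency(O, S, R, W, rtt):
    """Latency (s) to receive an O-bit object with a fixed window of W segments.

    O and S are in bits, R in bits/s, rtt in seconds. The 2 * rtt covers
    connection setup plus the request; O / R is the pure transmission time.
    """
    K = math.ceil(O / (W * S))            # windows of data that cover the object
    latency = 2 * rtt + O / R
    stall = S / R + rtt - W * S / R       # idle time after each window
    if stall > 0:                         # Case 2: WS/R < RTT + S/R
        latency += (K - 1) * stall
    return latency

print(static_window_latency(256, 32, 1000, 4, 0.05))  # Case 1: 2*0.05 + 0.256
print(static_window_latency(256, 32, 1000, 4, 0.1))   # Case 2: one extra stall of 0.004 s
```

Only the sign of the stall term decides which case applies, so the case split does not need to be coded separately.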
Case Analysis: DYNAMIC CONGESTION WINDOW
[Figure: slow start with O/S = 15 segments sent in 4 windows (1, 2, 4, and 8 segments), with a STALLED PERIOD between windows.]
Case Analysis: DYNAMIC CONGESTION WINDOW
- Let K be the number of windows that cover the object.
- We can express K in terms of the number of segments in the object as follows:
$$\begin{aligned}
K &= \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} \\
  &= \min\{k : 2^k - 1 \ge O/S\} \\
  &= \min\{k : k \ge \log_2(O/S + 1)\} \\
  &= \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil
\end{aligned}$$
Case Analysis: DYNAMIC CONGESTION WINDOW
- From the time the server begins to transmit the kth window until the time the server receives an ACK for the first segment in the window: $\frac{S}{R} + RTT$
- Transmission of the kth window: $2^{k-1}\frac{S}{R}$
- Stall time: $\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^{+}$, where $[x]^{+} = \max(x, 0)$
- Latency:
$$\text{latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^{+}$$
Case Analysis: DYNAMIC CONGESTION WINDOW
- Let Q be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$
- The actual number of times that the server stalls is $P = \min\{Q, K-1\}$.
- Closed-form expression for the latency:
$$\text{latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$
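Compared with the minimum latency $2\,RTT + O/R$ (no stalls):
$$\frac{\text{latency}}{\text{minimum latency}} \le 1 + \frac{P}{(O/R)/RTT + 2}$$
Slow start will not significantly increase latency if RTT << O/R.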
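The closed form can be checked against the window-by-window stall sum it was derived from. This is a minimal Python sketch under the same assumptions. The function name and the example values (S = 8000 bits, R = 1 Mbps, RTT = 0.1 s) are hypothetical. O/S = 15 matches the 4-window figure.

```python
import math

def slow_start_latency(O, S, R, rtt):
    """Closed-form latency with slow start (windows of 1, 2, 4, ... segments).

    O and S in bits, R in bits/s, rtt in seconds; returns (latency, K, Q, P).
    """
    K = math.ceil(math.log2(O / S + 1))                 # windows covering the object
    Q = math.floor(math.log2(1 + rtt / (S / R))) + 1    # stalls if the object were infinite
    P = min(Q, K - 1)                                   # stalls that actually happen
    latency = 2 * rtt + O / R + P * (rtt + S / R) - (2 ** P - 1) * S / R
    # Cross-check against the window-by-window sum of positive stall times.
    stalls = sum(max(S / R + rtt - 2 ** (k - 1) * S / R, 0.0) for k in range(1, K))
    assert math.isclose(latency, 2 * rtt + O / R + stalls)
    return latency, K, Q, P

latency, K, Q, P = slow_start_latency(15 * 8000, 8000, 1e6, 0.1)
print(K, Q, P)                                   # 4 4 3
print(latency / (2 * 0.1 + 15 * 8000 / 1e6))     # ratio to the minimum latency, about 1.84
```

In this example the ratio stays below the bound 1 + P/((O/R)/RTT + 2) = 1.9375.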
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
Connection-Oriented Transport: TCP
- TCP Segment Structure
- SEQ and ACK numbers
- Calculating the Timeout Interval
- The Simplified TCP Sender
- ACK Generation Recommendation (RFC 1122, RFC 2581)
- Interesting Transmission Scenarios
- Flow Control
- TCP Connection Management
TCP Segment Structure
Header (each row is 32 bits wide):
- source port # | dest port #
- sequence number
- acknowledgement number
- head len | not used | U A P R S F | rcvr window size
- checksum | URG data pointer
- Options (variable length)
This is followed by the application data (variable length).
Field notes:
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection establishment (setup, teardown commands)
- rcvr window size: # bytes the receiver is willing to accept
- sequence and acknowledgement numbers: counting by bytes of data (not segments)
- checksum: Internet checksum (as in UDP)
In practice, PSH, URG, and the Urgent Data Pointer are not used.
We can view these teeny-weeny details using Ethereal (now called Wireshark).
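The header layout above can be decoded with Python's standard struct module. This is a minimal sketch that assumes the classic layout shown (six flag bits, no ECN bits). It does not verify the checksum. The helper name and the hand-built example header (ports 1234 and 23, Seq=42, ACK=79) are hypothetical.

```python
import struct

# Flag bits, least significant first, in the 16-bit "head len | not used | flags" word.
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]

def parse_tcp_header(segment):
    """Split a raw TCP segment (bytes) into its header fields and application data."""
    (src_port, dst_port, seq, ack, len_flags,
     rcv_window, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (len_flags >> 12) * 4          # head len counts 32-bit words
    flags = [name for bit, name in enumerate(FLAG_NAMES) if len_flags & (1 << bit)]
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,                 # byte numbers, not segment numbers
        "header_len": header_len, "flags": flags,
        "rcv_window": rcv_window,               # bytes the receiver will accept
        "checksum": checksum, "urg_ptr": urg_ptr,
        "options": segment[20:header_len],      # variable length, often empty
        "data": segment[header_len:],           # application data
    }

# A hand-built 20-byte header (head len = 5 words) with PSH and ACK set, carrying 'C'.
raw = struct.pack("!HHIIHHHH", 1234, 23, 42, 79, (5 << 12) | 0x18, 65535, 0, 0) + b"C"
print(parse_tcp_header(raw)["flags"])   # ['PSH', 'ACK']
```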
Example
Suppose that a process in Host A wants to send a stream of data to a process in Host B over a TCP connection.
Assume:
- Data stream: a file consisting of 500,000 bytes
- MSS: 1,000 bytes
- The first byte of the data stream is numbered 0
TCP constructs 500 segments out of the data stream: 500,000 bytes / 1,000 bytes = 500 segments.
TCP Sequence #s and ACKs
Sequence numbers (#s): the byte-stream number of the first byte in the segment's data.
- They do not necessarily start from 0; a random initial number R is used.
- Segment 1: 0 + R; Segment 2: 1000 + R; etc.
ACKs (acknowledgments): the seq # of the next byte expected from the other side (last byte + 1).
- Cumulative ACK: if segment 1 has been received, the receiver waits for segment 2, e.g., ACK = 1000 + R (received up to the 999th byte).
[Figure: byte stream numbered 0, 1, 2, ..., 999 (Segment 1) and 1000, 1001, ..., 1999 (Segment 2).]
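The segmentation and numbering in the 500,000-byte example can be reproduced in a few lines. This is a minimal Python sketch: the helper name is hypothetical, and the random initial number R is drawn from the 32-bit sequence space, which wraps around.

```python
import random

SEQ_SPACE = 2 ** 32   # TCP sequence numbers are 32 bits and wrap around

def make_segments(data_len, mss, isn):
    """Split a data_len-byte stream into segments; return (seq, length) pairs.

    Each seq is the ISN plus the stream offset of the segment's first byte.
    """
    return [((isn + offset) % SEQ_SPACE, min(mss, data_len - offset))
            for offset in range(0, data_len, mss)]

R = random.randrange(SEQ_SPACE)          # random initial sequence number
segments = make_segments(500_000, 1_000, R)
print(len(segments))                     # 500
print((segments[1][0] - R) % SEQ_SPACE)  # 1000, i.e. Segment 2 starts at 1000 + R
# Cumulative ACK after Segment 1 arrives: next byte expected = 1000 + R
ack = (segments[0][0] + segments[0][1]) % SEQ_SPACE
```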
TCP Sequence #s and ACKs
Q: How does the receiver handle out-of-order segments?
A: The TCP specs do not say; this is decided when implementing.
Simple telnet scenario (with echo on), assuming that the starting sequence numbers for Host A (client) and Host B (server) are 42 and 79, respectively:
1. The user types 'C'. Host A sends Seq=42, ACK=79, data = 'C' ("I'm sending data starting at seq num 42").
2. Host B ACKs receipt of 'C' and echoes back 'C': Seq=79, ACK=43, data = 'C' ("Send me the bytes from 43 onward"; the ACK is piggybacked on server-to-client data).
3. Host A ACKs receipt of the echoed 'C': Seq=43, ACK=80.
Yet another server echo example
Host A starts at seq 42 and Host B starts at seq 79:
1. The user types 'Hello'. Host A sends Seq=42, ACK=79, data = 'Hello'.
2. Host B ACKs receipt of 'Hello' and echoes back 'Hello': Seq=79, ACK=47, data = 'Hello'.
3. Host A ACKs receipt of the echoed 'Hello' and sends something else: Seq=47, ACK=84, data = '200'.
4. Host B ACKs and echoes: Seq=84, ACK=50, data = '200'.
Summary: Host A uses seq=42/ack=79, then seq=47/ack=84; Host B uses seq=79/ack=47, then seq=84/ack=50.
The ACK tells up to what byte has been received, and therefore which starting byte the host is expecting to receive next.
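The Seq/ACK bookkeeping in both echo examples follows one rule: each side's Seq advances by the bytes it sends, and each ACK names the next byte expected from the peer. This is a minimal Python sketch of that rule. The function name is hypothetical, and it assumes one byte per ASCII character. The final pure ACK from Host A is not modelled.

```python
def echo_exchange(messages, client_isn, server_isn):
    """Trace (sender, Seq, ACK, data) when every client message is echoed back.

    Each side's Seq advances by the bytes it sends; each ACK names the next byte
    expected from the peer and is piggybacked on the data segment.
    """
    c_seq, s_seq = client_isn, server_isn
    trace = []
    for msg in messages:
        trace.append(("A", c_seq, s_seq, msg))   # Host A (client) sends
        c_seq += len(msg)                        # one seq number per byte (ASCII assumed)
        trace.append(("B", s_seq, c_seq, msg))   # Host B ACKs and echoes
        s_seq += len(msg)
    return trace

for segment in echo_exchange(["Hello", "200"], client_isn=42, server_isn=79):
    print(segment)
# ('A', 42, 79, 'Hello')
# ('B', 79, 47, 'Hello')
# ('A', 47, 84, '200')
# ('B', 84, 50, '200')
```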
TCP Round Trip Time and Timeout
Q: How to set the TCP timeout value?
- longer than the RTT, but note that the RTT will vary
- too short: premature timeout and unnecessary retransmissions
- too long: slow reaction to segment loss
(RTT = round trip time)
Q: How to estimate the RTT?
- SampleRTT: measured time from segment transmission until ACK receipt; ignore retransmissions and cumulatively ACKed segments
- SampleRTT will vary, and we want the estimated RTT to be smoother: use several recent measurements, not just the current SampleRTT
Main issue: how long is the sender willing to wait before retransmitting the packet?
TCP Round Trip Time and Timeout
EstimatedRTT = (1-x) EstimatedRTT + x SampleRTT
- exponentially weighted moving average: the influence of a given sample decreases exponentially fast
- typical value of x: 0.125 (RFC 2988)
Setting the timeout:
- EstimatedRTT plus a "safety margin"
- large variation in EstimatedRTT → larger safety margin
Timeout = EstimatedRTT + (4 × Deviation)
Deviation = (1-x) Deviation + x |SampleRTT - EstimatedRTT|, where the recommended value of x in this formula is 0.25
Sample Calculations
EstimatedRTT = 0.875 EstimatedRTT + 0.125 SampleRTT
- After receipt of the ACK of segment 1: EstimatedRTT = RTT for segment 1 = 0.02746 second
- After receipt of the ACK of segment 2: EstimatedRTT = 0.875 × 0.02746 + 0.125 × 0.035557 = 0.0285
- After receipt of the ACK of segment 3: EstimatedRTT = 0.875 × 0.0285 + 0.125 × 0.070059 = 0.0337
- After receipt of the ACK of segment 4: EstimatedRTT = 0.875 × 0.0337 + 0.125 × 0.11443 = 0.0438
- After receipt of the ACK of segment 5: EstimatedRTT = 0.875 × 0.0438 + 0.125 × 0.13989 = 0.0558
- After receipt of the ACK of segment 6: EstimatedRTT = 0.875 × 0.0558 + 0.125 × 0.18964 = 0.0725
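The estimator and the timeout rule can be run on the six samples above. This is a minimal Python sketch. It assumes Deviation is seeded with half the first sample, a convention from RFC 2988 that the slides do not state. It omits RFC 2988's lower bound on the timeout. The function name is hypothetical.

```python
def rtt_estimates(samples, x_rtt=0.125, x_dev=0.25):
    """Return (EstimatedRTT, Deviation, Timeout) after each SampleRTT, in seconds.

    Deviation is seeded with half the first sample (assumed, per RFC 2988);
    each later Deviation update uses the previous EstimatedRTT.
    """
    results = []
    est = dev = None
    for sample in samples:
        if est is None:
            est, dev = sample, sample / 2
        else:
            dev = (1 - x_dev) * dev + x_dev * abs(sample - est)
            est = (1 - x_rtt) * est + x_rtt * sample
        results.append((est, dev, est + 4 * dev))   # Timeout = EstimatedRTT + 4 Deviation
    return results

samples = [0.02746, 0.035557, 0.070059, 0.11443, 0.13989, 0.18964]   # from the sample calculations
print([round(est, 4) for est, _, _ in rtt_estimates(samples)])
# [0.0275, 0.0285, 0.0337, 0.0438, 0.0558, 0.0725]
```

The code does not round between steps, but after rounding to four decimals its estimates match the hand-computed values (the first value, 0.02746, prints as 0.0275).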
RTT Samples and RTT Estimates
[Figure: SampleRTT and EstimatedRTT versus time; y-axis: RTT (msec).]
The variations in the SampleRTT are smoothed out in the computation of the EstimatedRTT.
An Actual RTT Estimation
[Figure: SampleRTT and EstimatedRTT for the RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds); y-axis: RTT (milliseconds).]
FSM of TCP for Reliable Data Transfer
Simplified TCP sender, assuming one-way data transfer and no flow or congestion control. The sender waits for an event and then reacts:
- event: data received from application above → create and send segment
- event: timer timeout for segment with seq number y → retransmit segment
- event: ACK received, with ACK number y → process ACK
SIMPLIFIED TCP SENDER
Assumptions:
- the sender is not constrained by TCP flow or congestion control
- data from above is less than MSS in size
- data transfer is in one direction only

sendbase = initial_sequence_number
nextseqnum = initial_sequence_number
loop (forever) {
  switch (event)
    event: data received from application above
      create TCP segment with sequence number nextseqnum
      if (timer is currently not running)
        start timer for segment nextseqnum   /* the timer is associated with the oldest unACKed segment */
      pass segment to IP
      nextseqnum = nextseqnum + length(data)
    event: timer timeout
      retransmit not-yet-ACKed segment with smallest sequence number
      start timer
    event: ACK received, with ACK field value of y
      if (y > sendbase) {   /* cumulative ACK of all data up to y */
        sendbase = y
        if (there are currently any not-yet-ACKed segments)
          start timer
      }
} /* end of loop forever */
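The three event handlers of the simplified sender translate directly into a small class. This is a minimal Python sketch, not a real transport. `send_to_ip` is a hypothetical callback standing in for "pass segment to IP". The timer is a flag rather than a clock, and sequence-number wraparound is ignored. The usage replays the premature-timeout scenario (Seq=92 with 8 bytes, Seq=100 with 20 bytes, ACK=120).

```python
class SimplifiedTCPSender:
    """Event handlers of the simplified sender (single timer, cumulative ACKs).

    `send_to_ip` is a hypothetical callback standing in for "pass segment to IP";
    the timer is a flag, not a clock, and sequence-number wraparound is ignored.
    """

    def __init__(self, isn, send_to_ip):
        self.sendbase = isn             # oldest unACKed byte
        self.nextseqnum = isn           # seq number for the next new segment
        self.unacked = {}               # seq -> data, for not-yet-ACKed segments
        self.timer_running = False
        self.send_to_ip = send_to_ip

    def on_app_data(self, data):
        self.unacked[self.nextseqnum] = data
        if not self.timer_running:      # the one timer covers the oldest unACKed segment
            self.timer_running = True
        self.send_to_ip(self.nextseqnum, data)
        self.nextseqnum += len(data)

    def on_timeout(self):
        if self.unacked:
            oldest = min(self.unacked)  # not-yet-ACKed segment with smallest seq
            self.send_to_ip(oldest, self.unacked[oldest])
            self.timer_running = True   # start timer

    def on_ack(self, y):
        if y > self.sendbase:           # cumulative ACK of all data up to y
            self.sendbase = y
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)  # restart only if data is still unACKed

sent = []
sender = SimplifiedTCPSender(92, lambda seq, data: sent.append((seq, len(data))))
sender.on_app_data(b"x" * 8)     # Seq=92, 8 bytes
sender.on_app_data(b"y" * 20)    # Seq=100, 20 bytes
sender.on_timeout()              # premature timeout: resend Seq=92 only
sender.on_ack(120)               # cumulative ACK=120 covers both segments
print(sent)                      # [(92, 8), (100, 20), (92, 8)]
```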
TCP SENDER with MODIFICATIONS: With Fast Retransmit
Why wait for the timeout to expire when consecutive duplicate ACKs can be used to indicate a lost segment?

sendbase = initial_sequence_number
nextseqnum = initial_sequence_number
loop (forever) {
  switch (event)
    event: data received from application above
      create TCP segment with sequence number nextseqnum
      start timer for segment nextseqnum
      pass segment to IP
      nextseqnum = nextseqnum + length(data)
    event: timer timeout for segment with sequence number y
      retransmit segment with sequence number y
      compute new timeout interval for segment y
      restart timer for sequence number y
    event: ACK received, with ACK field value of y
      if (y > sendbase) {   /* cumulative ACK of all data up to y */
        cancel all timers for segments with sequence numbers < y
        sendbase = y
      }
      else {   /* a duplicate ACK for an already ACKed segment */
        increment number of duplicate ACKs received for y
        if (number of duplicate ACKs received for y == 3) {
          /* perform TCP fast retransmission */
          resend segment with sequence number y
          restart timer for segment y
        }
      }
} /* end of loop forever */
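The duplicate-ACK branch of the modified sender can be isolated as a small counter. This is a minimal Python sketch. The function name is hypothetical. It follows the pseudocode in treating any ACK with y <= sendbase as a duplicate, and it does not model timers.

```python
from collections import Counter

def fast_retransmits(ack_values, sendbase, threshold=3):
    """Return the sequence numbers resent by fast retransmit for a stream of ACKs.

    An ACK with y > sendbase is new (cumulative) and advances sendbase; any other
    ACK is a duplicate, and the third duplicate for y triggers a resend of y.
    """
    duplicates = Counter()
    resent = []
    for y in ack_values:
        if y > sendbase:
            sendbase = y                 # cumulative ACK of all data up to y
        else:
            duplicates[y] += 1
            if duplicates[y] == threshold:
                resent.append(y)         # resend segment with sequence number y
    return resent

# Segment 100 is lost: later arrivals keep producing ACK=100 duplicates.
print(fast_retransmits([100, 100, 100, 100], sendbase=92))   # [100]
```

The first ACK=100 is new. Only the three ACKs that repeat it count as duplicates, so four identical ACKs are needed to trigger the resend.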
TCP ACK Generation [RFC 1122, RFC 2581]
1. Event: in-order segment arrival, no gaps, everything else already ACKed.
   TCP receiver action: delay sending the ACK. Wait up to 500 ms for the next segment; if the next segment does not arrive in this interval, send the ACK.
2. Event: in-order segment arrival, no gaps, one delayed ACK pending (due to action 1).
   TCP receiver action: immediately send a single cumulative ACK.
3. Event: out-of-order segment arrival with a higher-than-expected seq #: a gap is detected.
   TCP receiver action: send a duplicate ACK, indicating the seq # of the next expected byte.
4. Event: arrival of a segment that partially or completely fills the gap.
   TCP receiver action: immediately send an ACK if the segment starts at the lower end of the gap.
The receiver does not discard out-of-order segments.
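The four receiver rules can be expressed as one arrival handler. This is a minimal Python sketch with hypothetical names. The 500 ms timer is not modelled (rule 1 only records that an ACK is pending), and overlapping segments are not handled. Re-ACKing old data is an added assumption, since it is not one of the four rules.

```python
class TCPReceiver:
    """Applies the four ACK-generation rules above to arriving segments."""

    def __init__(self, expected):
        self.expected = expected        # next in-order byte the receiver wants
        self.buffer = {}                # start -> length of buffered out-of-order segments
        self.delayed_ack_pending = False

    def on_segment(self, start, length):
        if start > self.expected:                       # rule 3: gap detected
            self.buffer[start] = length                 # keep it, do not discard
            self.delayed_ack_pending = False
            return ("duplicate ACK", self.expected)
        if start < self.expected:                       # old data (assumed: just re-ACK)
            return ("ACK", self.expected)
        had_gap = bool(self.buffer)
        self.expected = start + length
        while self.expected in self.buffer:             # absorb buffered segments
            self.expected += self.buffer.pop(self.expected)
        if had_gap:                                     # rule 4: fills the gap's lower end
            self.delayed_ack_pending = False
            return ("immediate ACK", self.expected)
        if self.delayed_ack_pending:                    # rule 2: one delayed ACK pending
            self.delayed_ack_pending = False
            return ("immediate cumulative ACK", self.expected)
        self.delayed_ack_pending = True                 # rule 1: wait up to 500 ms
        return ("delayed ACK", self.expected)

r = TCPReceiver(expected=0)
print(r.on_segment(0, 1000))      # ('delayed ACK', 1000)
print(r.on_segment(1000, 1000))   # ('immediate cumulative ACK', 2000)
print(r.on_segment(3000, 1000))   # ('duplicate ACK', 2000)
print(r.on_segment(2000, 1000))   # ('immediate ACK', 4000)
```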
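TCP Interesting Scenarios (simplified TCP version)
Lost ACK scenario:
- Host A sends Seq=92, 8 bytes data; Host B replies with ACK=100, but the ACK is lost (X).
- The timeout for Seq=92 expires, so Host A retransmits Seq=92, 8 bytes data (retransmission due to a lost ACK), and Host B replies with ACK=100.
Premature timeout, cumulative ACKs:
- Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; Host B replies with ACK=100 and ACK=120.
- The timeout for Seq=92 expires before these ACKs arrive, so Host A retransmits Seq=92, 8 bytes data, and the timer is restarted for Seq=92.
- Host B replies with ACK=120. The segment with Seq=100 is not retransmitted.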
TCP Retransmission Scenario
- Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data.
- Host B's ACK=100 is lost (X), but its ACK=120 arrives before the Seq=92 timeout.
- The cumulative ACK avoids retransmission of the first segment.
TCP Modifications: Doubling the Timeout Interval
- This provides a limited form of congestion control: a timer expiration is more likely caused by congestion in the network, and congestion may get worse if sources continue to retransmit packets persistently.
- TimeoutInterval = 2 × previous TimeoutInterval
- TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals.
- After an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT (the Deviation term).
Other modifications: check RFC 2018 (selective ACK).
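The doubling rule produces a geometric schedule of waits when every attempt times out. This is a minimal Python sketch. The function name and the inputs (EstimatedRTT = 0.1 s, DevRTT = 0.025 s) are hypothetical.

```python
def timeout_schedule(estimated_rtt, dev_rtt, expirations):
    """TimeoutInterval in force before each of `expirations` consecutive timeouts.

    The first value comes from EstimatedRTT + 4 * DevRTT; every expiration
    doubles it. A fresh ACK would reset it to EstimatedRTT + 4 * DevRTT.
    """
    interval = estimated_rtt + 4 * dev_rtt
    schedule = []
    for _ in range(expirations):
        schedule.append(interval)
        interval *= 2          # TimeoutInterval = 2 * previous TimeoutInterval
    return schedule

print(timeout_schedule(0.1, 0.025, 4))   # roughly [0.2, 0.4, 0.8, 1.6] seconds
```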
TCP Flow Control
Flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.
- The receiver explicitly informs the sender of the (dynamically changing) amount of free buffer space, via the RcvWindow field in the TCP segment.
- The sender keeps the amount of transmitted, unACKed data no larger than the most recently received RcvWindow.
Receiver buffering:
- RcvBuffer = size of the TCP receive buffer
- RcvWindow = amount of spare room in the buffer
FLOW CONTROL: Receiver
EXAMPLE: HOST A sends a large file to HOST B.
RECEIVER (HOST B) uses RcvWindow, LastByteRcvd, and LastByteRead:
RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
[Figure: HOST B's RcvBuffer, filled with data from IP up to LastByteRcvd and emptied by the application process up to LastByteRead.]
- Initially, RcvWindow = RcvBuffer.
- The application reads from the buffer.
- HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.
FLOW CONTROL: Sender
EXAMPLE: HOST A sends a large file to HOST B.
SENDER (HOST A) uses the RcvWindow of HOST B, LastByteSent, and LastByteACKed.
[Figure: HOST A's data, sent up to LastByteSent, with ACKs from HOST B covering up to LastByteACKed.]
To ensure that HOST B's buffer does not overflow, HOST A maintains throughout the connection's life that
[LastByteSent - LastByteACKed] <= RcvWindow
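The receiver's window formula and the sender's constraint fit in two small functions. This is a minimal Python sketch. The function names and the byte counts (a 100-byte RcvBuffer, LastByteRcvd = 60, LastByteRead = 40, LastByteSent = 100, LastByteACKed = 60) are hypothetical.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room HOST B advertises: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def can_send(last_byte_sent, last_byte_acked, rcv_window_value):
    """Bytes HOST A may still send while keeping LastByteSent - LastByteACKed <= RcvWindow."""
    in_flight = last_byte_sent - last_byte_acked
    return max(rcv_window_value - in_flight, 0)

# Bytes 41..60 sit in HOST B's buffer, not yet read by the application.
window = rcv_window(rcv_buffer=100, last_byte_rcvd=60, last_byte_read=40)
print(window)                                                                     # 80
print(can_send(last_byte_sent=100, last_byte_acked=60, rcv_window_value=window))  # 40
print(can_send(100, 60, 0))   # 0: the advertised window is closed
```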
FLOW CONTROL: Some issues to consider
RcvWindow is used by the connection to provide the flow control service.
What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?
- TCP sends a segment only when there is data or an ACK to send. Therefore, the sender must keep the connection "alive".
- TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space, and the ACKs will contain RcvWindow > 0.
TCP Connection Management
Recall: the TCP sender and receiver establish a "connection" before exchanging data segments, to initialize TCP variables:
- sequence numbers
- buffers and flow control info (e.g., RcvWindow)
Client: the connection initiator
- In Java: Socket clientSocket = new Socket(hostname, portNumber);
- In C (Winsock), connect:
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
        printf("connect failed\n");
        WSACleanup();
        exit(1);
    }
Server: contacted by the client
- In Java: Socket accept() (a method of ServerSocket)
- In C (Winsock), accept:
    ns = accept(s, (struct sockaddr *)&remoteaddr, &addrlen);
TCP Connection Management
Establishing a connection: the three-way handshake. This is what happens when we create a socket for connection to a server.
Step 1: The client end system sends a TCP SYN control segment to the server (executed by TCP itself): Connect (SYN=1, seq=client_isn).
- It specifies the initial seq number (isn).
Step 2: The server end system receives the SYN and replies with a SYNACK control segment: Accept (SYN=1, seq=server_isn, ack=client_isn+1).
- It ACKs the received SYN, allocates buffers, and specifies the server's initial seq number.
Step 3: The client ACKs the connection with ACK=server_isn+1, allocates buffers, and sends SYN=0: ACK (SYN=0, seq=client_isn+1, ack=server_isn+1).
Connection established. After establishing the connection, the client can receive segments with app-generated data (SYN=0).
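The sequence-number arithmetic of the three steps can be written out directly. This is a minimal Python sketch with no real network I/O. The function name and the example ISNs 42 and 79 are hypothetical. Segments are plain dicts carrying only the fields shown in the figure; the SYNACK's ACK flag and other header fields are omitted.

```python
import random

SEQ_SPACE = 2 ** 32

def three_way_handshake(client_isn=None, server_isn=None):
    """Build the SYN, SYNACK and ACK segments of the handshake as dicts (no real I/O).

    Random ISNs are drawn when none are given; the SYN and SYNACK each consume
    one sequence number, hence the +1s (modulo 2**32).
    """
    if client_isn is None:
        client_isn = random.randrange(SEQ_SPACE)
    if server_isn is None:
        server_isn = random.randrange(SEQ_SPACE)
    syn = {"SYN": 1, "seq": client_isn}                                     # step 1
    synack = {"SYN": 1, "seq": server_isn,
              "ack": (client_isn + 1) % SEQ_SPACE}                          # step 2
    ack = {"SYN": 0, "seq": (client_isn + 1) % SEQ_SPACE,
           "ack": (server_isn + 1) % SEQ_SPACE}                             # step 3
    return syn, synack, ack

for segment in three_way_handshake(client_isn=42, server_isn=79):
    print(segment)
# {'SYN': 1, 'seq': 42}
# {'SYN': 1, 'seq': 79, 'ack': 43}
# {'SYN': 0, 'seq': 43, 'ack': 80}
```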
TCP Connection Management (cont.)
Closing a connection: the client closes its socket, e.g., closesocket(s); or, in Java, clientSocket.close();
Step 1: The client end system sends a TCP FIN control segment to the server.
Step 2: The server receives the FIN and replies with an ACK. It then closes the connection and sends a FIN.
[Figure: the client sends FIN (close); the server replies with ACK, then sends FIN (close); the client replies with ACK, waits in timed wait, and is then closed.]
How a TCP connection is established and torn down
TCP Connection Management (cont.)
Step 3: The client receives the FIN and replies with an ACK. It enters a timed wait, during which it will respond with an ACK to received FINs.
Step 4: The server receives the ACK. The connection is closed.
Note: with a small modification, this can handle simultaneous FINs.
[Figure: client FIN, server ACK and FIN, client ACK; both sides pass through closing, and the client waits in timed wait before both are closed.]
TCP Connection Management (cont.)
[Figure: TCP client lifecycle and TCP server lifecycle.]
- Timed wait: used in case the final ACK gets lost. Its length is implementation-dependent (e.g., 30 seconds, 1 minute, 2 minutes).
- The connection then formally closes: all resources (e.g., port numbers) are released.
End of Flow Control and Error Control
Flow Control vs. Congestion Control: similar actions are taken, but for very different reasons.
Flow control:
- point-to-point traffic between sender and receiver
- speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
- prevents the receiver buffer from overflowing
Congestion: happens when there are too many sources attempting to send data at too high a rate for the routers along the path.
Congestion control:
- a service that makes sure that the routers between end systems are able to carry the offered traffic
- prevents routers from overflowing
Same course of action: throttling of the sender.
Principles of Congestion Control
Congestion:
- informally: "too many sources sending too much data too fast for the network to handle"
- different from flow control
- manifestations: lost packets (buffer overflow at routers) and long delays (queuing in router buffers)
- a top-10 problem
Approaches towards congestion control
Two broad approaches towards congestion control:
1. End-to-end congestion control
- no explicit feedback from the network
- congestion inferred by end systems from observed packet loss and delay
- the approach taken by TCP
2. Network-assisted congestion control
- routers provide feedback to end systems, either as a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR) or as an explicit transmission rate the sender should send at
TCP Congestion Control
How the TCP sender limits the rate at which it sends traffic into its connection, using a new variable, the congestion window (CongWin):
Amount of unACKed data at the SENDER = LastByteSent - LastByteACKed <= min(CongWin, RcvWindow)
This indirectly limits the sender's send rate.
Assumptions:
- The TCP receive buffer is very large, so there is no RcvWindow constraint: the amount of unACKed data at the sender is limited solely by CongWin.
- Packet loss delay and packet transmission delay are negligible.
Sending rate ≈ CongWin/RTT
By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
TCP Congestion Control
TCP uses ACKs to trigger ("clock") its increase in congestion window size: "self-clocking".
The arrival of ACKs is an indication to the sender that all is well.
1. Slow rate (ACKs arrive slowly): the congestion window will be increased at a relatively slow rate.
2. High rate (ACKs arrive quickly): the congestion window will be increased more quickly.
TCP Congestion Control
How TCP perceives that there is congestion on the path:
"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender:
1. Timeout: no ACK is received after segment loss.
2. Receipt of three duplicate ACKs: segment loss is followed by three duplicate ACKs received at the sender.
TCP Congestion Control: details
- The sender limits transmission: LastByteSent - LastByteAcked <= cwnd
- Roughly, rate = cwnd/RTT bytes/sec
- cwnd is a dynamic function of perceived network congestion
How does the sender perceive congestion?
- loss event = timeout or 3 duplicate ACKs
- the TCP sender reduces its rate (cwnd) after a loss event
Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events
TCP congestion avoidance: additive increase, multiplicative decrease
Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
- Additive increase: increase cwnd by 1 MSS every RTT until loss is detected.
- Multiplicative decrease: cut cwnd in half after loss.
[Figure: congestion window size (8, 16, 24 Kbytes) versus time, showing the saw-tooth behaviour: probing for bandwidth.]
TCP Slow Start
When the connection begins, increase the rate exponentially until the first loss event:
- initially, cwnd = 1 MSS
- double cwnd every RTT
- done by incrementing cwnd by 1 MSS for every ACK received
Summary: the initial rate is slow, but it ramps up exponentially fast (the sending rate doubles every RTT).
[Figure: Host A sends one segment to Host B; one RTT later it sends two segments, then four segments.]
Refinement: inferring loss
- After 3 dup ACKs: cwnd is cut in half; the window then grows linearly.
- But after a timeout event: cwnd is set to 1 MSS; the window then grows exponentially up to a threshold, then grows linearly.
Philosophy:
- 3 dup ACKs indicate the network is capable of delivering some segments.
- A timeout indicates a "more alarming" congestion scenario.
Refinement
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.
Implementation:
- variable ssthresh (slow-start threshold)
- on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event
TCP Sender Congestion Control
- State: Slow Start (SS). Event: ACK receipt for previously unACKed data. Action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance". Commentary: results in a doubling of CongWin every RTT.
- State: Congestion Avoidance (CA). Event: ACK receipt for previously unACKed data. Action: CongWin = CongWin + MSS × (MSS/CongWin). Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT.
- State: SS or CA. Event: loss event detected by triple duplicate ACK. Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance". Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS.
- State: SS or CA. Event: timeout. Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start". Commentary: enter slow start.
- State: SS or CA. Event: duplicate ACK. Action: increment the duplicate ACK count for the segment being ACKed. Commentary: CongWin and Threshold are not changed.
Summary: TCP Congestion Control
Initialization: cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; start in slow start.
Slow start:
- new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- cwnd > ssthresh: move to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment (stay in slow start)
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
Congestion avoidance:
- new ACK: cwnd = cwnd + MSS × (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
Fast recovery:
- duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
- new ACK: cwnd = ssthresh; dupACKcount = 0; move to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start
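The summary FSM maps onto three event handlers that update cwnd, ssthresh and dupACKcount. This is a minimal Python sketch. It follows the fast-recovery variant in the summary (cwnd = ssthresh + 3 MSS after a triple duplicate ACK) rather than the simpler table row (CongWin = Threshold). Windows are in bytes. Actual sending and retransmission are not modelled. The class name and the 1000-byte MSS are hypothetical.

```python
class TCPCongestionControl:
    """Sketch of the slow start / congestion avoidance / fast recovery FSM above."""

    def __init__(self, mss=1000, ssthresh=64 * 1024):
        self.mss = mss
        self.cwnd = mss                   # cwnd = 1 MSS
        self.ssthresh = ssthresh          # ssthresh = 64 KB
        self.dup_ack_count = 0
        self.state = "slow start"

    def on_new_ack(self):
        if self.state == "fast recovery":
            self.cwnd = self.ssthresh     # deflate the window, resume linear growth
            self.state = "congestion avoidance"
        elif self.state == "slow start":
            self.cwnd += self.mss         # +1 MSS per ACK doubles cwnd every RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion avoidance"
        else:
            self.cwnd += self.mss * self.mss / self.cwnd   # about +1 MSS per RTT
        self.dup_ack_count = 0

    def on_duplicate_ack(self):
        if self.state == "fast recovery":
            self.cwnd += self.mss         # each further duplicate ACK inflates cwnd
            return
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:       # triple duplicate ACK (missing segment resent here)
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast recovery"

    def on_timeout(self):                 # (missing segment resent here)
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.mss
        self.dup_ack_count = 0
        self.state = "slow start"

cc = TCPCongestionControl(mss=1000, ssthresh=8000)
for _ in range(8):
    cc.on_new_ack()
print(cc.state, cc.cwnd)          # congestion avoidance 9000
for _ in range(3):
    cc.on_duplicate_ack()
print(cc.state, cc.cwnd)          # fast recovery 7500.0
cc.on_new_ack()
print(cc.state, cc.cwnd)          # congestion avoidance 4500.0
```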
Congestion Control: TCP's Congestion Control Service
[Figure: a CLIENT and a SERVER communicating through congested routers.]
Problem: gridlock sets in when there is packet loss due to router congestion.
TCP's congestion control service forces the end systems to decrease the rate at which packets are sent during periods of congestion.
When the sending system's packet is lost due to congestion, the sender is alerted because it stops receiving ACKs for the packets it sent.
Macroscopic Description of TCP Throughput
What's the average throughput of TCP as a function of window size and RTT? (Ignore slow start, which typically consists of very short phases.)
- Let W be the window size when loss occurs.
- When the window is W, throughput is W/RTT.
- Just after loss, the window drops to W/2, and throughput drops to W/(2RTT).
- Throughput then increases linearly (by MSS/RTT every RTT).
- Average throughput: 0.75 W/RTT
(Based on an idealised model for the steady-state dynamics of TCP.)
TCP Futures: TCP over "long, fat pipes"
- Example: a GRID computing application with 1500-byte segments, a 100 ms RTT, and a desired throughput of 10 Gbps requires a window size of W = 83,333 in-flight segments.
- Throughput in terms of the loss rate L:
$$\text{throughput} = \frac{1.22 \cdot MSS}{RTT\,\sqrt{L}}$$
- Achieving 10 Gbps requires $L = 2 \cdot 10^{-10}$: a very small loss rate (1 loss event every 5 billion segments).
- New versions of TCP are needed for high-speed environments.
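The two figures in the GRID example follow from W·MSS/RTT and from inverting the loss-rate formula. This is a minimal Python sketch that reproduces them, assuming 8 bits per byte and the slide's 1.22 constant. The function names are hypothetical.

```python
def required_window(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed so that W * MSS / RTT reaches the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)

def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2

def average_aimd_throughput(w_segments, rtt_s, mss_bytes):
    """Mean of the saw tooth between W/2 and W segments per RTT: 0.75 W/RTT, in bits/s."""
    return 0.75 * w_segments * mss_bytes * 8 / rtt_s

W = required_window(10e9, 0.1, 1500)
L = required_loss_rate(10e9, 0.1, 1500)
print(round(W))          # 83333
print(f"{L:.2e}")        # 2.14e-10, i.e. roughly 1 loss per 5 billion segments
```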
TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link of capacity R, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.
[Figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R.]
Analysis of 2 connections sharing a link
Assumptions:
- a link with transmission rate R
- each connection has the same MSS and RTT
- no other TCP connections or UDP datagrams traverse the shared link
- ignore the slow start phase of TCP
- both connections operate in congestion-avoidance mode (linear increase phase)
Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.
Why is TCP fair?
Two competing sessions:
- additive increase gives a slope of 1 as throughput increases
- multiplicative decrease decreases throughput proportionally
A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.
[Figure: Connection 1 throughput (x-axis, up to R) versus Connection 2 throughput (y-axis, up to R), with the equal bandwidth share line and the full bandwidth utilisation line; the trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by a factor of 2), moving toward the equal bandwidth share.]
The End
The next succeeding slides are just for additional reading
Transport Layer ndash TCP
55
BB
TCP Latency ModelingTCP Latency Modeling
In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs
Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth
Multiple End Systems sharing a link
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
Loop holes in TCPLoop holes in TCP
1 TCP connection
1 TCP connection
1 TCP connection
3 TCP connections
Multithreading implementationMultithreading implementation
Transport Layer ndash TCP
56
BB
TCP latency modelingTCP latency modeling
TCP connection establishment time data transfer delay Actual data transmission time
Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent
QQ How long How long does it take to does it take to
receive an object receive an object from a Web from a Web
serverserver
No data transfer delayNo data transfer delay
Therersquos data transfer delayTherersquos data transfer delay
the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety
Transport Layer ndash TCP
57
BB
TCP Latency ModelingTCP Latency Modeling
Network is uncongested with one link between end systems of rate RR
CongWinCongWin (fixed) determines the amount of data that can be sent
No packet loss no packet corruption no retransmissions required
Header overheads are negligible
File to sendFile to send = integer number of segments of size MSS
Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times
CLIENT SERVER
FILEFILE
Initial ThresholdThreshold of TCP congestion mechanism is very big
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits
SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)
TCP latency modeling
Case analysis: STATIC congestion window
Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.
$$\text{latency} = 2\,\mathrm{RTT} + \frac{O}{R}$$
Let K be the number of windows of data that cover the object. K is the number of segments O/S divided by W, rounded up to the nearest integer:
$$K = \left\lceil \frac{O}{WS} \right\rceil$$
Example (assume W = 4 segments): O = 256 bits, S = 32 bits, and W = 4 give $K = \lceil 256/(4 \cdot 32) \rceil = 2$.
TCP latency modeling
Case analysis: STATIC congestion window
Case 2: WS/R < RTT + S/R
The sender has to wait for an ACK after a window's worth of data is sent (the stalled period).
$$\text{latency} = 2\,\mathrm{RTT} + \frac{O}{R} + (K-1)\left[\frac{S}{R} + \mathrm{RTT} - \frac{WS}{R}\right]$$
Here $K = \lceil O/(WS) \rceil$ is the number of windows of data that cover the object. If there are K windows, the sender will be stalled (K - 1) times.
Case analysis: DYNAMIC congestion window
Figure: an object of O/S = 15 segments is sent in 4 windows, with stalled periods between windows.
Case analysis: DYNAMIC congestion window
- Let K be the number of windows that cover the object.
- We can express K in terms of the number of segments in the object as follows:
$$K = \min\{k : 2^0 S + 2^1 S + \cdots + 2^{k-1} S \ge O\} = \min\{k : 2^k - 1 \ge O/S\} = \min\{k : k \ge \log_2(O/S + 1)\} = \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil$$
Note: $2^0 + 2^1 + \cdots + 2^{k-1} = 2^k - 1$.
Case analysis: DYNAMIC congestion window
- Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window: $\frac{S}{R} + \mathrm{RTT}$
- Transmission time of the kth window: $2^{k-1}\frac{S}{R}$
- Stall time after the kth window: $\left[\frac{S}{R} + \mathrm{RTT} - 2^{k-1}\frac{S}{R}\right]^+$, where $[x]^+ = \max(x, 0)$
- Latency:
$$\text{Latency} = 2\,\mathrm{RTT} + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + \mathrm{RTT} - 2^{k-1}\frac{S}{R}\right]^+$$
Case analysis: DYNAMIC congestion window
- Let Q be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\left(1 + \frac{\mathrm{RTT}}{S/R}\right) \right\rfloor + 1$$
- The actual number of times that the server stalls is $P = \min\{Q, K - 1\}$.
- Closed-form expression for the latency:
$$\text{Latency} = 2\,\mathrm{RTT} + \frac{O}{R} + P\left[\mathrm{RTT} + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$
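- With the minimum latency $2\,\mathrm{RTT} + O/R$ (the latency with no stalls), the ratio satisfies
$$\frac{\text{Latency}}{\text{Minimum latency}} \le 1 + \frac{P}{\frac{O/R}{\mathrm{RTT}} + 2}$$
- Slow start will not significantly increase latency if RTT << O/R.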
httpwww1csewustledu~jaincis788-97ftptcp_over_atmindexhtmatm-features
TCP segment structure
Header fields, in order (each row is 32 bits wide):
- source port # | dest port #
- sequence number (counting by bytes of data, not segments)
- acknowledgement number
- head len | not used | U A P R S F | rcvr window size (# bytes the rcvr is willing to accept)
- checksum (Internet checksum, as in UDP) | URG data pointer
- Options (variable length)
The header is followed by the application data (variable length).
Flags:
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection establishment (setup, teardown commands)
In practice, PSH, URG, and the Urgent Data Pointer are not used.
We can view these teeny-weeny details using Ethereal (since renamed Wireshark).
Example
Suppose that a process in Host A wants to send a stream of data to a process in Host B over a TCP connection. Assume:
- Data stream: a file consisting of 500,000 bytes
- MSS: 1,000 bytes
- The first byte of the data stream is numbered 0
TCP constructs 500 segments out of the data stream: 500,000 bytes / 1,000 bytes = 500 segments.
TCP sequence numbers and ACKs
Sequence numbers:
- the byte-stream number of the first byte in the segment's data
- do not necessarily start from 0; a random initial number R is used (here R denotes that random number, not the link rate)
- Segment 1: 0 + R; Segment 2: 1000 + R; etc.
ACKs (acknowledgments):
- the sequence number of the next byte expected from the other side (last byte received + 1)
- cumulative ACK
- if the receiver has received segment 1 and is waiting for segment 2, it sends, e.g., ACK = 1000 + R (received up to byte 999)
Figure: bytes 0 to 999 form segment 1, and bytes 1000 to 1999 form segment 2.
TCP sequence numbers and ACKs
Q: How does the receiver handle out-of-order segments?
A: The TCP specs do not say; this is decided when implementing.
Simple telnet scenario (with echo on). Assume the starting sequence numbers for Host A (client) and Host B (server) are 42 and 79, respectively:
- User types 'C': Host A sends Seq=42, ACK=79, data = 'C' ("I'm sending data starting at seq num = 42").
- Host B ACKs receipt of 'C' and echoes back 'C': Seq=79, ACK=43, data = 'C' ("Send me the bytes from 43 onward"; the ACK is piggy-backed on server-to-client data).
- Host A ACKs receipt of the echoed 'C': Seq=43, ACK=80.
Yet another server echo example
- User types "Hello": Host A sends Seq=42, ACK=79, data = 'Hello'.
- Host B ACKs receipt of "Hello" and echoes back "Hello": Seq=79, ACK=47, data = 'Hello'.
- Host A ACKs receipt of the echoed "Hello" and sends something else: Seq=47, ACK=84, data = '200'.
- Host B ACKs and echoes back: Seq=84, ACK=50, data = '200'.
Resulting state:
- Host A goes from seq=42, ack=79 to seq=47, ack=84.
- Host B goes from seq=79, ack=47 to seq=84, ack=50.
The ACK tells up to what byte has been received and what the next starting byte is that the host is expecting to receive.
TCP Round Trip Time and Timeout
Main issue: how long is the sender willing to wait before retransmitting the packet?
Q: How to set the TCP timeout value?
- longer than RTT (note: RTT will vary)
- too short: premature timeout and unnecessary retransmissions
- too long: slow reaction to segment loss
Q: How to estimate RTT (round trip time)?
- SampleRTT: measured time from segment transmission until ACK receipt; ignore retransmissions and cumulatively ACKed segments
- SampleRTT will vary, and we want the estimated RTT to be smoother: use several recent measurements, not just the current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - x) · EstimatedRTT + x · SampleRTT
- Exponential weighted moving average: the influence of a given sample decreases exponentially fast
- typical value of x: 0.125 (RFC 2988)
Setting the timeout:
- EstimatedRTT plus a "safety margin"
- large variation in EstimatedRTT → larger safety margin
Timeout = EstimatedRTT + 4 · Deviation
Deviation = (1 - x) · Deviation + x · |SampleRTT - EstimatedRTT|, where the recommended value of x for this formula is 0.25
Sample Calculations
EstimatedRTT = 0.875 · EstimatedRTT + 0.125 · SampleRTT (all times in seconds)
- After the ACK of segment 1: EstimatedRTT = RTT for segment 1 = 0.02746
- After the ACK of segment 2: EstimatedRTT = 0.875 · 0.02746 + 0.125 · 0.035557 = 0.0285
- After the ACK of segment 3: EstimatedRTT = 0.875 · 0.0285 + 0.125 · 0.070059 = 0.0337
- After the ACK of segment 4: EstimatedRTT = 0.875 · 0.0337 + 0.125 · 0.11443 = 0.0438
- After the ACK of segment 5: EstimatedRTT = 0.875 · 0.0438 + 0.125 · 0.13989 = 0.0558
- After the ACK of segment 6: EstimatedRTT = 0.875 · 0.0558 + 0.125 · 0.18964 = 0.0725
RTT Samples and RTT Estimates
Figure: SampleRTT and EstimatedRTT plotted over time (y-axis: RTT in msec).
The variations in SampleRTT are smoothed out in the computation of EstimatedRTT.
An Actual RTT Estimation
Figure: SampleRTT and EstimatedRTT for the path from gaia.cs.umass.edu to fantasia.eurecom.fr (x-axis: time in seconds; y-axis: RTT in milliseconds).
FSM of TCP for Reliable Data Transfer
Simplified TCP sender, assuming one-way data transfer and no flow or congestion control. From the state "wait for event":
- event: data received from application above → create and send segment
- event: timer timeout for segment with seq number y → retransmit segment
- event: ACK received, with ACK number y → process ACK
SIMPLIFIED TCP SENDER
Assumptions:
- the sender is not constrained by TCP flow or congestion control
- data from above is less than MSS in size
- data transfer is in one direction only

sendbase = initial_sequence_number
nextseqnum = initial_sequence_number

loop (forever) {
    switch (event)
    event: data received from application above
        create TCP segment with sequence number nextseqnum
        if (timer is currently not running)
            start timer for segment nextseqnum   /* the timer is associated with the oldest unACKed segment */
        pass segment to IP
        nextseqnum = nextseqnum + length(data)
    event: timer timeout
        retransmit not-yet-ACKed segment with smallest sequence number
        start timer
    event: ACK received, with ACK field value of y
        if (y > sendbase) {   /* cumulative ACK of all data up to y */
            sendbase = y
            if (there are currently any not-yet-ACKed segments)
                start timer
        }
} /* end of loop forever */
TCP SENDER with MODIFICATIONS (with Fast Retransmit)
Why wait for the timeout to expire when consecutive duplicate ACKs can be used to indicate a lost segment?

sendbase = initial_sequence_number
nextseqnum = initial_sequence_number

loop (forever) {
    switch (event)
    event: data received from application above
        create TCP segment with sequence number nextseqnum
        start timer for segment nextseqnum
        pass segment to IP
        nextseqnum = nextseqnum + length(data)
    event: timer timeout for segment with sequence number y
        retransmit segment with sequence number y
        compute new timeout interval for segment y
        restart timer for sequence number y
    event: ACK received, with ACK field value of y
        if (y > sendbase) {   /* cumulative ACK of all data up to y */
            cancel all timers for segments with sequence numbers < y
            sendbase = y
        }
        else {   /* a duplicate ACK for an already ACKed segment */
            increment number of duplicate ACKs received for y
            if (number of duplicate ACKs received for y == 3) {
                /* perform TCP fast retransmission */
                resend segment with sequence number y
                restart timer for segment y
            }
        }
} /* end of loop forever */
TCP ACK generation [RFC 1122, RFC 2581]
1. Event: in-order segment arrival, no gaps, everything else already ACKed.
   TCP receiver action: delay sending the ACK. Wait up to 500 ms for the next segment; if the next segment does not arrive in this interval, send the ACK.
2. Event: in-order segment arrival, no gaps, one delayed ACK pending (due to action 1).
   TCP receiver action: immediately send a single cumulative ACK.
3. Event: out-of-order segment arrival with a higher-than-expected seq. #; a gap is detected.
   TCP receiver action: send a duplicate ACK indicating the seq. # of the next expected byte.
4. Event: arrival of a segment that partially or completely fills the gap.
   TCP receiver action: immediately send an ACK if the segment starts at the lower end of the gap.
The receiver does not discard out-of-order segments.
TCP Interesting Scenarios (simplified TCP version)
- Lost ACK scenario: Host A sends Seq=92 with 8 bytes of data. Host B's ACK=100 is lost (X), so the Seq=92 timer expires. Host A retransmits Seq=92 with 8 bytes of data, and Host B again replies ACK=100. This is a retransmission due to a lost ACK.
- Premature timeout (cumulative ACKs): Host A sends Seq=92 with 8 bytes of data and Seq=100 with 20 bytes of data; Host B replies ACK=100 and ACK=120. The Seq=92 timer expires before these ACKs arrive, so Host A retransmits Seq=92 with 8 bytes of data and restarts the timer for Seq=92. Host B replies ACK=120. The segment with Seq=100 is not retransmitted.
TCP Retransmission Scenario
Host A sends Seq=92 with 8 bytes of data and Seq=100 with 20 bytes of data. Host B's ACK=100 is lost (X), but ACK=120 arrives before the Seq=92 timeout expires. The cumulative ACK avoids retransmission of the first segment.
TCP Modifications: Doubling the Timeout Interval
- After each timer expiration: TimeoutInterval = 2 · TimeoutInterval_previous
- This provides a limited form of congestion control. Timer expiration is more likely caused by congestion in the network, and congestion may get worse if sources continue to retransmit packets persistently.
- TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals.
- After an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT.
Others: check RFC 2018 (selective ACK).
TCP Flow Control
Flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.
- Receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space through the RcvWindow field in the TCP segment.
- Sender: keeps the amount of transmitted, unACKed data no more than the most recently received RcvWindow.
Receiver buffering:
- RcvBuffer = size of the TCP receive buffer
- RcvWindow = amount of spare room in the buffer
FLOW CONTROL: Receiver
Example: HOST A sends a large file to HOST B.
Receiver HOST B uses RcvWindow, LastByteRcvd, and LastByteRead.
Figure: data from IP enters the RcvBuffer up to LastByteRcvd; the application process reads from the buffer up to LastByteRead.
RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
Initially, RcvWindow = RcvBuffer, and the application reads from the buffer. HOST B tells HOST A how much spare room it has in the connection buffer. It does so by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.
FLOW CONTROL: Sender
Example: HOST A sends a large file to HOST B.
Sender HOST A uses RcvWindow (of HOST B), LastByteSent, and LastByteACKed (advanced by ACKs from HOST B).
To ensure that HOST B does not overflow, HOST A maintains throughout the connection's life that
LastByteSent - LastByteACKed ≤ RcvWindow
FLOW CONTROL: Some issues to consider
RcvWindow is used by the connection to provide the flow control service.
What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?
TCP sends a segment only when there is data or an ACK to send. Therefore, the sender must keep the connection "alive". TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space, and the ACKs will contain RcvWindow > 0.
TCP Connection Management
Recall: the TCP sender and receiver establish a "connection" before exchanging data segments, initializing TCP variables:
- sequence numbers
- buffers, flow control info (e.g., RcvWindow)
Client: the connection initiator (connect).
  In Java: Socket clientSocket = new Socket(hostname, portNumber);
  In C (Winsock):
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
        printf("connect failed\n");
        WSACleanup();
        exit(1);
    }
Server: contacted by the client (accept).
  In Java: ServerSocket.accept() returns a Socket for the new connection.
  In C: ns = accept(s, (struct sockaddr *)&remoteaddr, &addrlen);
TCP Connection Management
Three-way handshake:
Step 1: The client end system sends a TCP SYN control segment to the server (executed by TCP itself). It specifies the initial sequence number (isn).
Step 2: The server end system receives the SYN and replies with a SYNACK control segment. It ACKs the received SYN, allocates buffers, and specifies the server's initial sequence number.
Step 3: The client ACKs the connection with ACK = server_isn + 1, allocates buffers, and sends SYN = 0.
Connection established.
Establishing a connection (this is what happens when we create a socket for connection to a server):
- Client → Server, Connect: SYN=1, seq=client_isn
- Server → Client, Accept: SYN=1, seq=server_isn, ack=client_isn+1
- Client → Server, ACK: SYN=0, seq=client_isn+1, ack=server_isn+1
After establishing the connection, the client can receive segments with app-generated data (SYN=0).
TCP Connection Management (cont.)
How a TCP connection is established and torn down.
Closing a connection: the client closes the socket with closesocket(s) (in Java: clientSocket.close()).
Step 1: The client end system sends a TCP FIN control segment to the server.
Step 2: The server receives the FIN, replies with an ACK, closes the connection, and sends a FIN.
Figure: the client sends FIN; the server replies with ACK and then its own FIN (close); the client replies with ACK and waits in "timed wait" before the connection is closed.
TCP Connection Management (cont.)
Step 3: The client receives the FIN and replies with an ACK. It enters "timed wait", during which it will respond with an ACK to received FINs.
Step 4: The server receives the ACK, and the connection is closed.
Note: with a small modification, this can handle simultaneous FINs.
Figure: FIN/ACK exchange in both directions. Each side goes from "closing" to "closed", and the client spends the timed wait before "closed".
TCP Connection Management (cont.)
Figures: TCP client lifecycle and TCP server lifecycle.
- Timed wait: used in case the ACK gets lost. Its duration is implementation-dependent (e.g., 30 seconds, 1 minute, 2 minutes).
- After the timed wait, the connection formally closes: all resources (e.g., port numbers) are released.
End of Flow Control and Error Control
Flow Control vs. Congestion Control
Similar actions are taken, but for very different reasons.
Flow control:
- point-to-point traffic between sender and receiver
- speed-matching service: matching the rate at which the sender is sending against the rate at which the receiving application is reading
- prevents the receiver buffer from overflowing
Congestion: happens when there are too many sources attempting to send data at too high a rate for the routers along the path.
Congestion control:
- service that makes sure that the routers between end systems are able to carry the offered traffic
- prevents routers from overflowing
Same course of action: throttling of the sender.
Principles of Congestion Control
Congestion:
- informally: "too many sources sending too much data too fast for the network to handle"
- different from flow control
- manifestations: lost packets (buffer overflow at routers) and long delays (queuing in router buffers)
- a top-10 problem
Approaches towards congestion control
Two broad approaches towards congestion control:
1. End-to-end congestion control:
   - no explicit feedback from the network
   - congestion is inferred by end systems from observed packet loss and delay
   - approach taken by TCP
2. Network-assisted congestion control:
   - routers provide feedback to end systems in one of two forms: a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR), or an explicit transmission rate the sender should send at
TCP Congestion Control
How the TCP sender limits the rate at which it sends traffic into its connection, using a new variable, the congestion window (CongWin):
Amount of unACKed data at the SENDER = LastByteSent - LastByteACKed ≤ min(CongWin, RcvWindow)
This indirectly limits the sender's send rate.
Assumptions:
- The TCP receive buffer is very large, so there is no RcvWindow constraint: the amount of unACKed data at the sender is solely limited by CongWin.
- Packet loss delay and packet transmission delay are negligible.
Sending rate ≈ CongWin / RTT
By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
TCP Congestion Control
TCP uses ACKs to trigger ("clock") its increase in congestion window size, so TCP is "self-clocking". The arrival of ACKs is an indication to the sender that all is well:
1. If ACKs arrive at a slow rate, the congestion window will be increased at a relatively slow rate.
2. If ACKs arrive at a high rate, the congestion window will be increased more quickly.
TCP Congestion Control
How TCP perceives that there is congestion on the path:
"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped. This in turn results in a "loss event" at the sender, which is either:
1. a timeout: no ACK is received after a segment loss, or
2. the receipt of three duplicate ACKs: the segment loss is followed by three duplicate ACKs received at the sender.
TCP Congestion Control details
- The sender limits transmission: LastByteSent - LastByteAcked ≤ cwnd
- Roughly, rate = cwnd / RTT bytes/sec
- cwnd is a dynamic function of perceived network congestion
How does the sender perceive congestion?
- loss event = timeout or 3 duplicate ACKs
- the TCP sender reduces its rate (cwnd) after a loss event
Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events
TCP congestion avoidance: additive increase, multiplicative decrease (AIMD)
Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
- additive increase: increase cwnd by 1 MSS every RTT until loss is detected
- multiplicative decrease: cut cwnd in half after loss
Figure: congestion window size (8, 16, 24 Kbytes) vs. time, showing saw-tooth behavior as TCP probes for bandwidth.
TCP Slow Start
When the connection begins, increase the rate exponentially until the first loss event:
- initially cwnd = 1 MSS
- double cwnd every RTT
- done by incrementing cwnd by 1 MSS for every ACK received
Summary: the initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT).
Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart.
Refinement: inferring loss
- After 3 duplicate ACKs: cwnd is cut in half, and the window then grows linearly.
- But after a timeout event: cwnd is set to 1 MSS. The window then grows exponentially up to a threshold, and then grows linearly.
Philosophy:
- 3 duplicate ACKs indicate the network is capable of delivering some segments.
- A timeout indicates a "more alarming" congestion scenario.
Refinement
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.
Implementation:
- variable ssthresh (slow-start threshold)
- on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event
TCP Sender Congestion Control
- State: Slow Start (SS). Event: ACK receipt for previously unACKed data. Action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance". Commentary: results in a doubling of CongWin every RTT.
- State: Congestion Avoidance (CA). Event: ACK receipt for previously unACKed data. Action: CongWin = CongWin + MSS · (MSS/CongWin). Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT.
- State: SS or CA. Event: loss event detected by triple duplicate ACK. Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance". Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS.
- State: SS or CA. Event: timeout. Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start". Commentary: enter slow start.
- State: SS or CA. Event: duplicate ACK. Action: increment the duplicate ACK count for the segment being ACKed. Commentary: CongWin and Threshold are not changed.
Summary: TCP Congestion Control (FSM)
Initial state: slow start, with cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0.
Slow start:
- new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed.
- duplicate ACK: dupACKcount++.
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment.
- cwnd > ssthresh: move to congestion avoidance.
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery.
Congestion avoidance:
- new ACK: cwnd = cwnd + MSS · (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed.
- duplicate ACK: dupACKcount++.
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start.
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery.
Fast recovery:
- duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed.
- new ACK: cwnd = ssthresh; dupACKcount = 0; move to congestion avoidance.
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start.
TCP's Congestion Control Service
Problem: gridlock sets in when there is packet loss due to router congestion.
TCP's congestion control forces the end systems to decrease the rate at which packets are sent during periods of congestion.
When a sending system's packet is lost due to congestion, the sender is alerted when it stops receiving ACKs for the packets it sent.
Macroscopic Description of TCP Throughput
What is the average throughput of TCP as a function of window size and RTT? Ignore slow start (typically very short phases).
- Let W be the window size when loss occurs. When the window is W, throughput is W/RTT.
- Just after loss, the window drops to W/2, and throughput to W/(2·RTT).
- Throughput then increases linearly (by MSS/RTT every RTT).
- Average throughput: 0.75 · W/RTT
(Based on an idealised model for the steady-state dynamics of TCP.)
TCP Futures
TCP over "long, fat pipes":
- Example: a GRID computing application with 1500-byte segments and a 100 ms RTT, desiring a throughput of 10 Gbps, requires a window size of W = 83,333 in-flight segments.
- Throughput in terms of the loss rate L:
$$\text{Throughput} = \frac{1.22 \cdot \mathrm{MSS}}{\mathrm{RTT}\sqrt{L}}$$
- Reaching 10 Gbps requires $L = 2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments).
- New versions of TCP are needed for high-speed environments.
TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.
Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.
Analysis of 2 connections sharing a link
Assumptions:
- a link with transmission rate R
- each connection has the same MSS and RTT
- no other TCP connections or UDP datagrams traverse the shared link
- ignore the slow-start phase of TCP
- both connections operate in congestion-avoidance mode (linear increase phase)
Goal: adjust the sending rate of the two connections to allow for equal bandwidth sharing.
Why is TCP fair?
Two competing sessions:
- Additive increase gives a slope of 1 as throughput increases.
- Multiplicative decrease decreases throughput proportionally.
Figure: connection 2 throughput vs. connection 1 throughput (both axes up to R), with the full bandwidth utilisation line and the equal bandwidth share line. The trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by a factor of 2). A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.
The End
The following slides are for additional reading only.
TCP Latency Modeling
- In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free; therefore, they have higher throughputs.
- Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth.
Figure: multiple end systems sharing a link.
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
Loop holes in TCPLoop holes in TCP
1 TCP connection
1 TCP connection
1 TCP connection
3 TCP connections
Multithreading implementationMultithreading implementation
Transport Layer ndash TCP
56
BB
TCP latency modelingTCP latency modeling
TCP connection establishment time data transfer delay Actual data transmission time
Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent
QQ How long How long does it take to does it take to
receive an object receive an object from a Web from a Web
serverserver
No data transfer delayNo data transfer delay
Therersquos data transfer delayTherersquos data transfer delay
the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety
Transport Layer ndash TCP
57
BB
TCP Latency ModelingTCP Latency Modeling
Network is uncongested with one link between end systems of rate RR
CongWinCongWin (fixed) determines the amount of data that can be sent
No packet loss no packet corruption no retransmissions required
Header overheads are negligible
File to sendFile to send = integer number of segments of size MSS
Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times
CLIENT SERVER
FILEFILE
Initial ThresholdThreshold of TCP congestion mechanism is very big
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits
SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)
Transport Layer ndash TCP
58
BB
TCP latency ModelingTCP latency Modeling
Case 1 latency = 2RTT + OR
K = Number of Windows of data that cover the object
K = OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 1 Case 1 WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
Number of segmentsRounded up to the nearest integer
Assume W=4 segmentsAssume W=4 segments
eg O=256bits S=32bits W=4
Transport Layer ndash TCP
59
BB
TCP Latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 2: WS/R < RTT + S/R
The sender has to wait for an ACK after a window's worth of data is sent (STALLED PERIOD).
$$\text{latency} = 2\,\mathrm{RTT} + \frac{O}{R} + (K-1)\left[\frac{S}{R} + \mathrm{RTT} - \frac{WS}{R}\right]$$
where $K = \lceil O/(WS) \rceil$ is the number of windows of data that cover the object.
If there are K windows, the sender will be stalled (K - 1) times.
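The two static-window cases can be folded into one function. This is a minimal Python sketch, assuming O and S are in bits, R in bits/s and RTT in seconds. The function name and the link parameters in the usage are hypothetical. Case 2 applies exactly when the idle time after a window, S/R + RTT - WS/R, is positive.

```python
import math


def static_window_latency(O, S, W, R, rtt):
    """Latency (s) to fetch an O-bit object with a fixed window of W segments of S bits.

    O and S are in bits, R in bits/s, rtt in seconds (the static-window model's assumptions).
    """
    K = math.ceil(O / (W * S))          # number of windows that cover the object
    stall = S / R + rtt - W * S / R     # idle time after each window
    if stall <= 0:
        # Case 1: WS/R >= RTT + S/R, so returning ACKs keep the sender busy
        return 2 * rtt + O / R
    # Case 2: the sender stalls once after each of the first K - 1 windows
    return 2 * rtt + O / R + (K - 1) * stall


case1 = static_window_latency(O=256, S=32, W=4, R=32, rtt=1)   # S/R = 1 s
case2 = static_window_latency(O=256, S=32, W=4, R=32, rtt=5)
```

The example uses the slide's O = 256 bits, S = 32 bits and W = 4, together with an assumed R = 32 bits/s (so S/R = 1 s). With an RTT of 1 s it falls in Case 1 (10 s). With an RTT of 5 s it falls in Case 2 with one 2 s stall (20 s).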
Case Analysis: DYNAMIC CONGESTION WINDOW
[Figure: slow start with O/S = 15 segments sent in 4 windows (1, 2, 4 and 8 segments), with a STALLED PERIOD between windows]
Case Analysis: DYNAMIC CONGESTION WINDOW
• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object as follows:
$$\begin{aligned}
K &= \min\{k : 2^0 S + 2^1 S + \cdots + 2^{k-1} S \ge O\} \\
  &= \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} \\
  &= \min\{k : 2^k - 1 \ge O/S\} \\
  &= \min\{k : k \ge \log_2(O/S + 1)\} \\
  &= \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil
\end{aligned}$$
Note: $2^0 + 2^1 + \cdots + 2^{k-1} = 2^k - 1$.
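As a quick check of the derivation, the Python sketch below counts windows by simulating the doubling directly and compares the result with the closed form. The function names are hypothetical. For O/S = 15, both approaches give the 4 windows of the slow-start figure.

```python
import math


def windows_loop(num_segments):
    """Smallest k such that windows of 1, 2, 4, ..., 2**(k-1) segments cover the object."""
    k, covered = 0, 0
    while covered < num_segments:
        covered += 2 ** k       # window number k + 1 carries 2**k segments
        k += 1
    return k


def windows_closed_form(num_segments):
    """K = ceil(log2(O/S + 1)), with num_segments = O/S."""
    return math.ceil(math.log2(num_segments + 1))
```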
Case Analysis: DYNAMIC CONGESTION WINDOW
• From the time the server begins to transmit the kth window until the time the server receives an ACK for the first segment in the window: $S/R + \mathrm{RTT}$
• Transmission of the kth window: $2^{k-1} S/R$
• Stall time: $\left[S/R + \mathrm{RTT} - 2^{k-1} S/R\right]^{+}$, where $[x]^{+} = \max(x, 0)$
• Latency:
$$\text{latency} = 2\,\mathrm{RTT} + \frac{O}{R} + \sum_{k=1}^{K-1} \left[\frac{S}{R} + \mathrm{RTT} - 2^{k-1}\frac{S}{R}\right]^{+}$$
Case Analysis: DYNAMIC CONGESTION WINDOW
• Let Q be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\left(1 + \frac{\mathrm{RTT}}{S/R}\right) \right\rfloor + 1$$
• The actual number of times that the server stalls is $P = \min\{Q, K-1\}$.
Case Analysis: DYNAMIC CONGESTION WINDOW
• Closed-form expression for the latency:
$$\text{latency} = 2\,\mathrm{RTT} + \frac{O}{R} + P\left[\mathrm{RTT} + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$
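The closed form can be checked numerically against the direct sum of stall times. This is a minimal Python sketch, assuming the same units as above (bits, bits/s, seconds). The name `slow_start_latency` is hypothetical. At the boundary where a stall is exactly zero, Q may count one extra window. That window contributes nothing to the sum, so both results still agree.

```python
import math


def slow_start_latency(O, S, R, rtt):
    """Return (direct-sum latency, closed-form latency) for the slow-start model.

    O, S in bits; R in bits/s; rtt in seconds.
    """
    K = math.ceil(math.log2(O / S + 1))              # windows covering the object
    # Direct sum of the positive stall times after windows 1 .. K-1
    stalls = sum(max(S / R + rtt - 2 ** (k - 1) * S / R, 0.0) for k in range(1, K))
    direct = 2 * rtt + O / R + stalls
    # Closed form with Q = floor(log2(1 + RTT/(S/R))) + 1 and P = min(Q, K - 1)
    Q = math.floor(math.log2(1 + rtt / (S / R))) + 1
    P = min(Q, K - 1)
    closed = 2 * rtt + O / R + P * (rtt + S / R) - (2 ** P - 1) * S / R
    return direct, closed
```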
Case Analysis: DYNAMIC CONGESTION WINDOW
$$\frac{\text{Latency}}{\text{Minimum latency}} \le 1 + \frac{P}{(O/R)/\mathrm{RTT} + 2}$$
where the minimum latency, $2\,\mathrm{RTT} + O/R$, is the latency with no stalls.
Slow start will not significantly increase latency if $\mathrm{RTT} \ll O/R$.
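To see when slow start matters, the Python sketch below computes latency / minimum latency and the bound. It uses a hypothetical 1 MB object (O = 8,000,000 bits, S = 8,000 bits, RTT = 100 ms) sent over a slow link and a fast link. The function name is hypothetical. The sketch only uses the Q, P and closed-form latency formulas from the preceding slides.

```python
import math


def slow_start_penalty(O, S, R, rtt):
    """Return (latency / minimum latency, upper bound 1 + P / ((O/R)/RTT + 2))."""
    K = math.ceil(math.log2(O / S + 1))
    Q = math.floor(math.log2(1 + rtt / (S / R))) + 1
    P = min(Q, K - 1)
    minimum = 2 * rtt + O / R                          # latency with no stalls
    latency = minimum + P * (rtt + S / R) - (2 ** P - 1) * S / R
    return latency / minimum, 1 + P / ((O / R) / rtt + 2)


slow_ratio, slow_bound = slow_start_penalty(8e6, 8000, 1e6, 0.1)   # 1 Mbps link
fast_ratio, fast_bound = slow_start_penalty(8e6, 8000, 1e8, 0.1)   # 100 Mbps link
```

On the 1 Mbps link, O/R = 8 s, which is much larger than the RTT, and the ratio stays below 1.05. On the 100 Mbps link, O/R = 80 ms, and slow start roughly quadruples the latency.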
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
Example
Suppose that a process in Host A wants to send a stream of data to a process in Host B over a TCP connection.
Assume:
- Data stream: a file consisting of 500,000 bytes
- MSS: 1,000 bytes
- The first byte of the data stream is numbered 0
TCP constructs 500 segments out of the data stream:
500,000 bytes / 1,000 bytes = 500 segments
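The segmentation in this example can be sketched in a few lines of Python. The function name is hypothetical. The sketch only computes the sequence number (the byte-stream number of the first byte) of each segment.

```python
def segment_stream(stream_len, mss, first_byte=0):
    """Split a byte stream into MSS-sized segments and return each segment's sequence number."""
    # The sequence number of a segment is the byte-stream number of its first byte
    return [first_byte + offset for offset in range(0, stream_len, mss)]


seqs = segment_stream(500_000, 1000)
```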
TCP sequence numbers and ACKs
Sequence numbers:
- The byte-stream number of the first byte in the segment's data
- They do not necessarily start from 0; a random initial number R is used
  • Segment 1: 0 + R
  • Segment 2: 1000 + R, etc.
ACKs (acknowledgments):
- The sequence number of the next byte expected from the other side (last byte received + 1)
- Cumulative ACK: if segment 1 has been received, the receiver waits for segment 2, e.g., ACK = 1000 + R (received up to byte 999)
[Figure: the byte stream numbered 0, 1, 2, ..., 999 (Segment 1) and 1000, 1001, ..., 1999 (Segment 2)]
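The cumulative-ACK rule can be illustrated with a small Python sketch. The ACK number is the first byte not yet received in order, so a gap holds the ACK back even if later segments are buffered. The function name and the initial sequence number R = 5000 used in the checks are hypothetical.

```python
def cumulative_ack(received, isn):
    """Return the ACK number: the first byte not yet received in order.

    `received` holds (seq, length) pairs in any order; `isn` is the initial sequence number R.
    """
    expected = isn
    # Walk the segments in sequence order and stop at the first gap
    for seq, length in sorted(received):
        if seq > expected:
            break                      # gap: later bytes are buffered but not ACKed
        expected = max(expected, seq + length)
    return expected
```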
TCP sequence numbers and ACKs
Q: How does the receiver handle out-of-order segments?
A: The TCP specification does not say; it is decided when implementing.
Simple telnet scenario (with echo on), assuming that the starting sequence numbers for Host A (client) and Host B (server) are 42 and 79, respectively:
1. User types 'C'. Host A -> Host B: Seq=42, ACK=79, data = 'C' ("I'm sending data starting at seq num = 42")
2. Host B ACKs receipt of 'C' and echoes back 'C'. Host B -> Host A: Seq=79, ACK=43, data = 'C' ("Send me the bytes from 43 onward"; the ACK is piggy-backed on server-to-client data)
3. Host A ACKs receipt of the echoed 'C'. Host A -> Host B: Seq=43, ACK=80
Yet another server echo example
1. User types 'Hello'. Host A -> Host B: Seq=42, ACK=79, data = 'Hello'
2. Host B ACKs receipt of 'Hello' and echoes back 'Hello'. Host B -> Host A: Seq=79, ACK=47, data = 'Hello'
3. Host A ACKs receipt of the echoed 'Hello' and sends something else. Host A -> Host B: Seq=47, ACK=84, data = '200'
4. Host B -> Host A: Seq=84, ACK=50, data = '200'
Host A: seq=42, ack=79, then seq=47, ack=84
Host B: seq=79, ack=47, then seq=84, ack=50
The ACK tells up to which byte has been received and which starting byte the host expects to receive next.
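Both echo examples follow from one rule: each data byte consumes one sequence number, and each ACK names the next byte expected. The minimal Python sketch below reproduces the Seq/ACK values of the two scenarios. It assumes that the server echoes every message and that the client ends with a pure ACK. The function name is hypothetical.

```python
def echo_exchange(client_isn, server_isn, messages):
    """Trace (sender, seq, ack, data) for a client whose messages the server echoes back."""
    c_seq, s_seq = client_isn, server_isn
    trace = []
    for msg in messages:
        trace.append(("A", c_seq, s_seq, msg))     # client data; ACKs server bytes so far
        c_seq += len(msg)                          # one sequence number per data byte
        trace.append(("B", s_seq, c_seq, msg))     # server echo with piggy-backed ACK
        s_seq += len(msg)
    trace.append(("A", c_seq, s_seq, ""))          # final pure ACK from the client
    return trace
```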
TCP Round Trip Time and Timeout
Main issue: how long is the sender willing to wait before retransmitting the packet?
Q: How to set the TCP timeout value?
- Longer than RTT (RTT = round trip time), but RTT will vary
- Too short: premature timeout, unnecessary retransmissions
- Too long: slow reaction to segment loss
Q: How to estimate RTT?
- SampleRTT: the measured time from segment transmission until ACK receipt; ignore retransmissions and cumulatively ACKed segments
- SampleRTT will vary, and we want the estimated RTT to be smoother, so use several recent measurements, not just the current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - x) * EstimatedRTT + x * SampleRTT
- Exponential weighted moving average: the influence of a given sample decreases exponentially fast
- Typical value of x: 0.125 (RFC 2988)
Setting the timeout:
- EstimatedRTT plus a "safety margin"; a large variation in EstimatedRTT calls for a larger safety margin
Timeout = EstimatedRTT + 4 * Deviation
Deviation = (1 - x) * Deviation + x * |SampleRTT - EstimatedRTT|, with the recommended value x = 0.25 in this formula
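The estimator and timeout rule can be sketched as a small Python class. The names `alpha` and `beta` are hypothetical and stand for the two uses of x above (0.125 and 0.25). Two details are assumptions taken from RFC 6298, not from the slide: the estimator is initialised with the first sample, and Deviation is updated before EstimatedRTT.

```python
class RttEstimator:
    """EWMA RTT estimator and retransmission timeout (a sketch of the slide's formulas)."""

    def __init__(self, alpha=0.125, beta=0.25):
        self.alpha, self.beta = alpha, beta
        self.estimated = None      # EstimatedRTT (s)
        self.deviation = 0.0       # Deviation (s)

    def update(self, sample):
        """Feed one SampleRTT (s) and return the new timeout (s)."""
        if self.estimated is None:
            # First sample: assumed initialisation (as in RFC 6298)
            self.estimated = sample
            self.deviation = sample / 2
        else:
            # Update the deviation against the old estimate, then the estimate itself
            self.deviation = (1 - self.beta) * self.deviation + self.beta * abs(sample - self.estimated)
            self.estimated = (1 - self.alpha) * self.estimated + self.alpha * sample
        return self.timeout()

    def timeout(self):
        return self.estimated + 4 * self.deviation
```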
Sample Calculations
EstimatedRTT = 0.875 * EstimatedRTT + 0.125 * SampleRTT
- EstimatedRTT after the receipt of the ACK of segment 1: EstimatedRTT = RTT for segment 1 = 0.02746 second
- EstimatedRTT after the receipt of the ACK of segment 2: EstimatedRTT = 0.875 * 0.02746 + 0.125 * 0.035557 = 0.0285
- EstimatedRTT after the receipt of the ACK of segment 3: EstimatedRTT = 0.875 * 0.0285 + 0.125 * 0.070059 = 0.0337
- EstimatedRTT after the receipt of the ACK of segment 4: EstimatedRTT = 0.875 * 0.0337 + 0.125 * 0.11443 = 0.0438
- EstimatedRTT after the receipt of the ACK of segment 5: EstimatedRTT = 0.875 * 0.0438 + 0.125 * 0.13989 = 0.0558
- EstimatedRTT after the receipt of the ACK of segment 6: EstimatedRTT = 0.875 * 0.0558 + 0.125 * 0.18964 = 0.0725
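The sample calculations can be reproduced with the EWMA formula. This Python sketch uses hypothetical names and keeps full precision. The slide rounds each intermediate value to four decimal places, so the results agree to within that rounding.

```python
samples = [0.02746, 0.035557, 0.070059, 0.11443, 0.13989, 0.18964]   # SampleRTT (s), from the slide


def ewma_trace(samples, x=0.125):
    """EstimatedRTT after each ACK; the first sample initialises the estimate."""
    estimates = [samples[0]]
    for s in samples[1:]:
        estimates.append((1 - x) * estimates[-1] + x * s)
    return estimates


estimates = ewma_trace(samples)
for i, e in enumerate(estimates, 1):
    print(f"after segment {i}: EstimatedRTT = {e:.4f} s")
```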
RTT Samples and RTT Estimates
[Figure: Sample RTT and Estimated RTT plotted against time; y-axis: RTT (msec)]
The variations in the SampleRTT are smoothed out in the computation of the EstimatedRTT.
An Actual RTT Estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; curves: SampleRTT and Estimated RTT; y-axis: RTT (milliseconds); x-axis: time (seconds)]
FSM of TCP for Reliable Data Transfer
Simplified TCP sender, assuming:
- one-way data transfer
- no flow or congestion control
[Figure: a single "wait for event" state with three self-loop transitions]
- event: data received from application above -> create and send segment
- event: timer timeout for segment with seq number y -> retransmit segment
- event: ACK received, with ACK number y -> process ACK
SIMPLIFIED TCP SENDER
Assumptions:
• the sender is not constrained by TCP flow or congestion control
• data from above is less than MSS in size
• data transfer is in one direction only

sendbase = initial_sequence_number
nextseqnum = initial_sequence_number

loop (forever) {
    switch (event)
    event: data received from application above
        create TCP segment with sequence number nextseqnum
        if (timer is currently not running)
            start timer for segment nextseqnum
        pass segment to IP
        nextseqnum = nextseqnum + length(data)
    event: timer timeout
        retransmit not-yet-ACKed segment with smallest sequence number
        start timer
    event: ACK received, with ACK field value of y
        if (y > sendbase) {    /* cumulative ACK of all data up to y */
            sendbase = y
            if (there are currently any not-yet-ACKed segments)
                start timer
        }
} /* end of loop forever */

The single timer is associated with the oldest unACKed segment.
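A minimal, event-driven Python sketch of the simplified sender above. It is an illustration under stated assumptions: the network and the clock are not modelled, and the timer is reduced to a flag. The class and method names are hypothetical.

```python
class SimplifiedTcpSender:
    """Single-timer sender with cumulative ACKs; "sending" just records (seq, data) in self.sent."""

    def __init__(self, isn):
        self.sendbase = isn
        self.nextseqnum = isn
        self.unacked = {}           # seq -> data of not-yet-ACKed segments
        self.timer_running = False  # stands in for the single retransmission timer
        self.sent = []              # log of (seq, data) passed to IP

    def on_app_data(self, data):
        seq = self.nextseqnum
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True          # timer tracks the oldest unACKed segment
        self.sent.append((seq, data))          # pass segment to IP
        self.nextseqnum += len(data)

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)            # smallest not-yet-ACKed sequence number
            self.sent.append((seq, self.unacked[seq]))
            self.timer_running = True          # restart timer

    def on_ack(self, y):
        if y > self.sendbase:                  # cumulative ACK of all data up to y
            self.sendbase = y
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)   # restart only if data is outstanding


sender = SimplifiedTcpSender(isn=92)
sender.on_app_data(b"x" * 8)      # Seq=92, 8 bytes
sender.on_app_data(b"y" * 20)     # Seq=100, 20 bytes
sender.on_timeout()               # retransmits Seq=92 only
sender.on_ack(120)                # one cumulative ACK covers both segments
```

The usage replays the cumulative-ACK scenario. A timeout retransmits only the oldest segment, and a single ACK=120 then clears both segments.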
TCP with MODIFICATIONS: SENDER (with Fast Retransmit)
Why wait for the timeout to expire when consecutive duplicate ACKs can be used to indicate a lost segment?

sendbase = initial_sequence_number
nextseqnum = initial_sequence_number

loop (forever) {
    switch (event)
    event: data received from application above
        create TCP segment with sequence number nextseqnum
        start timer for segment nextseqnum
        pass segment to IP
        nextseqnum = nextseqnum + length(data)
    event: timer timeout for segment with sequence number y
        retransmit segment with sequence number y
        compute new timeout interval for segment y
        restart timer for sequence number y
    event: ACK received, with ACK field value of y
        if (y > sendbase) {    /* cumulative ACK of all data up to y */
            cancel all timers for segments with sequence numbers < y
            sendbase = y
        }
        else {    /* a duplicate ACK for an already ACKed segment */
            increment number of duplicate ACKs received for y
            if (number of duplicate ACKs received for y == 3) {
                /* perform TCP fast retransmission */
                resend segment with sequence number y
                restart timer for segment y
            }
        }
} /* end of loop forever */
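The duplicate-ACK logic can be isolated in a short Python sketch. The class name is hypothetical. It counts duplicates of the current sendbase and reports which segment to resend on the third duplicate.

```python
class FastRetransmitDetector:
    """Count duplicate ACKs and signal a fast retransmit on the third duplicate."""

    def __init__(self, sendbase):
        self.sendbase = sendbase
        self.dup_count = 0

    def on_ack(self, y):
        """Return the sequence number to resend, or None."""
        if y > self.sendbase:          # new cumulative ACK: advance and reset the counter
            self.sendbase = y
            self.dup_count = 0
            return None
        if y == self.sendbase:         # duplicate ACK for already-ACKed data
            self.dup_count += 1
            if self.dup_count == 3:
                return y               # resend the segment starting at byte y
        return None                    # stale ACK below sendbase: ignored


detector = FastRetransmitDetector(sendbase=100)
actions = [detector.on_ack(y) for y in (200, 200, 200, 200, 200)]
```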
TCP ACK generation [RFC 1122, RFC 2581]
1. Event: in-order segment arrival, no gaps, everything else already ACKed.
   TCP receiver action: delay sending the ACK. Wait up to 500 ms for the next segment; if the next segment does not arrive in this interval, send the ACK.
2. Event: in-order segment arrival, no gaps, one delayed ACK pending (due to action 1).
   TCP receiver action: immediately send a single cumulative ACK.
3. Event: out-of-order segment arrival with a higher-than-expected sequence number; a gap is detected.
   TCP receiver action: send a duplicate ACK, indicating the sequence number of the next expected byte.
4. Event: arrival of a segment that partially or completely fills the gap.
   TCP receiver action: immediately send an ACK if the segment starts at the lower end of the gap.
The receiver does not discard out-of-order segments.
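The four rows can be expressed as a decision function. This Python sketch is a simplification: the receiver state is reduced to the next expected byte, a pending-delayed-ACK flag, and a flag saying whether out-of-order data is buffered. The function name and the returned strings are hypothetical.

```python
def ack_action(seq, expected, delayed_ack_pending, gap_buffered):
    """Map an arriving segment to a receiver action from the RFC 1122 / RFC 2581 table.

    expected: next in-order byte; gap_buffered: True if later, out-of-order data is buffered.
    """
    if seq > expected:
        return f"send duplicate ACK for {expected}"            # row 3: gap detected
    if seq == expected and gap_buffered:
        return "send ACK immediately"                          # row 4: fills gap at its lower end
    if seq == expected and delayed_ack_pending:
        return "send single cumulative ACK immediately"        # row 2
    if seq == expected:
        return "delay ACK up to 500 ms"                        # row 1
    return "send ACK immediately"   # old, already-received data: not in the table, assumed
```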
TCP Interesting Scenarios (simplified TCP version)
Lost ACK scenario:
- Host A sends Seq=92, 8 bytes of data; Host B replies ACK=100, but the ACK is lost (X)
- Host A's timer for Seq=92 times out, so it retransmits Seq=92, 8 bytes of data; Host B replies ACK=100
- Retransmission due to lost ACK
Premature timeout, cumulative ACKs:
- Host A sends Seq=92, 8 bytes of data and Seq=100, 20 bytes of data; Host B replies ACK=100 and ACK=120
- The Seq=92 timer expires before ACK=100 arrives, so Host A retransmits Seq=92, 8 bytes of data; the timer is restarted here for Seq=92
- Host B replies ACK=120; the segment with Seq=100 is not retransmitted
TCP Retransmission Scenario
- Host A sends Seq=92, 8 bytes of data and Seq=100, 20 bytes of data
- Host B's ACK=100 is lost (X), but ACK=120 arrives before the Seq=92 timeout
- The cumulative ACK avoids retransmission of the first segment
TCP Modifications: Doubling the Timeout Interval
- Provides a limited form of congestion control
- Timer expiration is more likely caused by congestion in the network
- TimeoutInterval = 2 * TimeoutInterval_previous
- TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals
- Congestion may get worse if sources continue to retransmit packets persistently
- After an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT (the Deviation term in the timeout formula)
Others: check RFC 2018 (selective ACK)
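A Python sketch of the doubling rule. The function name is hypothetical, and the 60-second cap is an assumption (RFC 6298 allows a maximum of at least 60 seconds).

```python
def timeouts_with_backoff(base_timeout, consecutive_timeouts, max_timeout=60.0):
    """Timer values used for successive retransmissions of the same segment.

    Each expiry doubles the interval (TimeoutInterval = 2 * previous), up to an assumed cap.
    """
    values, t = [], base_timeout
    for _ in range(consecutive_timeouts):
        values.append(t)
        t = min(2 * t, max_timeout)
    return values
```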
TCP Flow Control
Flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.
- Receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space via the RcvWindow field in the TCP segment
- Sender: keeps the amount of transmitted, unACKed data no greater than the most recently received RcvWindow
Receiver buffering:
- RcvBuffer = size of the TCP receive buffer
- RcvWindow = amount of spare room in the buffer
FLOW CONTROL: Receiver
EXAMPLE: HOST A sends a large file to HOST B
RECEIVER HOST B uses RcvWindow, LastByteRcvd and LastByteRead
[Figure: HOST B's RcvBuffer; data from IP fills the buffer up to LastByteRcvd, the application process reads up to LastByteRead, and the spare room is RcvWindow]
RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
- Initially, RcvWindow = RcvBuffer
- The application reads from the buffer
- HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A
FLOW CONTROL: Sender
EXAMPLE: HOST A sends a large file to HOST B
SENDER HOST A uses the RcvWindow of HOST B, LastByteSent and LastByteACKed
[Figure: HOST A's sent data with LastByteACKed and LastByteSent marked; ACKs arrive from HOST B]
To ensure that HOST B's buffer does not overflow, HOST A maintains throughout the connection's life that
LastByteSent - LastByteACKed <= RcvWindow
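The receiver and sender rules can be written as two small Python functions. The function names and the buffer sizes in the checks are hypothetical.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver side: the spare room HOST B advertises in every segment it sends."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(last_byte_sent, last_byte_acked, advertised_window, nbytes):
    """Sender side: may nbytes more be sent while keeping unACKed data <= RcvWindow?"""
    return (last_byte_sent + nbytes) - last_byte_acked <= advertised_window
```

Take a hypothetical RcvBuffer of 4096 bytes, with LastByteRcvd = 3000 and LastByteRead = 1000. HOST B then advertises 2096 bytes. If HOST A has LastByteSent = 5000 and LastByteACKed = 4000, it may send 1096 more bytes but not 1097.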
FLOW CONTROL: Some issues to consider
RcvWindow is used by the connection to provide the flow control service.
What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?
- TCP sends a segment only when there is data or an ACK to send. Therefore, the sender must keep the connection 'alive'.
- TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space, and the ACKs will contain RcvWindow > 0.
TCP Connection Management
Recall: the TCP sender and receiver establish a "connection" before exchanging data segments, initializing TCP variables:
- sequence numbers
- buffers, flow control info (e.g., RcvWindow)
Client: the connection initiator
- In Java: Socket clientSocket = new Socket("hostname", portNumber); (connect)
- In C (Winsock):
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
        printf("connect failed\n");
        WSACleanup();
        exit(1);
    }
Server: contacted by the client
- In Java: ServerSocket.accept()
- In C (Winsock):
    ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);
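For comparison, here is a minimal Python sketch of the same client/server steps, using the standard `socket` module on the loopback interface (Python 3.8+). The function name and payload are hypothetical. `connect()` triggers the three-way handshake in the kernel, and `accept()` returns the established connection. Closing the sockets starts the FIN exchange.

```python
import socket


def loopback_roundtrip(payload):
    """Open a TCP connection to ourselves on 127.0.0.1, send payload, and return what the server read."""
    # Server side: listen on an OS-chosen free port (port 0)
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        # Client side: connect() makes the kernel run the SYN / SYNACK / ACK handshake
        with socket.create_connection(("127.0.0.1", port)) as client:
            conn, _addr = server.accept()   # the already-established connection
            with conn:
                client.sendall(payload)
                received = b""
                while len(received) < len(payload):
                    chunk = conn.recv(len(payload) - len(received))
                    if not chunk:           # peer closed early
                        break
                    received += chunk
    # Leaving the `with` blocks closes both sockets, which sends FIN segments
    return received
```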
TCP Connection Management
Three-way handshake (this is what happens when we create a socket for connection to a server):
Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself)
- specifies the initial sequence number (client_isn)
Step 2: the server end system receives the SYN and replies with a SYNACK control segment
- ACKs the received SYN
- allocates buffers
- specifies the server's initial sequence number (server_isn)
Step 3: the client ACKs the connection with ACK = server_isn + 1
- allocates buffers
- sends SYN = 0
Connection established.
Establishing a connection:
- Client -> Server, Connect: (SYN=1, seq=client_isn)
- Server -> Client, Accept: (SYN=1, seq=server_isn, ack=client_isn+1)
- Client -> Server, ACK: (SYN=0, seq=client_isn+1, ack=server_isn+1)
After the connection is established, the client can send and receive segments with app-generated data (SYN=0).
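The header fields in the three segments follow directly from the two initial sequence numbers. The SYN consumes one sequence number, hence the +1. Below is a Python sketch; the function name and the ISNs in the usage are hypothetical.

```python
def three_way_handshake(client_isn, server_isn):
    """Return (sender, SYN flag, seq, ack) for the three handshake segments; ack=None means no ACK field."""
    syn = ("client", 1, client_isn, None)                   # Step 1: SYN
    synack = ("server", 1, server_isn, client_isn + 1)      # Step 2: SYNACK, the SYN consumed one number
    ack = ("client", 0, client_isn + 1, server_isn + 1)     # Step 3: ACK with SYN = 0
    return [syn, synack, ack]


segments = three_way_handshake(client_isn=1000, server_isn=5000)
```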
TCP Connection Management (cont.)
Closing a connection: the client closes its socket
- C (Winsock): closesocket(s);
- Java: clientSocket.close();
Step 1: the client end system sends a TCP FIN control segment to the server
Step 2: the server receives the FIN and replies with an ACK; it then closes the connection and sends a FIN
[Figure: client sends FIN; server replies ACK, closes and sends FIN; client replies ACK and waits in timed wait before closed]
How a TCP connection is established and torn down
TCP Connection Management (cont.)
Step 3: the client receives the FIN and replies with an ACK
- enters "timed wait": it will respond with an ACK to received FINs
Step 4: the server receives the ACK; connection closed
Note: with a small modification, this can handle simultaneous FINs.
[Figure: client sends FIN; server replies ACK and sends FIN; client replies ACK; both sides closing, the client in timed wait, then both closed]
TCP Connection Management (cont.)
[Figures: TCP client lifecycle and TCP server lifecycle]
- Timed wait: used in case the ACK gets lost; its length is implementation-dependent (e.g., 30 seconds, 1 minute, 2 minutes)
- The connection then formally closes: all resources (e.g., port numbers) are released
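The lifecycle figures can be approximated as transition tables. This Python sketch uses the state names from RFC 793, which are not on the slide. It follows only the normal open/close path, with the client closing first. The event labels and function name are hypothetical.

```python
# (state, event) -> next state; state names from RFC 793, event labels hypothetical
CLIENT_LIFECYCLE = {
    ("CLOSED", "connect: send SYN"): "SYN_SENT",
    ("SYN_SENT", "receive SYNACK, send ACK"): "ESTABLISHED",
    ("ESTABLISHED", "close: send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "receive ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "receive FIN, send ACK"): "TIME_WAIT",
    ("TIME_WAIT", "timed wait expires"): "CLOSED",
}
SERVER_LIFECYCLE = {
    ("CLOSED", "listen"): "LISTEN",
    ("LISTEN", "receive SYN, send SYNACK"): "SYN_RCVD",
    ("SYN_RCVD", "receive ACK"): "ESTABLISHED",
    ("ESTABLISHED", "receive FIN, send ACK"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "close: send FIN"): "LAST_ACK",
    ("LAST_ACK", "receive ACK"): "CLOSED",
}


def run(fsm, events, state="CLOSED"):
    """Apply events in order and return the visited states; an unknown transition raises KeyError."""
    visited = [state]
    for event in events:
        state = fsm[(state, event)]
        visited.append(state)
    return visited


client_path = run(CLIENT_LIFECYCLE, ["connect: send SYN", "receive SYNACK, send ACK", "close: send FIN",
                                     "receive ACK", "receive FIN, send ACK", "timed wait expires"])
server_path = run(SERVER_LIFECYCLE, ["listen", "receive SYN, send SYNACK", "receive ACK",
                                     "receive FIN, send ACK", "close: send FIN", "receive ACK"])
```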
End of Flow Control and Error Control
Flow Control vs. Congestion Control
Similar actions are taken, but for very different reasons.
Flow control:
• point-to-point traffic between sender and receiver
• speed-matching service: matching the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the receiver buffer from overflowing
Congestion: happens when there are too many sources attempting to send data at too high a rate for the routers along the path
Congestion control:
• a service that makes sure that the routers between end systems are able to carry the offered traffic
• prevents routers from overflowing
Same course of action: throttling of the sender
Principles of Congestion Control
Congestion:
- informally: "too many sources sending too much data too fast for the network to handle"
- different from flow control
- manifestations:
  • lost packets (buffer overflow at routers)
  • long delays (queueing in router buffers)
- a top-10 problem
Approaches towards congestion control
Two broad approaches towards congestion control:
1. End-to-end congestion control:
   - no explicit feedback from the network
   - congestion inferred by end systems from observed packet loss & delay
   - approach taken by TCP
2. Network-assisted congestion control: routers provide feedback to end systems in the form of
   - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR), or
   - an explicit transmission rate the sender should send at
TCP Congestion Control
How the TCP sender limits the rate at which it sends traffic into its connection
New variable: the congestion window (CongWin)
Amount of unACKed data at the SENDER:
LastByteSent - LastByteACKed <= min(CongWin, RcvWindow)
This indirectly limits the sender's send rate.
Assumptions:
• the TCP receive buffer is very large, so there is no RcvWindow constraint; the amount of unACKed data at the sender is limited solely by CongWin
• packet loss delay & packet transmission delay are negligible
Sending rate (approx.) = CongWin / RTT
By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
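A small Python sketch of the window rule and the rate approximation. It uses min(CongWin, RcvWindow) rather than assuming RcvWindow is unconstrained. The function name and the numbers in the checks are hypothetical.

```python
def sender_limits(last_byte_sent, last_byte_acked, cong_win, rcv_window, rtt):
    """Return (bytes that may still be sent now, approximate sending rate in bytes/s)."""
    in_flight = last_byte_sent - last_byte_acked       # unACKed data
    window = min(cong_win, rcv_window)                 # effective window
    return max(window - in_flight, 0), window / rtt    # rate ~ window / RTT
```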
TCP Congestion Control
TCP uses ACKs to trigger ("clock") its increase in congestion window size: "self-clocking"
The arrival of ACKs is an indication to the sender that all is well.
1. If ACKs arrive at a slow rate, the congestion window will be increased at a relatively slow rate
2. If ACKs arrive at a high rate, the congestion window will be increased more quickly
TCP Congestion Control
How TCP perceives that there is congestion on the path
"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender:
1. Timeout
   • no ACK is received after segment loss
2. Receipt of three duplicate ACKs
   • segment loss is followed by three duplicate ACKs received at the sender
TCP Congestion Control: details
The sender limits transmission: LastByteSent - LastByteAcked <= cwnd
Roughly, rate = cwnd / RTT bytes/sec
cwnd is a dynamic function of perceived network congestion.
How does the sender perceive congestion?
- loss event = timeout or 3 duplicate ACKs
- the TCP sender reduces its rate (cwnd) after a loss event
Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events
TCP congestion avoidance: additive increase, multiplicative decrease (AIMD)
Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs
- additive increase: increase cwnd by 1 MSS every RTT until loss is detected
- multiplicative decrease: cut cwnd in half after loss
[Figure: congestion window size (8, 16, 24 Kbytes) vs. time, showing saw-tooth behavior: probing for bandwidth]
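The saw-tooth can be reproduced with a toy Python simulation that counts cwnd in MSS per RTT. The loss model is a hypothetical simplification: a loss is detected whenever cwnd exceeds a fixed capacity.

```python
def aimd_trace(initial, capacity, rounds):
    """cwnd (in MSS) at the start of each RTT under AIMD with a toy loss model."""
    cwnd, trace = initial, []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd > capacity:
            cwnd = max(cwnd // 2, 1)    # multiplicative decrease: halve after loss
        else:
            cwnd += 1                   # additive increase: +1 MSS per RTT
    return trace


trace = aimd_trace(initial=8, capacity=12, rounds=12)
```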
TCP Slow Start
When a connection begins, increase the rate exponentially until the first loss event:
- initially cwnd = 1 MSS
- double cwnd every RTT
- done by incrementing cwnd by 1 MSS for every ACK received
Summary: the initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart]
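A Python sketch showing why adding 1 MSS per ACK doubles cwnd every RTT. It assumes that every segment is ACKed individually (no delayed ACKs) and that there are no losses. The function name is hypothetical.

```python
def slow_start_rounds(rounds, mss=1):
    """cwnd at the start of each RTT when every ACK adds 1 MSS."""
    cwnd, trace = mss, []
    for _ in range(rounds):
        trace.append(cwnd)
        acks = cwnd // mss          # one ACK per segment sent in this round
        cwnd += acks * mss          # +1 MSS per ACK, so cwnd doubles per RTT
    return trace
```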
Refinement: inferring loss
- After 3 dup ACKs: cwnd is cut in half; the window then grows linearly
- But after a timeout event: cwnd is set to 1 MSS; the window then grows exponentially up to a threshold, then grows linearly
Philosophy:
- 3 dup ACKs indicate that the network is capable of delivering some segments
- a timeout indicates a "more alarming" congestion scenario
Refinement
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.
Implementation:
- variable ssthresh (slow-start threshold)
- on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event
TCP Sender Congestion Control
State | Event | TCP sender congestion-control action | Commentary
Slow Start (SS) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Results in a doubling of CongWin every RTT
Congestion Avoidance (CA) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed
Summary: TCP Congestion Control
[Figure: FSM with three states: slow start, congestion avoidance, fast recovery]
Initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
Slow start:
- new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
- cwnd > ssthresh: move to congestion avoidance
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
Congestion avoidance:
- new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
Fast recovery:
- duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
- new ACK: cwnd = ssthresh; dupACKcount = 0; move to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start
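The FSM can be sketched as a small Python class. It tracks only cwnd, ssthresh, the duplicate-ACK count and the state; segment transmission is not modelled. It uses integer arithmetic for the MSS * (MSS/cwnd) increment. The class and method names are hypothetical.

```python
class RenoSender:
    """Slow start / congestion avoidance / fast recovery; cwnd and ssthresh in bytes."""

    def __init__(self, mss=1000, ssthresh=64 * 1024):
        self.mss, self.cwnd, self.ssthresh = mss, mss, ssthresh
        self.dup_acks = 0
        self.state = "slow start"

    def on_new_ack(self):
        if self.state == "fast recovery":
            self.cwnd = self.ssthresh                       # deflate the window
            self.state = "congestion avoidance"
        elif self.state == "slow start":
            self.cwnd += self.mss                           # +1 MSS per ACK
            if self.cwnd > self.ssthresh:
                self.state = "congestion avoidance"
        else:
            self.cwnd += self.mss * self.mss // self.cwnd   # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast recovery":
            self.cwnd += self.mss                           # inflate per extra duplicate ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:                              # fast retransmit, enter fast recovery
            self.ssthresh = self.cwnd // 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd // 2
        self.cwnd = self.mss
        self.dup_acks = 0
        self.state = "slow start"


s = RenoSender(mss=1000, ssthresh=4000)
handlers = {"ack": s.on_new_ack, "dup": s.on_dup_ack, "timeout": s.on_timeout}
history = []
for event in ["ack"] * 4 + ["dup"] * 4 + ["ack", "ack", "timeout"]:
    handlers[event]()
    history.append((event, s.state, s.cwnd, s.ssthresh))
```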
Congestion control
TCP's Congestion Control Service
[Figure: CLIENT and SERVER]
Problem: gridlock sets in when there is packet loss due to router congestion.
TCP's congestion control service forces the end systems to decrease the rate at which packets are sent during periods of congestion.
The sending system's packet is lost due to congestion, and the sender is alerted when it stops receiving ACKs of packets sent.
Macroscopic Description of TCP Throughput
What's the average throughput of TCP as a function of window size and RTT?
- ignore slow start (typically very short phases)
Let W be the window size when loss occurs:
- when the window is W, throughput is W/RTT
- just after loss, the window drops to W/2 and throughput to W/(2 RTT)
- throughput then increases linearly (by MSS/RTT every RTT)
- average throughput: 0.75 W/RTT
(Based on an idealised model for the steady-state dynamics of TCP.)
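The 0.75 factor is the mean of the saw-tooth, because the window climbs from W/2 to W in steps of 1 MSS per RTT. The Python check below works under that idealised model, with the window counted in segments. The function name is hypothetical.

```python
def average_sawtooth_throughput(W, rtt, mss=1):
    """Average of the windows W/2, W/2 + MSS, ..., W (one per RTT), divided by RTT."""
    windows = list(range(W // 2, W + 1, mss))    # +1 MSS per RTT after the halving
    return sum(windows) / len(windows) / rtt
```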
TCP Futures
TCP over "long, fat pipes"
- Example: a GRID computing application with 1500-byte segments and a 100 ms RTT; a desired throughput of 10 Gbps requires a window size of W = 83,333 in-flight segments
- Throughput in terms of loss rate L:
$$\text{throughput} = \frac{1.22 \cdot \mathrm{MSS}}{\mathrm{RTT}\,\sqrt{L}}$$
- This requires $L = 2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments)
- New versions of TCP are needed for high-speed environments
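The numbers in this example can be recomputed in two steps. The window comes from W = throughput * RTT / MSS, and L comes from inverting the loss-rate formula. Below is a Python sketch; the function name is hypothetical.

```python
def long_fat_pipe(throughput_bps, rtt_s, mss_bytes):
    """Return (window in segments, tolerable loss rate L) for a target throughput."""
    mss_bits = 8 * mss_bytes
    window = throughput_bps * rtt_s / mss_bits                # W = throughput * RTT / MSS
    # Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) for L
    loss = (1.22 * mss_bits / (rtt_s * throughput_bps)) ** 2
    return window, loss


W, L = long_fat_pipe(10e9, 0.1, 1500)
print(f"W = {W:.0f} segments, L = {L:.2e}")
```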
TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link of bandwidth R, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Analysis of 2 connections sharing a link
Assumptions:
- a link with a transmission rate of R
- each connection has the same MSS and RTT
- no other TCP connections or UDP datagrams traverse the shared link
- ignore the slow start phase of TCP
- both connections operate in congestion-avoidance mode (linear increase phase)
Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing
Why is TCP fair?
Two competing sessions:
- additive increase gives a slope of 1 as throughput increases
- multiplicative decrease decreases throughput proportionally
[Figure: connection 1 throughput (x-axis, up to R) vs. connection 2 throughput (y-axis, up to R), with the equal bandwidth share line and the full bandwidth utilisation line; congestion avoidance: additive increase; loss: decrease window by a factor of 2]
A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.
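The convergence argument can be simulated for two flows. This Python sketch assumes that both flows see each loss at the same time (synchronised losses) and measures rates in abstract units. The function name and starting rates are hypothetical.

```python
def two_flow_aimd(x1, x2, R, rounds, step=1.0):
    """Two flows on a link of capacity R: both add `step` per RTT and both halve when x1 + x2 > R."""
    for _ in range(rounds):
        if x1 + x2 > R:
            x1, x2 = x1 / 2, x2 / 2          # multiplicative decrease halves the gap
        else:
            x1, x2 = x1 + step, x2 + step    # additive increase keeps the gap unchanged
    return x1, x2


x1, x2 = two_flow_aimd(80.0, 10.0, 100.0, rounds=2000)
```

Additive increase leaves the gap between the flows unchanged, while each multiplicative decrease halves it. Starting from rates 80 and 10 on a link of capacity 100, the two rates therefore end up essentially equal.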
The End
The following slides are just for additional reading.
TCP Latency Modeling
Loopholes in TCP:
- In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free; therefore, they have higher throughputs
- Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth
[Figure: multiple end systems sharing a link of rate R bps (the link's transmission rate): applications with 1 TCP connection each and a multithreading implementation with 3 TCP connections]
TCP latency modeling
Q: How long does it take to receive an object from a Web server?
Latency: the time from when the client initiates a TCP connection until the client receives the requested object in its entirety
Components: TCP connection establishment time, data transfer delay, actual data transmission time
Two cases to consider:
- WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent (no data transfer delay)
- WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent (there's data transfer delay)
Transport Layer ndash TCP
57
BB
TCP Latency ModelingTCP Latency Modeling
Network is uncongested with one link between end systems of rate RR
CongWinCongWin (fixed) determines the amount of data that can be sent
No packet loss no packet corruption no retransmissions required
Header overheads are negligible
File to sendFile to send = integer number of segments of size MSS
Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times
CLIENT SERVER
FILEFILE
Initial ThresholdThreshold of TCP congestion mechanism is very big
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits
SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)
Transport Layer ndash TCP
58
BB
TCP latency ModelingTCP latency Modeling
Case 1 latency = 2RTT + OR
K = Number of Windows of data that cover the object
K = OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 1 Case 1 WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
Number of segmentsRounded up to the nearest integer
Assume W=4 segmentsAssume W=4 segments
eg O=256bits S=32bits W=4
Transport Layer ndash TCP
59
BB
TCP latency ModelingTCP latency Modeling
Case 2 latency = 2RTT + OR + (K-1)[SR + RTT - WSR]
Number of Windows of data that cover the objectK= OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 2 Case 2 WSR lt RTT + SRWSR lt RTT + SR
Sender has to wait for an ACK after a windowrsquos worth of data sent
STALLED STALLED PERIODPERIOD
If there are k windows sender willbe stalled (k-1) times
Transport Layer ndash TCP
60
BB
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
STALLED STALLED PERIODPERIOD
4 windows4 windows
OS=15
Transport Layer ndash TCP
61
BB
bull LetLet K K be the be the number of windows that cover the objectnumber of windows that cover the objectbull We can express We can express KK in terms of the number of segments in the in terms of the number of segments in the
object as followsobject as follows
S
OkK k 110 222min
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1log
1logmin
12min
2
2
S
OK
S
OkkK
S
OkK k
Note
Transport Layer ndash TCP
62
BB
bull From the time the server begins to transmit the From the time the server begins to transmit the kkth window window until the time the server receives an until the time the server receives an ACKACK for the for the first segment in the windowin the window
bull Transmission of Transmission of kkth window window ==
bull Stall Time Stall Time ==
bull Latency Latency ==
12
k
R
S
12
k
R
SRTT
R
S
RTTR
S
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1K
1
122k
k
R
SRTT
R
S
R
ORTT
Transport Layer ndash TCP
63
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
Case Analysis: DYNAMIC CONGESTION WINDOW
• Closed-form expression for the latency (summing the P nonzero stall times):
$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - \left(2^{P} - 1\right)\frac{S}{R}$
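The closed form can be cross-checked against the stall-time sum it was derived from. This is a minimal Python sketch under the same idealised assumptions; the function names and the numeric values of S/R and RTT are hypothetical, while O/S = 15 matches the 4-window figure.

```python
import math


def num_windows(o_bits: float, s_bits: float) -> int:
    """K = min{k : 2^k - 1 >= O/S}, i.e. ceil(log2(O/S + 1)); O is assumed to be >= 1 segment."""
    k, segments_covered = 0, 0
    while segments_covered < o_bits / s_bits:
        segments_covered += 2 ** k   # window k+1 carries 2^k segments
        k += 1
    return k


def slow_start_latency(o_bits, s_bits, r_bps, rtt_s):
    """Closed form: 2 RTT + O/R + P (RTT + S/R) - (2^P - 1) S/R."""
    s_over_r = s_bits / r_bps
    K = num_windows(o_bits, s_bits)
    Q = math.floor(math.log2(1 + rtt_s / s_over_r)) + 1   # stalls for an infinite object
    P = min(Q, K - 1)                                      # actual number of stalls
    return 2 * rtt_s + o_bits / r_bps + P * (rtt_s + s_over_r) - (2 ** P - 1) * s_over_r


def slow_start_latency_by_sum(o_bits, s_bits, r_bps, rtt_s):
    """Same model, summing the stall times [S/R + RTT - 2^(k-1) S/R]^+ for k = 1..K-1."""
    s_over_r = s_bits / r_bps
    K = num_windows(o_bits, s_bits)
    stalls = sum(max(0.0, s_over_r + rtt_s - 2 ** (k - 1) * s_over_r) for k in range(1, K))
    return 2 * rtt_s + o_bits / r_bps + stalls


# O/S = 15 segments (windows of 1, 2, 4, 8); assume S/R = 1 s and RTT = 2 s.
print(num_windows(15 * 32, 32))                       # 4
print(slow_start_latency(15 * 32, 32, 32, 2))         # 22.0
print(slow_start_latency_by_sum(15 * 32, 32, 32, 2))  # 22.0
```

Counting windows with integers avoids floating-point edge cases in $\lceil \log_2(O/S + 1) \rceil$ when O/S + 1 is close to a power of two.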
Case Analysis: DYNAMIC CONGESTION WINDOW
$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{\frac{O/R}{RTT} + 2}$, where Minimum Latency = $2\,RTT + O/R$
Slow start will not significantly increase latency if $RTT \ll O/R$.
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
TCP Sequence Numbers and ACKs
Sequence numbers (seq. #s): the byte-stream number of the first byte in the segment's data. They do not necessarily start from 0; a random initial number R is used:
• Segment 1: 0 + R
• Segment 2: 1000 + R, etc.
ACKs (acknowledgments): the seq. # of the next byte expected from the other side (last byte + 1).
• Cumulative ACK: once segment 1 has been received, the receiver waits for segment 2, e.g. ACK = 1000 + R (received up to the 999th byte).
[Figure: byte stream 0, 1, 2, ..., 999 (Segment 1) followed by 1000, 1001, ..., 1999 (Segment 2)]
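Sequence numbering and cumulative ACKs can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the initial sequence number 5000 are hypothetical, and 32-bit sequence-number wraparound is ignored.

```python
def segment_sequence_numbers(isn: int, segment_lengths: list[int]) -> list[int]:
    """Sequence number of each segment = byte-stream number of its first byte."""
    seqs, next_seq = [], isn
    for length in segment_lengths:
        seqs.append(next_seq)
        next_seq += length
    return seqs


def cumulative_ack(isn: int, received: dict[int, int]) -> int:
    """ACK = number of the next byte expected, given received {seq: length} segments.

    Bytes after a gap are held, not acknowledged, until the gap is filled.
    """
    ack = isn
    while ack in received:
        ack += received[ack]
    return ack


R = 5000  # hypothetical random initial sequence number
print(segment_sequence_numbers(R, [1000, 1000]))     # [5000, 6000]
print(cumulative_ack(R, {5000: 1000}))               # 6000, i.e. ACK = 1000 + R
print(cumulative_ack(R, {5000: 1000, 7000: 1000}))   # 6000: bytes from 6000 are still missing
```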
TCP Sequence Numbers and ACKs
Q: How does the receiver handle out-of-order segments?
A: The TCP specification does not say; this is decided by the implementer.
Simple telnet scenario (with echo on), assuming that the starting sequence numbers for Host A (client) and Host B (server) are 42 and 79, respectively:
• Host A → Host B: Seq=42, ACK=79, data = 'C'. The user types 'C'; "I'm sending data starting at seq. num = 42."
• Host B → Host A: Seq=79, ACK=43, data = 'C'. The host ACKs receipt of 'C' and echoes back 'C'; "Send me the bytes from 43 onward." The ACK is piggy-backed on server-to-client data.
• Host A → Host B: Seq=43, ACK=80. The host ACKs receipt of the echoed 'C'.
Yet Another Server Echo Example
• Host A → Host B: Seq=42, ACK=79, data = 'Hello'. The user types "Hello".
• Host B → Host A: Seq=79, ACK=47, data = 'Hello'. The host ACKs receipt of "Hello" and echoes back "Hello".
• Host A → Host B: Seq=47, ACK=84, data = '200'. The host ACKs receipt of the echoed "Hello" and sends something else.
• Host B → Host A: Seq=84, ACK=50, data = '200'.
The ACK tells up to which byte data has been received, and which byte the host expects to receive next.
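The seq/ACK arithmetic in both echo examples follows one rule, sketched here in Python. The assumption (true in these ping-pong exchanges, not in general) is that each host's earlier data has already been ACKed, so its next sequence number equals the ACK it just received. The `Segment` and `reply_to` names are hypothetical.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    seq: int     # number of the first data byte in this segment
    ack: int     # next byte expected from the other side
    data: bytes


def reply_to(incoming: Segment, data: bytes = b"") -> Segment:
    """Build the reply to `incoming`, assuming all our earlier data has been ACKed.

    Our next sequence number is the ACK we just received, and we acknowledge
    every byte of `incoming` (the ACK is piggy-backed on our own data, if any).
    """
    return Segment(seq=incoming.ack, ack=incoming.seq + len(incoming.data), data=data)


# "Hello" echo example, initial sequence numbers 42 (Host A) and 79 (Host B)
a1 = Segment(seq=42, ack=79, data=b"Hello")
b1 = reply_to(a1, b"Hello")   # Segment(seq=79, ack=47, data=b'Hello')
a2 = reply_to(b1, b"200")     # Segment(seq=47, ack=84, data=b'200')
b2 = reply_to(a2, b"200")     # Segment(seq=84, ack=50, data=b'200')
print(b1, a2, b2, sep="\n")
```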
TCP Round Trip Time and Timeout
Q: How should the TCP timeout value be set?
• Longer than the RTT, but note that the RTT will vary.
• Too short: premature timeouts and unnecessary retransmissions.
• Too long: slow reaction to segment loss.
RTT = round trip time
Q: How should the RTT be estimated?
• SampleRTT: measured time from segment transmission until ACK receipt; ignore retransmissions and cumulatively ACKed segments.
• SampleRTT will vary, and we want the estimated RTT to be smoother, so use several recent measurements, not just the current SampleRTT.
Main issue: how long is the sender willing to wait before retransmitting the packet?
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - x) · EstimatedRTT + x · SampleRTT
• Exponentially weighted moving average: the influence of a given sample decreases exponentially fast.
• Typical value of x: 0.125 (RFC 2988)
Setting the timeout:
• EstimatedRTT plus a safety margin
• Large variation in EstimatedRTT → larger safety margin
Timeout = EstimatedRTT + 4 · Deviation
Deviation = (1 - x) · Deviation + x · |SampleRTT - EstimatedRTT|
• Recommended value of x for Deviation: 0.25 (a separate weight from the 0.125 used for EstimatedRTT)
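The estimator and timeout rule can be sketched as a small Python class. This is a sketch under stated assumptions, not a stack implementation. The class name is hypothetical. The first-sample initialisation (Deviation = SampleRTT / 2) and the update order (Deviation first, using the previous EstimatedRTT) follow RFC 2988. That RFC's 1-second minimum timeout and clock-granularity term are left out.

```python
class RttEstimator:
    """EWMA RTT estimator with Timeout = EstimatedRTT + 4 * Deviation.

    alpha is the x = 0.125 weight for EstimatedRTT; beta is the x = 0.25 weight
    for Deviation. No minimum timeout or clock granularity is applied here.
    """

    def __init__(self, alpha: float = 0.125, beta: float = 0.25, k: int = 4):
        self.alpha, self.beta, self.k = alpha, beta, k
        self.estimated_rtt = None
        self.deviation = None

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT (never one from a retransmitted segment); return the new timeout."""
        if self.estimated_rtt is None:
            self.estimated_rtt = sample_rtt
            self.deviation = sample_rtt / 2
        else:
            # Deviation is updated first, using the previous EstimatedRTT.
            self.deviation = ((1 - self.beta) * self.deviation
                              + self.beta * abs(sample_rtt - self.estimated_rtt))
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        return self.timeout()

    def timeout(self) -> float:
        return self.estimated_rtt + self.k * self.deviation


est = RttEstimator()
print(round(est.update(0.100), 4))  # 0.3    (0.1 + 4 * 0.05)
print(round(est.update(0.200), 4))  # 0.3625 (0.1125 + 4 * 0.0625)
```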
Sample Calculations
EstimatedRTT = 0.875 · EstimatedRTT + 0.125 · SampleRTT
• After receipt of the ACK of segment 1: EstimatedRTT = RTT for segment 1 = 0.02746 s
• After receipt of the ACK of segment 2: EstimatedRTT = 0.875 · 0.02746 + 0.125 · 0.035557 = 0.0285 s
• After receipt of the ACK of segment 3: EstimatedRTT = 0.875 · 0.0285 + 0.125 · 0.070059 = 0.0337 s
• After receipt of the ACK of segment 4: EstimatedRTT = 0.875 · 0.0337 + 0.125 · 0.11443 = 0.0438 s
• After receipt of the ACK of segment 5: EstimatedRTT = 0.875 · 0.0438 + 0.125 · 0.13989 = 0.0558 s
• After receipt of the ACK of segment 6: EstimatedRTT = 0.875 · 0.0558 + 0.125 · 0.18964 = 0.0725 s
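The sample calculations can be reproduced with a short Python loop over the six SampleRTT values listed above. The slide rounds each intermediate EstimatedRTT to four decimals. This sketch keeps full precision, and its results still agree with the slide to four decimal places. The `ewma` helper name is hypothetical.

```python
ALPHA = 0.125  # x in EstimatedRTT = (1 - x) EstimatedRTT + x SampleRTT

# SampleRTT values (seconds) for the ACKs of segments 1..6, as listed above
samples = [0.02746, 0.035557, 0.070059, 0.11443, 0.13989, 0.18964]


def ewma(values, alpha=ALPHA):
    """Return EstimatedRTT after each sample; the first sample seeds the estimate."""
    estimate = values[0]
    history = [estimate]
    for sample in values[1:]:
        estimate = (1 - alpha) * estimate + alpha * sample
        history.append(estimate)
    return history


for segment, estimate in enumerate(ewma(samples), start=1):
    print(f"segment {segment}: EstimatedRTT = {estimate:.4f} s")
```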
RTT Samples and RTT Estimates
[Figure: SampleRTT and EstimatedRTT versus time; RTT (msec) axis from 100 to 300]
The variations in SampleRTT are smoothed out in the computation of EstimatedRTT.
An Actual RTT Estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; SampleRTT and EstimatedRTT, RTT (milliseconds, 100 to 350) versus time (seconds, 1 to 106)]
FSM of TCP for Reliable Data Transfer
Simplified TCP sender, assuming:
• one-way data transfer
• no flow or congestion control
From the "wait for event" state:
• event: data received from application above → create and send segment
• event: timer timeout for segment with seq. number y → retransmit segment
• event: ACK received, with ACK number y → process ACK
SIMPLIFIED TCP SENDER
Assumptions:
• the sender is not constrained by TCP flow or congestion control
• data from above is less than MSS in size
• data transfer is in one direction only

sendbase = initial_sequence_number
nextseqnum = initial_sequence_number

loop (forever) {
    switch (event)
    event: data received from application above
        create TCP segment with sequence number nextseqnum
        if (timer is currently not running)
            start timer for segment nextseqnum
        pass segment to IP
        nextseqnum = nextseqnum + length(data)
    event: timer timeout
        retransmit not-yet-ACKed segment with smallest sequence number
        start timer
    event: ACK received, with ACK field value of y
        if (y > sendbase) {    /* cumulative ACK of all data up to y */
            sendbase = y
            if (there are currently any not-yet-ACKed segments)
                start timer
        }
} /* end of loop forever */

The timer is associated with the oldest unACKed segment.
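The three event handlers of the simplified sender can be sketched in Python, under the same assumptions (one-way transfer, no flow or congestion control, each chunk smaller than the MSS). The class and attribute names are hypothetical. The single retransmission timer is modelled only as a flag, and `sent` records what would be passed to IP.

```python
class SimplifiedTcpSender:
    """Event handlers of the simplified sender; the timer is a flag, not a real clock."""

    def __init__(self, initial_sequence_number: int):
        self.send_base = initial_sequence_number
        self.next_seq_num = initial_sequence_number
        self.unacked = {}            # seq -> data, sent but not yet ACKed
        self.timer_running = False   # timer belongs to the oldest unACKed segment
        self.sent = []               # (seq, data) handed to IP, including retransmissions

    def on_app_data(self, data: bytes) -> None:
        seq = self.next_seq_num
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True          # start timer
        self.sent.append((seq, data))          # pass segment to IP
        self.next_seq_num += len(data)

    def on_timeout(self) -> None:
        if not self.unacked:
            return
        oldest = min(self.unacked)             # smallest not-yet-ACKed sequence number
        self.sent.append((oldest, self.unacked[oldest]))
        self.timer_running = True              # restart timer

    def on_ack(self, y: int) -> None:
        if y > self.send_base:                 # cumulative ACK of all data up to y
            self.send_base = y
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)   # keep timing only if data is outstanding


# Premature-timeout scenario: Seq=92 (8 bytes) and Seq=100 (20 bytes)
sender = SimplifiedTcpSender(92)
sender.on_app_data(b"x" * 8)
sender.on_app_data(b"y" * 20)
sender.on_timeout()          # retransmits only Seq=92
sender.on_ack(100)
sender.on_ack(120)
print(sender.sent[-1][0], sender.send_base, sender.timer_running)   # 92 120 False
```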
TCP SENDER WITH MODIFICATIONS (with fast retransmit)
Why wait for the timeout to expire when consecutive duplicate ACKs can be used to indicate a lost segment?

sendbase = initial_sequence_number
nextseqnum = initial_sequence_number

loop (forever) {
    switch (event)
    event: data received from application above
        create TCP segment with sequence number nextseqnum
        start timer for segment nextseqnum
        pass segment to IP
        nextseqnum = nextseqnum + length(data)
    event: timer timeout for segment with sequence number y
        retransmit segment with sequence number y
        compute new timeout interval for segment y
        restart timer for sequence number y
    event: ACK received, with ACK field value of y
        if (y > sendbase) {    /* cumulative ACK of all data up to y */
            cancel all timers for segments with sequence numbers < y
            sendbase = y
        }
        else {                 /* a duplicate ACK for an already ACKed segment */
            increment number of duplicate ACKs received for y
            if (number of duplicate ACKs received for y == 3) {
                /* perform TCP fast retransmission */
                resend segment with sequence number y
                restart timer for segment y
            }
        }
} /* end of loop forever */
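The duplicate-ACK branch is the new part of the modified sender. This minimal Python sketch models only the fast-retransmit decision; timers are left out. The class name is hypothetical. Duplicates are counted only for ACKs equal to `send_base`, which is an assumption of this sketch.

```python
class FastRetransmitSender:
    """Three duplicate ACKs for y trigger an immediate resend of the segment starting at y."""

    DUP_ACK_THRESHOLD = 3

    def __init__(self, initial_sequence_number: int):
        self.send_base = initial_sequence_number
        self.next_seq_num = initial_sequence_number
        self.unacked = {}          # seq -> data, sent but not yet ACKed
        self.dup_acks = 0          # duplicate ACKs received for send_base
        self.retransmitted = []    # seq numbers resent by fast retransmit

    def send(self, data: bytes) -> None:
        self.unacked[self.next_seq_num] = data
        self.next_seq_num += len(data)

    def on_ack(self, y: int) -> None:
        if y > self.send_base:                       # cumulative ACK of all data up to y
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.send_base = y
            self.dup_acks = 0
        elif y == self.send_base:                    # duplicate ACK for already ACKed data
            self.dup_acks += 1
            if self.dup_acks == self.DUP_ACK_THRESHOLD and y in self.unacked:
                self.retransmitted.append(y)         # resend segment y without waiting for a timeout


sender = FastRetransmitSender(0)
for _ in range(5):
    sender.send(b"z" * 100)        # Seq = 0, 100, 200, 300, 400
sender.on_ack(100)                 # Seq=0 ACKed; assume Seq=100 is then lost
for _ in range(3):
    sender.on_ack(100)             # Seq=200, 300, 400 each trigger a duplicate ACK=100
print(sender.retransmitted)        # [100]
```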
TCP ACK Generation [RFC 1122, RFC 2581]
1. Event: in-order segment arrival, no gaps, everything else already ACKed.
   TCP receiver action: delay sending the ACK. Wait up to 500 ms for the next segment; if the next segment does not arrive in this interval, send the ACK.
2. Event: in-order segment arrival, no gaps, one delayed ACK pending (due to action 1).
   TCP receiver action: immediately send a single cumulative ACK.
3. Event: out-of-order segment arrival with a higher-than-expected seq. #: a gap is detected.
   TCP receiver action: send a duplicate ACK indicating the seq. # of the next expected byte.
4. Event: arrival of a segment that partially or completely fills the gap.
   TCP receiver action: immediately send an ACK if the segment starts at the lower end of the gap.
The receiver does not discard out-of-order segments.
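The four receiver rules can be expressed as one decision function. This is a minimal Python sketch, assuming segments are (seq, length) pairs. The class and action names are hypothetical. The 500 ms delayed-ACK timer is not modelled ("delay_ack" means the ACK is held until the next segment or the timer). Overlapping segments are not handled, and re-ACKing old data is an assumption, since the table does not cover that case.

```python
class AckingReceiver:
    """Receiver ACK policy for events 1-4; out-of-order segments are buffered, not discarded."""

    def __init__(self, rcv_next: int):
        self.rcv_next = rcv_next          # next in-order byte expected
        self.out_of_order = {}            # seq -> length, buffered above a gap
        self.delayed_ack_pending = False

    def on_segment(self, seq: int, length: int) -> tuple[str, int]:
        """Return (action, ACK number) for an arriving segment."""
        if seq > self.rcv_next:                       # event 3: gap detected
            self.out_of_order[seq] = length
            self.delayed_ack_pending = False          # the duplicate ACK goes out now
            return ("send_duplicate_ack_now", self.rcv_next)
        if seq < self.rcv_next:                       # old data: re-ACK (assumption)
            self.delayed_ack_pending = False
            return ("send_ack_now", self.rcv_next)
        self.rcv_next += length
        if self.out_of_order:                         # event 4: fills the lower end of a gap
            while self.rcv_next in self.out_of_order:
                self.rcv_next += self.out_of_order.pop(self.rcv_next)
            self.delayed_ack_pending = False
            return ("send_ack_now", self.rcv_next)
        if self.delayed_ack_pending:                  # event 2: one delayed ACK already pending
            self.delayed_ack_pending = False
            return ("send_cumulative_ack_now", self.rcv_next)
        self.delayed_ack_pending = True               # event 1: in order, nothing pending
        return ("delay_ack", self.rcv_next)


rx = AckingReceiver(rcv_next=0)
print(rx.on_segment(0, 100))     # ('delay_ack', 100)
print(rx.on_segment(100, 100))   # ('send_cumulative_ack_now', 200)
print(rx.on_segment(300, 100))   # ('send_duplicate_ack_now', 200)
print(rx.on_segment(200, 100))   # ('send_ack_now', 400)
```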
TCP: Interesting Scenarios (simplified TCP version)
Lost ACK scenario:
• Host A sends Seq=92, 8 bytes data; Host B replies ACK=100, but the ACK is lost (X).
• The Seq=92 timer expires, and Host A retransmits Seq=92, 8 bytes data; Host B replies ACK=100.
• This is a retransmission due to the lost ACK.
Premature timeout, cumulative ACKs:
• Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B replies ACK=100 and ACK=120.
• The Seq=92 timer expires before ACK=100 arrives, so Host A retransmits Seq=92, 8 bytes data; the timer is restarted here for Seq=92. Host B replies ACK=120.
• The segment with Seq=100 is not retransmitted.
TCP Retransmission Scenario
• Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data.
• Host B's ACK=100 is lost (X), but ACK=120 arrives before the Seq=92 timeout.
• The cumulative ACK avoids retransmission of the first segment.
TCP Modifications: Doubling the Timeout Interval
• Provides a limited form of congestion control.
• Timer expiration is more likely to be caused by congestion in the network.
TimeoutInterval = 2 · TimeoutInterval_previous
• TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals. Congestion may get worse if sources continue to retransmit packets persistently.
• After an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT.
Others: check RFC 2018 (selective ACK).
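Timeout doubling and the reset after an ACK can be sketched as a small Python class. This is an illustrative sketch: the class name is hypothetical. The 60-second cap is an assumption borrowed from RFC 6298, which allows an upper bound on the timeout provided it is at least 60 s.

```python
class RetransmissionTimer:
    """Each expiry doubles TimeoutInterval; an ACK recomputes it from EstimatedRTT and DevRTT."""

    def __init__(self, estimated_rtt: float, dev_rtt: float, max_interval: float = 60.0):
        self.max_interval = max_interval        # assumed cap, not from the slide
        self.interval = self._from_estimates(estimated_rtt, dev_rtt)

    def _from_estimates(self, estimated_rtt: float, dev_rtt: float) -> float:
        return min(estimated_rtt + 4 * dev_rtt, self.max_interval)

    def on_timeout(self) -> float:
        self.interval = min(2 * self.interval, self.max_interval)   # TimeoutInterval = 2 x previous
        return self.interval

    def on_ack(self, estimated_rtt: float, dev_rtt: float) -> float:
        self.interval = self._from_estimates(estimated_rtt, dev_rtt)
        return self.interval


timer = RetransmissionTimer(estimated_rtt=0.5, dev_rtt=0.125)    # 1.0 s
print([timer.on_timeout() for _ in range(4)])                    # [2.0, 4.0, 8.0, 16.0]
print(timer.on_ack(estimated_rtt=0.5, dev_rtt=0.125))            # 1.0
```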
TCP Flow Control
Flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.
• Receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space via the RcvWindow field in the TCP segment.
• Sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow.
Receiver buffering:
RcvBuffer = size of the TCP receive buffer
RcvWindow = amount of spare room in the buffer
FLOW CONTROL: Receiver
EXAMPLE: HOST A sends a large file to HOST B.
RECEIVER HOST B uses RcvWindow, LastByteRcvd and LastByteRead.
[Figure: HOST B's RcvBuffer, filled by data from IP up to LastByteRcvd and drained by the application process up to LastByteRead; the spare room is RcvWindow]
RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
Initially, RcvWindow = RcvBuffer. As the application reads from the buffer, space is freed. HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.
FLOW CONTROL: Sender
EXAMPLE: HOST A sends a large file to HOST B.
SENDER HOST A uses RcvWindow (of HOST B), LastByteSent and LastByteACKed.
[Figure: HOST A's data, with LastByteSent and LastByteACKed marked; ACKs arrive from HOST B]
To ensure that it does not overflow HOST B's receive buffer, HOST A maintains throughout the connection's life that
LastByteSent - LastByteACKed ≤ RcvWindow
FLOW CONTROL: Some Issues to Consider
RcvWindow is used by the connection to provide the flow control service.
What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?
• TCP sends a segment only when there is data or an ACK to send, so HOST B, with nothing to send, would never tell HOST A that space has been freed. Therefore the sender must keep the connection 'alive'.
• TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space, and the ACKs will contain RcvWindow > 0.
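Both sides of flow control, including the one-byte segments sent into a zero window, fit in two small Python functions. This is a sketch only: the function names are hypothetical. Allowing a one-byte segment only when nothing is outstanding is a simplification, since real stacks pace these probes with a persist timer that is not modelled here.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Receiver side: RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]."""
    buffered = last_byte_rcvd - last_byte_read
    if not 0 <= buffered <= rcv_buffer:
        raise ValueError("LastByteRcvd - LastByteRead must fit in RcvBuffer")
    return rcv_buffer - buffered


def sendable_bytes(last_byte_sent: int, last_byte_acked: int,
                   rcv_window: int, pending: int) -> int:
    """Sender side: keep LastByteSent - LastByteACKed <= RcvWindow.

    With RcvWindow == 0 and data waiting, allow a single one-byte segment when
    nothing is outstanding, so that its ACK can report a reopened window.
    """
    in_flight = last_byte_sent - last_byte_acked
    if rcv_window == 0:
        return 1 if pending > 0 and in_flight == 0 else 0
    return max(0, min(pending, rcv_window - in_flight))


rwnd = rcv_window(4096, last_byte_rcvd=3000, last_byte_read=1000)
print(rwnd, sendable_bytes(5000, 4000, rwnd, pending=10_000))   # 2096 1096
print(sendable_bytes(5000, 5000, 0, pending=100))               # 1 (one-byte segment)
```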
TCP Connection Management
Recall: the TCP sender and receiver establish a "connection" before exchanging data segments, to initialize TCP variables:
• sequence numbers
• buffers and flow control info (e.g. RcvWindow)
Client: the connection initiator
• In Java: Socket clientSocket = new Socket(hostname, portNumber);
• In C (Winsock), connect():
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
        printf("connect failed\n");
        WSACleanup();
        exit(1);
    }
Server: contacted by the client
• In Java: Socket accept() (a ServerSocket method that returns the connection's Socket)
• In C (Winsock), accept():
    ns = accept(s, (struct sockaddr *)&remoteaddr, &addrlen);
TCP Connection Management
Three-way handshake:
Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself). It specifies the initial sequence number (isn).
Step 2: the server end system receives the SYN and replies with a SYNACK control segment. It ACKs the received SYN, allocates buffers and specifies the server's initial sequence number.
Step 3: the client ACKs the connection with ACK = server_isn + 1, allocates buffers and sends SYN = 0. Connection established.
Establishing a connection:
• Client, Connect: (SYN=1, seq=client_isn)
• Server, Accept: (SYN=1, seq=server_isn, ack=client_isn+1)
• Client, ACK: (SYN=0, seq=client_isn+1, ack=server_isn+1)
This is what happens when we create a socket for connection to a server.
After the connection is established, the client can receive segments with application-generated data (SYN=0).
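The header fields of the three handshake segments follow directly from the two initial sequence numbers. This is a minimal Python sketch: the function name and the example ISNs 42 and 79 are hypothetical. It also applies the 32-bit wraparound of TCP sequence numbers, which the slide does not mention.

```python
SEQ_SPACE = 2 ** 32   # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn: int, server_isn: int) -> list[dict]:
    """Header fields of SYN, SYNACK and ACK; each SYN consumes one sequence number."""
    syn = {"from": "client", "SYN": 1, "seq": client_isn}
    synack = {"from": "server", "SYN": 1, "seq": server_isn,
              "ack": (client_isn + 1) % SEQ_SPACE}
    ack = {"from": "client", "SYN": 0, "seq": (client_isn + 1) % SEQ_SPACE,
           "ack": (server_isn + 1) % SEQ_SPACE}
    return [syn, synack, ack]


for segment in three_way_handshake(client_isn=42, server_isn=79):
    print(segment)
```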
TCP Connection Management (cont.)
Closing a connection: the client closes its socket.
• C (Winsock): closesocket(s);
• Java: clientSocket.close();
Step 1: the client end system sends a TCP FIN control segment to the server.
Step 2: the server receives the FIN and replies with an ACK. It then closes the connection and sends a FIN.
[Figure: how a TCP connection is torn down: the client sends FIN on close; the server replies with ACK and, on close, sends FIN; the client replies with ACK, waits in timed wait, and is then closed]
TCP Connection Management (cont.)
Step 3: the client receives the FIN and replies with an ACK. It enters a timed wait, during which it will respond with an ACK to any received FINs.
Step 4: the server receives the ACK. Connection closed.
Note: with a small modification, this can handle simultaneous FINs.
[Figure: client and server both pass through closing; the client enters timed wait before closed, and the server is closed after receiving the final ACK]
TCP Connection Management (cont.)
[Figure: TCP client lifecycle and TCP server lifecycle]
• Timed wait: used in case the final ACK gets lost. Its length is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).
• The connection then formally closes: all resources (e.g. port numbers) are released.
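The client and server lifecycles can be written down as two small transition tables. This is a sketch: the state names are the standard RFC 793 ones, not taken from the figure. Only the normal open and close paths are modelled; simultaneous open or close and resets are left out. The `run` helper and the event labels are hypothetical.

```python
# Normal-path lifecycles using RFC 793 state names.
CLIENT = {
    ("CLOSED", "app connects: send SYN"): "SYN_SENT",
    ("SYN_SENT", "receive SYNACK: send ACK"): "ESTABLISHED",
    ("ESTABLISHED", "app closes: send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "receive ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "receive FIN: send ACK"): "TIME_WAIT",
    ("TIME_WAIT", "timed wait expires"): "CLOSED",
}
SERVER = {
    ("CLOSED", "app creates listening socket"): "LISTEN",
    ("LISTEN", "receive SYN: send SYNACK"): "SYN_RCVD",
    ("SYN_RCVD", "receive ACK"): "ESTABLISHED",
    ("ESTABLISHED", "receive FIN: send ACK"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "app closes: send FIN"): "LAST_ACK",
    ("LAST_ACK", "receive ACK"): "CLOSED",
}


def run(machine: dict, events: list[str], state: str = "CLOSED") -> list[str]:
    """Apply events in order and return the visited states; disallowed events raise."""
    visited = [state]
    for event in events:
        try:
            state = machine[(state, event)]
        except KeyError:
            raise ValueError(f"event {event!r} not allowed in state {state}") from None
        visited.append(state)
    return visited


client_events = [event for (_state, event) in CLIENT]   # the normal path, in table order
print(run(CLIENT, client_events))
```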
End of Flow Control and Error Control
Flow Control vs. Congestion Control
Similar actions are taken, but for very different reasons.
Flow control:
• point-to-point traffic between sender and receiver
• speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the receiver buffer from overflowing
Congestion happens when there are too many sources attempting to send data at too high a rate for the routers along the path.
Congestion control:
• a service that makes sure that the routers between end systems are able to carry the offered traffic
• prevents routers from overflowing
Same course of action: throttling of the sender.
Principles of Congestion Control
Congestion, informally: too many sources sending too much data too fast for the network to handle.
• different from flow control!
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queuing in router buffers)
• a top-10 problem!
Approaches towards Congestion Control
Two broad approaches towards congestion control:
1. End-to-end congestion control:
• no explicit feedback from the network
• congestion inferred by end systems from observed packet loss and delay
• approach taken by TCP
2. Network-assisted congestion control:
• routers provide feedback to end systems in the form of:
  - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR), or
  - an explicit transmission rate the sender should send at
TCP Congestion Control
How the TCP sender limits the rate at which it sends traffic into its connection.
New variable: the congestion window, CongWin.
SENDER: (amount of unACKed data) = LastByteSent - LastByteACKed ≤ min(CongWin, RcvWindow)
This indirectly limits the sender's send rate.
Assumptions:
• The TCP receive buffer is very large, so there is no RcvWindow constraint; the amount of unACKed data at the sender is solely limited by CongWin.
• Packet loss delay and packet transmission delay are negligible.
Sending rate ≈ CongWin / RTT
By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
TCP Congestion Control
TCP uses ACKs to trigger ("clock") its increase in congestion window size: "self-clocking".
The arrival of ACKs is an indication to the sender that all is well.
1. ACKs arrive at a slow rate: the congestion window will be increased at a relatively slow rate.
2. ACKs arrive at a high rate: the congestion window will be increased more quickly.
TCP Congestion Control
How TCP perceives that there is congestion on the path:
"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender. A loss event is either:
1. Timeout: no ACK is received after segment loss.
2. Receipt of three duplicate ACKs: segment loss is followed by three duplicate ACKs received at the sender.
TCP Congestion Control: Details
• The sender limits transmission: LastByteSent - LastByteAcked ≤ cwnd
• Roughly, rate = cwnd / RTT bytes/sec
• cwnd is a dynamic function of perceived network congestion.
How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• the TCP sender reduces its rate (cwnd) after a loss event
Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events
TCP Congestion Avoidance: Additive Increase, Multiplicative Decrease (AIMD)
Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
• Additive increase: increase cwnd by 1 MSS every RTT until loss is detected.
• Multiplicative decrease: cut cwnd in half after loss.
[Figure: congestion window size (8, 16, 24 Kbytes) versus time, showing saw-tooth behavior: probing for bandwidth]
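The saw-tooth can be reproduced by applying the two AIMD rules once per RTT. This is a minimal Python sketch: the function name, the MSS, the starting window and the RTTs in which loss is detected are all hypothetical, and slow start is ignored.

```python
def aimd_trace(rounds: int, mss: int, initial_cwnd: int, loss_rounds: set[int]) -> list[int]:
    """cwnd (bytes) at the start of each RTT: +1 MSS per loss-free RTT,
    halved (but not below 1 MSS) after an RTT in which loss is detected."""
    cwnd, trace = initial_cwnd, []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            cwnd = max(mss, cwnd // 2)     # multiplicative decrease
        else:
            cwnd += mss                    # additive increase
    return trace


# Hypothetical: MSS = 1 KB, start at 8 KB, losses detected in RTTs 8 and 17
print(aimd_trace(20, 1024, 8 * 1024, {8, 17}))
```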
TCP Slow Start
• When the connection begins, increase the rate exponentially until the first loss event:
  - initially cwnd = 1 MSS
  - double cwnd every RTT
  - done by incrementing cwnd by 1 MSS for every ACK received
• Summary: the initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT).
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart]
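The per-ACK increment is what produces the per-RTT doubling. This short Python sketch shows the effect under stated assumptions: one ACK per segment (no delayed ACKs), no loss, and a threshold that is never reached. The function name is hypothetical.

```python
def slow_start_trace(rtts: int, mss: int = 1) -> list[int]:
    """cwnd at the start of each RTT when every returning ACK adds 1 MSS."""
    cwnd, trace = mss, []
    for _ in range(rtts):
        trace.append(cwnd)
        segments_sent = cwnd // mss
        for _ack in range(segments_sent):   # one ACK returns per segment sent
            cwnd += mss
    return trace


print(slow_start_trace(4))          # [1, 2, 4, 8]  (in MSS)
print(slow_start_trace(4, 1460))    # [1460, 2920, 5840, 11680]
```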
Refinement: Inferring Loss
• After 3 duplicate ACKs:
  - cwnd is cut in half
  - the window then grows linearly
• But after a timeout event:
  - cwnd is set to 1 MSS
  - the window then grows exponentially up to a threshold, then grows linearly
Philosophy:
• 3 duplicate ACKs indicate that the network is capable of delivering some segments.
• A timeout indicates a "more alarming" congestion scenario.
Refinement
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.
Implementation:
• variable ssthresh (slow-start threshold)
• on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event
TCP Sender Congestion Control
State: SLOW START (SS). Event: ACK receipt for previously unACKed data.
  Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance".
  Commentary: results in a doubling of CongWin every RTT.
State: CONGESTION AVOIDANCE (CA). Event: ACK receipt for previously unACKed data.
  Action: CongWin = CongWin + MSS · (MSS / CongWin).
  Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT.
State: SS or CA. Event: loss event detected by triple duplicate ACK.
  Action: Threshold = CongWin / 2; CongWin = Threshold; set state to "Congestion Avoidance".
  Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS.
State: SS or CA. Event: timeout.
  Action: Threshold = CongWin / 2; CongWin = 1 MSS; set state to "Slow Start".
  Commentary: enter slow start.
State: SS or CA. Event: duplicate ACK.
  Action: increment the duplicate ACK count for the segment being ACKed.
  Commentary: CongWin and Threshold are not changed.
Summary: TCP Congestion Control
[FSM with three states: slow start, congestion avoidance, fast recovery]
Initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start.
Slow start:
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: → congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; → fast recovery
Congestion avoidance:
• new ACK: cwnd = cwnd + MSS · (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; → slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; → fast recovery
Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; → congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; → slow start
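The three-state summary FSM translates almost line by line into a Python class. This sketch models only the window arithmetic and the state changes; sending and retransmitting are left out. The class and method names are hypothetical. The example run uses MSS = 1 and ssthresh = 8 to keep the numbers small, instead of the 64 KB default.

```python
class RenoCongestionControl:
    """Slow start / congestion avoidance / fast recovery; cwnd and ssthresh in bytes."""

    def __init__(self, mss: int = 1460, ssthresh: float = 64 * 1024):
        self.mss = mss
        self.cwnd = float(mss)           # cwnd = 1 MSS
        self.ssthresh = float(ssthresh)  # ssthresh = 64 KB by default
        self.dup_ack_count = 0
        self.state = "slow_start"

    def on_new_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh                      # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += self.mss * self.mss / self.cwnd   # about +1 MSS per RTT
        self.dup_ack_count = 0

    def on_dup_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd += self.mss                          # cwnd = cwnd + MSS
            return
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:                        # retransmit missing segment
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast_recovery"

    def on_timeout(self) -> None:                          # retransmit missing segment
        self.ssthresh = self.cwnd / 2
        self.cwnd = float(self.mss)
        self.dup_ack_count = 0
        self.state = "slow_start"


cc = RenoCongestionControl(mss=1, ssthresh=8)
for _ in range(8):
    cc.on_new_ack()
print(cc.state, cc.cwnd)                  # congestion_avoidance 9.0
for _ in range(3):
    cc.on_dup_ack()
print(cc.state, cc.ssthresh, cc.cwnd)     # fast_recovery 4.5 7.5
cc.on_new_ack()
print(cc.state, cc.cwnd)                  # congestion_avoidance 4.5
cc.on_timeout()
print(cc.state, cc.ssthresh, cc.cwnd)     # slow_start 2.25 1.0
```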
Congestion Control
TCP's Congestion Control Service
[Figure: CLIENT and SERVER]
Problem: gridlock sets in when there is packet loss due to router congestion.
TCP's congestion control forces the end systems to decrease the rate at which packets are sent during periods of congestion.
The sending system's packet is lost due to congestion, and the sender is alerted when it stops receiving ACKs for the packets it sent.
Macroscopic Description of TCP Throughput
• What is the average throughput of TCP as a function of window size and RTT? Ignore slow start (typically very short phases).
• Let W be the window size when loss occurs.
  - When the window is W, throughput is W/RTT.
  - Just after loss, the window drops to W/2, and throughput to W/(2·RTT).
  - Throughput then increases linearly (by MSS/RTT every RTT).
• Average throughput: 0.75 W/RTT
(Based on an idealised model for the steady-state dynamics of TCP.)
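The 0.75 factor is the mean of a window that climbs linearly from W/2 to W. This short Python sketch shows that average. The function name and the values of W, RTT and MSS are hypothetical. W is assumed even, so that the cycle starts exactly at W/2.

```python
def sawtooth_average(w_segments: int) -> float:
    """Mean window over one AIMD cycle: W/2, W/2 + 1, ..., W segments (one value per RTT)."""
    windows = range(w_segments // 2, w_segments + 1)
    return sum(windows) / len(windows)


W, RTT, MSS_BITS = 20, 0.1, 1500 * 8      # hypothetical values
avg_window = sawtooth_average(W)          # 15.0 segments = 0.75 W
print(avg_window, avg_window * MSS_BITS / RTT)   # average window, and throughput in bit/s
```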
TCP Futures: TCP over "Long, Fat Pipes"
• Example: GRID computing application with 1500-byte segments, 100 ms RTT and a desired throughput of 10 Gbps
• Requires a window size of W = 83,333 in-flight segments
• Throughput in terms of the loss rate L:
$\text{Throughput} = \frac{1.22 \cdot MSS}{RTT\sqrt{L}}$
• → $L = 2 \cdot 10^{-10}$: a very small loss rate (1 loss event every 5 billion segments)
• New versions of TCP are needed for high-speed environments.
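Both numbers in the example can be recomputed from the stated values: 1500-byte segments, 100 ms RTT and 10 Gbps. The second number comes from inverting the loss-rate formula for L. The function names are hypothetical.

```python
def window_for_throughput(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """In-flight segments needed: W = throughput * RTT / MSS."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def loss_rate_for_throughput(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


print(round(window_for_throughput(10e9, 0.1, 1500)))    # 83333
print(loss_rate_for_throughput(10e9, 0.1, 1500))        # about 2.1e-10
```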
TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link of capacity R, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Analysis of 2 Connections Sharing a Link
Assumptions:
• A link with transmission rate R
• Each connection has the same MSS and RTT
• No other TCP connections or UDP datagrams traverse the shared link
• Ignore the slow-start phase of TCP
• Both connections operate in congestion-avoidance mode (linear increase phase)
Goal: adjust the sending rate of the two connections to allow for equal bandwidth sharing.
Why is TCP Fair?
Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases.
• Multiplicative decrease decreases throughput proportionally.
[Figure: connection 2 throughput versus connection 1 throughput (each axis up to R), with the equal-bandwidth-share line and the full-bandwidth-utilisation line; the trajectory alternates between congestion-avoidance additive increase and loss, where each window is decreased by a factor of 2]
A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.
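The convergence argument can be simulated in a few lines of Python. The model is idealised and built on stated assumptions. Both connections add one unit per step. When their sum exceeds the capacity, both see a loss at the same moment and halve. The function name and starting rates are hypothetical. Additive increase leaves the gap between the two rates unchanged, and each halving cuts it in half, so the rates converge toward an equal share.

```python
def two_flow_aimd(x1: float, x2: float, capacity: float, steps: int) -> tuple[float, float]:
    """Idealised two-connection AIMD with synchronised losses."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2      # multiplicative decrease (proportional)
        else:
            x1, x2 = x1 + 1, x2 + 1      # additive increase (slope 1)
    return x1, x2


x1, x2 = two_flow_aimd(x1=5, x2=80, capacity=100, steps=1000)
print(round(x1, 2), round(x2, 2))   # nearly equal
```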
The End
The following slides are for additional reading.
TCP Latency Modeling
Loopholes in TCP:
• In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free; therefore they have higher throughputs.
• Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (multithreading implementation).
[Figure: multiple end systems sharing a link of transmission rate R bps; three applications use 1 TCP connection each, and one uses 3 TCP connections]
TCP Latency Modeling
Q: How long does it take to receive an object from a Web server?
Latency: the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. It comprises the TCP connection establishment time, the data transfer delay and the actual data transmission time.
Two cases to consider:
• $WS/R > RTT + S/R$: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. There is no data transfer delay.
• $WS/R < RTT + S/R$: the sender has to wait for an ACK after a window's worth of data is sent. There is data transfer delay.
TCP Latency Modeling
Assumptions:
• The network is uncongested, with one link of rate R between the end systems.
• CongWin (fixed) determines the amount of data that can be sent.
• No packet loss, no packet corruption, no retransmissions required.
• Header overheads are negligible.
• The file to send is an integer number of segments of size MSS.
• Connection-establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times.
• The initial Threshold of the TCP congestion mechanism is very big.
Notation:
• R: the link's transmission rate, in bps
• O: size of the object, in bits
• S: number of bits of the MSS (maximum segment size)
[Figure: CLIENT requesting a FILE from SERVER]
TCP Latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 1: $WS/R > RTT + S/R$
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.
Case 1: $\text{latency} = 2\,RTT + O/R$
K, the number of windows of data that cover the object, is $K = \lceil O/(WS) \rceil$ (rounded up to the nearest integer).
Assume W=4 segmentsAssume W=4 segments
eg O=256bits S=32bits W=4
Transport Layer ndash TCP
59
BB
TCP latency ModelingTCP latency Modeling
Case 2 latency = 2RTT + OR + (K-1)[SR + RTT - WSR]
Number of Windows of data that cover the objectK= OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 2 Case 2 WSR lt RTT + SRWSR lt RTT + SR
Sender has to wait for an ACK after a windowrsquos worth of data sent
STALLED STALLED PERIODPERIOD
If there are k windows sender willbe stalled (k-1) times
Transport Layer ndash TCP
60
BB
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
STALLED STALLED PERIODPERIOD
4 windows4 windows
OS=15
Transport Layer ndash TCP
61
BB
bull LetLet K K be the be the number of windows that cover the objectnumber of windows that cover the objectbull We can express We can express KK in terms of the number of segments in the in terms of the number of segments in the
object as followsobject as follows
S
OkK k 110 222min
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1log
1logmin
12min
2
2
S
OK
S
OkkK
S
OkK k
Note
Transport Layer ndash TCP
62
BB
bull From the time the server begins to transmit the From the time the server begins to transmit the kkth window window until the time the server receives an until the time the server receives an ACKACK for the for the first segment in the windowin the window
bull Transmission of Transmission of kkth window window ==
bull Stall Time Stall Time ==
bull Latency Latency ==
12
k
R
S
12
k
R
SRTT
R
S
RTTR
S
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1K
1
122k
k
R
SRTT
R
S
R
ORTT
Transport Layer ndash TCP
63
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
Transport Layer ndash TCP
64
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
R
S
R
SRTTP
R
ORTT P )12(2Latency
bull Closed-form expression for the latency
Transport Layer ndash TCP
65
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
2
1
RTTRO
P
encyMinimumLat
Latency
Slow start will not significantly increase latency if RTT ltlt OR
Transport Layer ndash TCP
66
BB
httpwww1csewustledu~jaincis788-97ftptcp_over_atmindexhtmatm-features
Transport Layer ndash TCP
10
BB
TCP sequence s and ACKs
Q how receiver handles out-of-order segments A TCP specs does does
notnot say - decide when implementing
Host A Host BSeq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
Usertypes
C
host ACKsreceipt
of echoedC
host ACKsreceipt of
C echoesback C
time
simple telnet scenario (with echo on)
Assuming that the starting sequence numbers for Host A
and Host B are 4242 and 7979 respectively
Send me the bytes from 43 onwardACK is being piggy-backed onserver-to-client data
Irsquom sending data starting at seq num=42
serverclient
Yet another server echo example
1. User types "Hello". Host A → Host B: Seq=42, ACK=79, data='Hello'.
2. Host B ACKs receipt of "Hello" and echoes it back. Host B → Host A: Seq=79, ACK=47, data='Hello'.
3. Host A ACKs receipt of the echoed "Hello" and sends something else. Host A → Host B: Seq=47, ACK=84, data='200'.
4. Host B → Host A: Seq=84, ACK=50, data='200'.
Host A moves from seq=42, ack=79 to seq=47, ack=84; Host B moves from seq=79, ack=47 to seq=84, ack=50.
The ACK number tells up to which byte has been received, i.e., the next starting byte the host is expecting to receive.
TCP Round Trip Time and Timeout
Main issue: How long is the sender willing to wait before retransmitting the packet?
Q: How should the TCP timeout value be set?
- Longer than the RTT (round trip time), but note that the RTT varies.
- Too short: premature timeouts and unnecessary retransmissions.
- Too long: slow reaction to segment loss.
Q: How can the RTT be estimated?
- SampleRTT: measured time from segment transmission until ACK receipt; ignore retransmissions and cumulatively ACKed segments.
- SampleRTT will vary, and we want the estimated RTT to be smoother: use several recent measurements, not just the current SampleRTT.
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - x) · EstimatedRTT + x · SampleRTT
- Exponential weighted moving average: the influence of a given sample decreases exponentially fast.
- Typical value of x: 0.125 (RFC 2988).
Setting the timeout: EstimatedRTT plus a safety margin; a large variation in EstimatedRTT calls for a larger safety margin.
Timeout = EstimatedRTT + 4 · Deviation
Deviation = (1 - x) · Deviation + x · |SampleRTT - EstimatedRTT|, with recommended value x = 0.25.
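A minimal Java sketch of the two estimators above. It uses the slide's weights and seeds EstimatedRTT with the first sample, as the worked example does. Seeding Deviation with half the first sample is an assumption taken from RFC 2988, not from the slide, and clock granularity and the RFC's minimum timeout are left out. The class and method names are illustrative.

```java
/**
 * Sketch of the EWMA round-trip-time estimator and timeout rule from the slide.
 * Weights follow the slide: x = 0.125 for EstimatedRTT, x = 0.25 for Deviation.
 */
final class RttEstimator {
    private static final double ALPHA = 0.125; // weight of a new SampleRTT
    private static final double BETA = 0.25;   // weight of a new |SampleRTT - EstimatedRTT|

    private double estimatedRtt;
    private double deviation;
    private boolean hasSample = false;

    /** Feeds one SampleRTT in seconds (taken from a segment that was not retransmitted). */
    void onSample(double sampleRtt) {
        if (!hasSample) {
            // First measurement seeds the estimate, as in the slide's worked example.
            // Assumption: Deviation starts at SampleRTT / 2 (the RFC 2988 initial value).
            estimatedRtt = sampleRtt;
            deviation = sampleRtt / 2;
            hasSample = true;
            return;
        }
        // Update Deviation against the previous EstimatedRTT, then update EstimatedRTT.
        deviation = (1 - BETA) * deviation + BETA * Math.abs(sampleRtt - estimatedRtt);
        estimatedRtt = (1 - ALPHA) * estimatedRtt + ALPHA * sampleRtt;
    }

    double estimatedRtt() { return estimatedRtt; }

    double deviation() { return deviation; }

    /** Timeout = EstimatedRTT + 4 * Deviation. */
    double timeout() { return estimatedRtt + 4 * deviation; }

    public static void main(String[] args) {
        // SampleRTT values (seconds) from the sample-calculation slide.
        double[] samples = {0.02746, 0.035557, 0.070059, 0.11443, 0.13989, 0.18964};
        RttEstimator est = new RttEstimator();
        for (double s : samples) {
            est.onSample(s);
            System.out.printf("EstimatedRTT=%.4f s, Timeout=%.4f s%n", est.estimatedRtt(), est.timeout());
        }
    }
}
```

Replaying the six SampleRTT values reproduces the slide's EstimatedRTT values to four decimal places; the slide rounds intermediate results, while the sketch keeps full precision.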
Sample Calculations
EstimatedRTT = 0.875 · EstimatedRTT + 0.125 · SampleRTT (all values in seconds)
- After the receipt of the ACK of segment 1: EstimatedRTT = RTT for segment 1 = 0.02746
- After the receipt of the ACK of segment 2: EstimatedRTT = 0.875 · 0.02746 + 0.125 · 0.035557 = 0.0285
- After the receipt of the ACK of segment 3: EstimatedRTT = 0.875 · 0.0285 + 0.125 · 0.070059 = 0.0337
- After the receipt of the ACK of segment 4: EstimatedRTT = 0.875 · 0.0337 + 0.125 · 0.11443 = 0.0438
- After the receipt of the ACK of segment 5: EstimatedRTT = 0.875 · 0.0438 + 0.125 · 0.13989 = 0.0558
- After the receipt of the ACK of segment 6: EstimatedRTT = 0.875 · 0.0558 + 0.125 · 0.18964 = 0.0725
RTT Samples and RTT Estimates
[Figure: Sample RTT and Estimated RTT (RTT in msec, 100 to 300) plotted against time.]
The variations in SampleRTT are smoothed out in the computation of EstimatedRTT.
An Actual RTT Estimation
[Figure: SampleRTT and Estimated RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds, 100 to 350) versus time (seconds).]
FSM of TCP for Reliable Data Transfer
Simplified TCP sender, assuming one-way data transfer and no flow or congestion control. From the "wait for event" state:
- Event: data received from application above → create and send segment.
- Event: timer timeout for segment with seq number y → retransmit segment.
- Event: ACK received, with ACK number y → process ACK.
SIMPLIFIED TCP SENDER
Assumptions:
- the sender is not constrained by TCP flow or congestion control;
- data from above is less than one MSS in size;
- data transfer is in one direction only.

    sendbase = initial_sequence_number
    nextseqnum = initial_sequence_number
    loop (forever) {
        switch (event)
        event: data received from application above
            create TCP segment with sequence number nextseqnum
            if (timer is currently not running)
                start timer for segment nextseqnum   /* timer is associated with the oldest unACKed segment */
            pass segment to IP
            nextseqnum = nextseqnum + length(data)
        event: timer timeout
            retransmit not-yet-ACKed segment with smallest sequence number
            start timer
        event: ACK received, with ACK field value of y
            if (y > sendbase) {   /* cumulative ACK of all data up to y */
                sendbase = y
                if (there are currently any not-yet-ACKed segments)
                    start timer
            }
    } /* end of loop forever */
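The pseudocode above can be sketched in Java as three event handlers. This is an illustrative model under the slide's assumptions: the timer is a boolean flag rather than a real clock, "pass segment to IP" only records the sequence number, and a segment length stands in for its data. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of the simplified sender: one timer, cumulative ACKs, no flow/congestion control. */
final class SimplifiedTcpSender {
    private long sendBase;
    private long nextSeqNum;
    private boolean timerRunning = false;
    // Not-yet-ACKed segments keyed by sequence number (value = length), smallest first.
    private final TreeMap<Long, Integer> unacked = new TreeMap<>();
    // Stand-in for "pass segment to IP": the sequence number of every (re)transmission.
    final List<Long> sentToIp = new ArrayList<>();

    SimplifiedTcpSender(long initialSeq) {
        sendBase = initialSeq;
        nextSeqNum = initialSeq;
    }

    /** Event: data (smaller than one MSS) received from the application above. */
    void onAppData(int length) {
        unacked.put(nextSeqNum, length);
        if (!timerRunning) {
            timerRunning = true; // the single timer belongs to the oldest unACKed segment
        }
        sentToIp.add(nextSeqNum);
        nextSeqNum += length;
    }

    /** Event: timer timeout. Retransmit the smallest unACKed segment and restart the timer. */
    void onTimeout() {
        if (!unacked.isEmpty()) {
            sentToIp.add(unacked.firstKey());
            timerRunning = true;
        }
    }

    /** Event: ACK received with ACK field value y (cumulative: every byte below y arrived). */
    void onAck(long y) {
        if (y > sendBase) {
            sendBase = y;
            // Drop every segment that ends at or before y.
            while (!unacked.isEmpty()) {
                Map.Entry<Long, Integer> oldest = unacked.firstEntry();
                if (oldest.getKey() + oldest.getValue() > y) break;
                unacked.pollFirstEntry();
            }
            timerRunning = !unacked.isEmpty(); // restart only if unACKed data remains
        }
    }

    long sendBase() { return sendBase; }

    boolean timerRunning() { return timerRunning; }

    public static void main(String[] args) {
        // Premature-timeout scenario: 8 bytes at Seq=92, then 20 bytes at Seq=100.
        SimplifiedTcpSender sender = new SimplifiedTcpSender(92);
        sender.onAppData(8);
        sender.onAppData(20);
        sender.onTimeout(); // fires before the ACKs arrive, so Seq=92 is resent
        sender.onAck(100);
        sender.onAck(120);
        System.out.println(sender.sentToIp + " sendBase=" + sender.sendBase());
    }
}
```

In `main`, Seq=92 is transmitted twice and Seq=100 only once: the cumulative ACK=120 covers both segments, which matches the premature-timeout scenario on the following slides.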
TCP with MODIFICATIONS SENDER: With Fast Retransmit
Why wait for the timeout to expire when consecutive duplicate ACKs can be used to indicate a lost segment?

    sendbase = initial_sequence_number
    nextseqnum = initial_sequence_number
    loop (forever) {
        switch (event)
        event: data received from application above
            create TCP segment with sequence number nextseqnum
            start timer for segment nextseqnum
            pass segment to IP
            nextseqnum = nextseqnum + length(data)
        event: timer timeout for segment with sequence number y
            retransmit segment with sequence number y
            compute new timeout interval for segment y
            restart timer for sequence number y
        event: ACK received, with ACK field value of y
            if (y > sendbase) {   /* cumulative ACK of all data up to y */
                cancel all timers for segments with sequence numbers < y
                sendbase = y
            }
            else {   /* a duplicate ACK for an already ACKed segment */
                increment number of duplicate ACKs received for y
                if (number of duplicate ACKs received for y == 3) {
                    /* perform TCP fast retransmission */
                    resend segment with sequence number y
                    restart timer for segment y
                }
            }
    } /* end of loop forever */
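A minimal Java sketch of the duplicate-ACK branch above. It keeps a single duplicate counter that resets on every new cumulative ACK, which is equivalent to counting per y when duplicates can only repeat the current sendbase; timers are omitted and the names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of fast retransmit: the third duplicate ACK for y triggers an immediate resend of y. */
final class FastRetransmitSender {
    private long sendBase;
    private int duplicateAcks = 0;
    // Stand-in for "resend segment with sequence number y".
    final List<Long> fastRetransmissions = new ArrayList<>();

    FastRetransmitSender(long initialSeq) {
        sendBase = initialSeq;
    }

    /** Event: ACK received with ACK field value y. */
    void onAck(long y) {
        if (y > sendBase) {
            // New cumulative ACK: timers for data below y would be cancelled here.
            sendBase = y;
            duplicateAcks = 0;
        } else if (y == sendBase) {
            // Duplicate ACK for already ACKed data.
            duplicateAcks++;
            if (duplicateAcks == 3) {
                fastRetransmissions.add(y); // resend without waiting for the timeout
            }
        }
        // y < sendBase: a stale ACK, ignored in this sketch.
    }

    public static void main(String[] args) {
        FastRetransmitSender sender = new FastRetransmitSender(92);
        long[] acks = {100, 100, 100, 100}; // one new ACK followed by three duplicates
        for (long y : acks) {
            sender.onAck(y);
        }
        System.out.println("fast retransmissions: " + sender.fastRetransmissions);
    }
}
```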
TCP ACK generation [RFC 1122, RFC 2581]
1. Event: in-order segment arrival, no gaps, everything else already ACKed.
   TCP receiver action: delay sending the ACK. Wait up to 500 ms for the next segment; if the next segment does not arrive in this interval, send the ACK.
2. Event: in-order segment arrival, no gaps, one delayed ACK pending (due to action 1).
   TCP receiver action: immediately send a single cumulative ACK.
3. Event: out-of-order segment arrival with a higher-than-expected sequence number; a gap is detected.
   TCP receiver action: send a duplicate ACK, indicating the sequence number of the next expected byte.
4. Event: arrival of a segment that partially or completely fills the gap.
   TCP receiver action: immediately send an ACK if the segment starts at the lower end of the gap.
The receiver does not discard out-of-order segments.
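The ACK numbers produced by events 2 to 4 can be sketched in Java as follows. This models only which cumulative ACK number the receiver returns; the 500 ms delayed-ACK timer of event 1 is not modelled, and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch of cumulative-ACK selection at a receiver that buffers out-of-order segments. */
final class CumulativeAckReceiver {
    private long nextExpected;
    // Out-of-order segments are kept, not discarded: sequence number -> length.
    private final TreeMap<Long, Integer> buffered = new TreeMap<>();

    CumulativeAckReceiver(long initialSeq) {
        nextExpected = initialSeq;
    }

    /** Handles one arriving segment and returns the ACK number to send. */
    long onSegment(long seq, int length) {
        if (seq > nextExpected) {
            buffered.put(seq, length);   // gap detected: the ACK number stays put (duplicate ACK)
        } else if (seq + length > nextExpected) {
            nextExpected = seq + length; // in-order data, or data at the lower end of a gap
            // Absorb buffered segments that are now contiguous with the in-order data.
            while (!buffered.isEmpty() && buffered.firstKey() <= nextExpected) {
                Map.Entry<Long, Integer> e = buffered.pollFirstEntry();
                nextExpected = Math.max(nextExpected, e.getKey() + e.getValue());
            }
        }
        // Segments entirely below nextExpected are old duplicates: simply re-ACK.
        return nextExpected;
    }

    public static void main(String[] args) {
        CumulativeAckReceiver rcv = new CumulativeAckReceiver(92);
        System.out.println(rcv.onSegment(92, 8));   // in order: ACK=100
        System.out.println(rcv.onSegment(120, 20)); // gap at 100..119: duplicate ACK=100
        System.out.println(rcv.onSegment(100, 20)); // fills the gap: ACK=140
    }
}
```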
TCP: interesting scenarios (simplified TCP version)
Lost ACK scenario (retransmission due to lost ACK):
- Host A sends Seq=92, 8 bytes of data; Host B replies ACK=100, but the ACK is lost (X).
- Host A's timer expires and it retransmits Seq=92, 8 bytes of data; Host B replies ACK=100 again.
Premature timeout (cumulative ACKs):
- Host A sends Seq=92 (8 bytes of data) and Seq=100 (20 bytes of data); Host B replies ACK=100 and ACK=120.
- Host A's timer for Seq=92 expires before these ACKs arrive, so it retransmits Seq=92 (8 bytes of data). The timer is restarted here for Seq=92.
- Host B replies ACK=120. The segment with Seq=100 is not retransmitted.
TCP Retransmission Scenario
- Host A sends Seq=92 (8 bytes of data) and Seq=100 (20 bytes of data).
- Host B's ACK=100 is lost (X), but its ACK=120 arrives before Host A's timeout for Seq=92.
- The cumulative ACK avoids retransmission of the first segment.
TCP Modifications: Doubling the Timeout Interval
- Provides a limited form of congestion control: timer expiration is more likely caused by congestion in the network, and congestion may get worse if sources continue to retransmit packets persistently.
- TimeoutInterval = 2 · TimeoutInterval(previous)
- TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals.
- After an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT.
Others: check RFC 2018 (selective ACK).
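The two timeout rules on this slide can be sketched in Java as below. It assumes DevRTT is the same quantity the earlier slide calls Deviation (so the post-ACK value is EstimatedRTT + 4 · DevRTT); the initial interval and names are illustrative.

```java
/** Sketch: double TimeoutInterval on each expiry, recompute it from the estimators after an ACK. */
final class TimeoutBackoff {
    private double timeoutInterval;

    TimeoutBackoff(double initialTimeout) {
        timeoutInterval = initialTimeout;
    }

    /** Timer expired: TimeoutInterval = 2 * TimeoutInterval(previous). */
    double onTimerExpired() {
        timeoutInterval *= 2;
        return timeoutInterval;
    }

    /** ACK received: derive TimeoutInterval again from the most recent EstimatedRTT and DevRTT. */
    double onAckReceived(double estimatedRtt, double devRtt) {
        timeoutInterval = estimatedRtt + 4 * devRtt;
        return timeoutInterval;
    }

    public static void main(String[] args) {
        TimeoutBackoff t = new TimeoutBackoff(1.0);
        System.out.println(t.onTimerExpired()); // 2.0
        System.out.println(t.onTimerExpired()); // 4.0
        System.out.println(t.onAckReceived(0.1, 0.025));
    }
}
```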
TCP Flow Control
Flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.
Receiver buffering:
- RcvBuffer = size of the TCP receive buffer
- RcvWindow = amount of spare room in the buffer
- The receiver explicitly informs the sender of the (dynamically changing) amount of free buffer space via the RcvWindow field in the TCP segment.
- The sender keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow.
FLOW CONTROL: Receiver
Example: Host A sends a large file to Host B.
Receiver (Host B) uses RcvWindow, LastByteRcvd and LastByteRead: data from IP enters the RcvBuffer up to LastByteRcvd, and the application process reads from it up to LastByteRead.
RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
Initially, RcvWindow = RcvBuffer; the application then reads from the buffer. Host B tells Host A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to Host A.
FLOW CONTROL: Sender
Example: Host A sends a large file to Host B.
Sender (Host A) uses Host B's RcvWindow, LastByteSent and LastByteACKed (updated by ACKs from Host B).
To ensure that Host B does not overflow, Host A maintains throughout the connection's life that
LastByteSent - LastByteACKed <= RcvWindow
FLOW CONTROL: Some issues to consider
- RcvWindow is used by the connection to provide the flow control service.
- TCP sends a segment only when there is data or an ACK to send. Therefore the sender must keep the connection "alive".
- What happens when the receive buffer of Host B is full (that is, when RcvWindow = 0)? TCP requires that Host A continue to send segments with one data byte while Host B's receive window is 0. Such segments will be ACKed by Host B. Eventually the buffer will have some space, and the ACKs will contain RcvWindow > 0.
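The receiver and sender rules from the last three slides can be sketched in Java as plain arithmetic. This is a minimal sketch that ignores sequence-number wraparound; the byte counts in `main` are hypothetical.

```java
/** Sketch of flow-control bookkeeping: receiver-advertised RcvWindow and the sender's limit. */
final class FlowControl {
    /** Receiver: RcvWindow = RcvBuffer - (LastByteRcvd - LastByteRead). */
    static long rcvWindow(long rcvBuffer, long lastByteRcvd, long lastByteRead) {
        return rcvBuffer - (lastByteRcvd - lastByteRead);
    }

    /**
     * Sender: how many more bytes may be sent while keeping LastByteSent - LastByteACKed <= RcvWindow.
     * With a zero window and nothing in flight, one data byte may still be sent so that the
     * receiver keeps ACKing and can eventually advertise RcvWindow > 0.
     */
    static long bytesAllowed(long lastByteSent, long lastByteAcked, long rcvWindow) {
        long inFlight = lastByteSent - lastByteAcked;
        if (rcvWindow == 0 && inFlight == 0) {
            return 1;
        }
        return Math.max(0, rcvWindow - inFlight);
    }

    public static void main(String[] args) {
        long window = rcvWindow(4096, 3000, 1000); // 2000 bytes buffered, 2096 free
        System.out.println("RcvWindow = " + window);
        System.out.println("may send " + bytesAllowed(5000, 4000, window) + " more bytes");
        System.out.println("zero-window probe: " + bytesAllowed(5000, 5000, 0) + " byte");
    }
}
```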
TCP Connection Management
Recall: the TCP sender and receiver establish a "connection" before exchanging data segments, which initializes the TCP variables:
- sequence numbers;
- buffers and flow control info (e.g., RcvWindow).
Client: the connection initiator.
    In Java: Socket clientSocket = new Socket(hostname, portNumber);
    In C (Winsock):
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
        printf("connect failed\n");
        WSACleanup();
        exit(1);
    }
Server: contacted by the client.
    In Java: ServerSocket's accept() returns the Socket for the new connection.
    In C: ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);
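A self-contained Java sketch of the client and server calls above, run over the loopback interface. Port 0 asks the operating system for any free port; the payload and the class and method names are illustrative. `ServerSocket.accept()` returns once the connection initiated by `new Socket(...)` has been set up, after which both sides use ordinary byte streams.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Sketch: one client connects to a one-shot echo server on the loopback interface. */
final class LoopbackEcho {
    /** Sends payload from the client to the server and returns the bytes echoed back. */
    static byte[] echoOnce(byte[] payload) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) { // port 0: any free port
            Thread serverThread = new Thread(() -> {
                try (Socket conn = server.accept(); // server side: wait for a client
                     InputStream in = conn.getInputStream();
                     OutputStream out = conn.getOutputStream()) {
                    byte[] buf = in.readNBytes(payload.length);
                    out.write(buf);                  // echo the bytes back
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            serverThread.start();
            // Client side: the connection initiator.
            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                client.getOutputStream().write(payload);
                byte[] reply = client.getInputStream().readNBytes(payload.length);
                serverThread.join();
                return reply;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] reply = echoOnce("Hello".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(reply, StandardCharsets.US_ASCII));
    }
}
```

The sketch needs Java 11 or later for `InputStream.readNBytes(int)`.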
TCP Connection Management: Establishing a connection
Three-way handshake:
Step 1: The client end system sends a TCP SYN control segment to the server (executed by TCP itself); it specifies the initial sequence number (client_isn). [Connect: SYN=1, seq=client_isn]
Step 2: The server end system receives the SYN and replies with a SYNACK control segment: it ACKs the received SYN, allocates buffers, and specifies the server's initial sequence number. [Accept: SYN=1, seq=server_isn, ack=client_isn+1]
Step 3: The client ACKs the connection with ACK=server_isn+1, allocates buffers, and sends SYN=0. [ACK: SYN=0, seq=client_isn+1, ack=server_isn+1]
Connection established. This is what happens when we create a socket for connection to a server.
After establishing the connection, the client can receive segments with application-generated data (SYN=0).
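As a worked example of the numbering above, this Java sketch prints the three handshake segments for given initial sequence numbers. Only the SYN flag, seq and ack fields are modelled; the values 42 and 79 in `main` simply reuse the telnet example's starting numbers.

```java
/** Sketch: seq/ack fields of the three-way handshake, numbered as on the slide. */
final class ThreeWayHandshake {
    static String[] segments(long clientIsn, long serverIsn) {
        return new String[] {
            // Step 1, client -> server: SYN carrying the client's initial sequence number.
            "SYN=1, seq=" + clientIsn,
            // Step 2, server -> client: SYNACK, ACKing client_isn and announcing server_isn.
            "SYN=1, seq=" + serverIsn + ", ack=" + (clientIsn + 1),
            // Step 3, client -> server: plain ACK with SYN=0.
            "SYN=0, seq=" + (clientIsn + 1) + ", ack=" + (serverIsn + 1)
        };
    }

    public static void main(String[] args) {
        for (String s : segments(42, 79)) {
            System.out.println(s);
        }
    }
}
```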
TCP Connection Management (cont.): Closing a connection
The client closes the socket: closesocket(s); in Java: clientSocket.close();
Step 1: The client end system sends a TCP FIN control segment to the server.
Step 2: The server receives the FIN and replies with an ACK. It then closes the connection and sends a FIN.
[Figure: how a TCP connection is torn down. Client → server: FIN; server → client: ACK, then FIN; client → server: ACK; the client enters timed wait before the connection is closed.]
TCP Connection Management (cont.)
Step 3: The client receives the FIN and replies with an ACK. It enters timed wait, during which it will respond with an ACK to any received FINs.
Step 4: The server receives the ACK. The connection is closed.
Note: with a small modification, this can handle simultaneous FINs.
[Figure: client and server move through closing, timed wait on the client side, and closed.]
TCP Connection Management (cont.)
[Figure: TCP client lifecycle and TCP server lifecycle state diagrams, with transitions numbered 1 to 12.]
- Timed wait is used in case the final ACK gets lost. Its length is implementation-dependent (e.g., 30 seconds, 1 minute, 2 minutes).
- The connection then formally closes, and all resources (e.g., port numbers) are released.
End of Flow Control and Error Control
Flow Control vs. Congestion Control
Similar actions are taken, but for very different reasons.
Flow control:
- point-to-point traffic between sender and receiver;
- a speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading;
- prevents the receiver buffer from overflowing.
Congestion: happens when there are too many sources attempting to send data at too high a rate for the routers along the path.
Congestion control:
- a service that makes sure that the routers between end systems are able to carry the offered traffic;
- prevents routers from overflowing.
Same course of action: throttling of the sender.
Principles of Congestion Control
Congestion:
- informally: "too many sources sending too much data too fast for the network to handle";
- different from flow control;
- manifestations: lost packets (buffer overflow at routers) and long delays (queuing in router buffers);
- a top-10 problem!
Approaches towards congestion control
Two broad approaches towards congestion control:
1. End-to-end congestion control:
   - no explicit feedback from the network;
   - congestion is inferred by end systems from observed packet loss and delay;
   - the approach taken by TCP.
2. Network-assisted congestion control: routers provide feedback to end systems in the form of
   - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR), or
   - an explicit transmission rate the sender should send at.
TCP Congestion Control
How the TCP sender limits the rate at which it sends traffic into its connection, using a new variable, the congestion window (CongWin):
(amount of unACKed data at the sender) = LastByteSent - LastByteACKed <= min(CongWin, RcvWindow)
This indirectly limits the sender's send rate.
Assumptions:
- the TCP receive buffer is very large, so there is no RcvWindow constraint and the amount of unACKed data at the sender is limited solely by CongWin;
- packet loss delay and packet transmission delay are negligible.
Sending rate ≈ CongWin / RTT
By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
TCP Congestion Control
TCP uses ACKs to trigger ("clock") its increase in congestion window size; this is called "self-clocking". The arrival of ACKs indicates to the sender that all is well.
1. ACKs arrive at a slow rate: the congestion window is increased at a relatively slow rate.
2. ACKs arrive at a high rate: the congestion window is increased more quickly.
TCP Congestion Control
How TCP perceives that there is congestion on the path: a "loss event". When there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender:
1. Timeout: no ACK is received after a segment loss.
2. Receipt of three duplicate ACKs: a segment loss is followed by three duplicate ACKs received at the sender.
TCP Congestion Control: details
- The sender limits transmission: LastByteSent - LastByteAcked <= cwnd, so roughly rate = cwnd / RTT bytes/sec.
- cwnd is a dynamic function of perceived network congestion.
How does the sender perceive congestion?
- loss event = timeout or 3 duplicate ACKs;
- the TCP sender reduces its rate (cwnd) after a loss event.
Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events
TCP congestion avoidance: additive increase, multiplicative decrease
Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
- Additive increase: increase cwnd by 1 MSS every RTT until loss is detected.
- Multiplicative decrease: cut cwnd in half after a loss.
[Figure: congestion window size (8, 16, 24 Kbytes) versus time, showing the sawtooth behavior of probing for bandwidth.]
TCP Slow Start
When the connection begins, increase the rate exponentially until the first loss event:
- initially cwnd = 1 MSS;
- double cwnd every RTT;
- done by incrementing cwnd by 1 MSS for every ACK received.
Summary: the initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT).
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart.]
Refinement: inferring loss
- After 3 duplicate ACKs: cwnd is cut in half, and the window then grows linearly.
- But after a timeout event: cwnd is set to 1 MSS; the window then grows exponentially up to a threshold, then grows linearly.
Philosophy:
- 3 duplicate ACKs indicate that the network is capable of delivering some segments;
- a timeout indicates a "more alarming" congestion scenario.
Refinement
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.
Implementation:
- variable ssthresh (slow-start threshold);
- on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event.
TCP Sender Congestion Control
- State: Slow Start (SS). Event: ACK receipt for previously unACKed data.
  Action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance".
  Commentary: results in a doubling of CongWin every RTT.
- State: Congestion Avoidance (CA). Event: ACK receipt for previously unACKed data.
  Action: CongWin = CongWin + MSS · (MSS / CongWin).
  Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT.
- State: SS or CA. Event: loss event detected by triple duplicate ACK.
  Action: Threshold = CongWin / 2; CongWin = Threshold; set state to "Congestion Avoidance".
  Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS.
- State: SS or CA. Event: timeout.
  Action: Threshold = CongWin / 2; CongWin = 1 MSS; set state to "Slow Start".
  Commentary: enter slow start.
- State: SS or CA. Event: duplicate ACK.
  Action: increment the duplicate ACK count for the segment being ACKed.
  Commentary: CongWin and Threshold are not changed.
Summary: TCP Congestion Control
[Figure: sender FSM with the states slow start, congestion avoidance and fast recovery.]
Initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; the sender starts in slow start.
Slow start:
- new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed.
- duplicate ACK: dupACKcount++.
- cwnd > ssthresh: move to congestion avoidance.
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment.
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery.
Congestion avoidance:
- new ACK: cwnd = cwnd + MSS · (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed.
- duplicate ACK: dupACKcount++.
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start.
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery.
Fast recovery:
- duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed.
- new ACK: cwnd = ssthresh; dupACKcount = 0; move to congestion avoidance.
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start.
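A minimal Java sketch of the summary FSM above. It follows this FSM (fast recovery with cwnd = ssthresh + 3 MSS) rather than the preceding table, which sets CongWin = Threshold on a triple duplicate ACK. Windows are in bytes, sending and retransmitting are only counted, and the class and field names are illustrative.

```java
/** Sketch of the slow start / congestion avoidance / fast recovery FSM. Windows in bytes. */
final class CongestionControlFsm {
    enum State { SLOW_START, CONGESTION_AVOIDANCE, FAST_RECOVERY }

    final double mss;
    double cwnd;
    double ssthresh;
    int dupAckCount = 0;
    int retransmissions = 0; // counts "retransmit missing segment" actions
    State state = State.SLOW_START;

    CongestionControlFsm(double mss, double initialSsthresh) {
        this.mss = mss;
        this.cwnd = mss;                 // cwnd = 1 MSS
        this.ssthresh = initialSsthresh; // the slide starts with 64 KB
    }

    void onNewAck() {
        switch (state) {
            case SLOW_START:
                cwnd += mss;                     // doubles cwnd every RTT
                dupAckCount = 0;
                if (cwnd > ssthresh) state = State.CONGESTION_AVOIDANCE;
                break;
            case CONGESTION_AVOIDANCE:
                cwnd += mss * (mss / cwnd);      // about +1 MSS per RTT
                dupAckCount = 0;
                break;
            case FAST_RECOVERY:
                cwnd = ssthresh;                 // deflate the window
                dupAckCount = 0;
                state = State.CONGESTION_AVOIDANCE;
                break;
        }
    }

    void onDuplicateAck() {
        if (state == State.FAST_RECOVERY) {
            cwnd += mss;                         // inflate while in fast recovery
            return;
        }
        dupAckCount++;
        if (dupAckCount == 3) {                  // triple duplicate ACK
            ssthresh = cwnd / 2;
            cwnd = ssthresh + 3 * mss;
            retransmissions++;
            state = State.FAST_RECOVERY;
        }
    }

    void onTimeout() {                           // same action from every state
        ssthresh = cwnd / 2;
        cwnd = mss;
        dupAckCount = 0;
        retransmissions++;
        state = State.SLOW_START;
    }

    public static void main(String[] args) {
        CongestionControlFsm s = new CongestionControlFsm(1000, 64_000);
        for (int i = 0; i < 3; i++) s.onNewAck();       // slow start: 1 MSS -> 4 MSS
        for (int i = 0; i < 3; i++) s.onDuplicateAck(); // triple duplicate ACK
        System.out.println(s.state + " cwnd=" + s.cwnd + " ssthresh=" + s.ssthresh);
    }
}
```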
Congestion control: TCP's Congestion Control Service
[Figure: client and server communicating through congested routers.]
Problem: gridlock sets in when there is packet loss due to router congestion.
TCP's congestion control forces the end systems to decrease the rate at which packets are sent during periods of congestion.
The sending system's packet is lost due to congestion, and the sender is alerted when it stops receiving ACKs of packets sent.
Macroscopic Description of TCP throughput
What is the average throughput of TCP as a function of window size and RTT? Ignore slow start (typically very short phases).
- Let W be the window size when loss occurs. When the window is W, throughput is W/RTT.
- Just after a loss, the window drops to W/2 and throughput to W/(2·RTT).
- Throughput then increases linearly (by MSS/RTT every RTT).
- Average throughput: 0.75 · W/RTT.
(Based on an idealised model of the steady-state dynamics of TCP.)
TCP Futures
TCP over "long, fat pipes", e.g., a GRID computing application:
- 1500-byte segments, 100 ms RTT, desired throughput of 10 Gbps;
- requires a window size of W = 83,333 in-flight segments.
Throughput in terms of the loss rate L:
$$\text{Throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$$
- 10 Gbps requires $L = 2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments);
- new versions of TCP are needed for high-speed environments.
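The arithmetic on the last two slides can be checked with a short Java sketch. Units are an assumption made explicit in the code: bytes for windows and segments, seconds for RTT, bits per second for target rates.

```java
/** Sketch of the TCP throughput arithmetic: sawtooth average, window size, required loss rate. */
final class TcpThroughput {
    /** Idealised sawtooth average: 0.75 * W / RTT, in bytes per second. */
    static double averageThroughput(double windowBytes, double rttSeconds) {
        return 0.75 * windowBytes / rttSeconds;
    }

    /** In-flight segments needed for a target rate: W = rate * RTT / (segment size in bits). */
    static double windowInSegments(double targetBitsPerSecond, double rttSeconds, double segmentBytes) {
        return targetBitsPerSecond * rttSeconds / (segmentBytes * 8);
    }

    /** Solves throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L. */
    static double requiredLossRate(double targetBitsPerSecond, double rttSeconds, double mssBytes) {
        double sqrtL = 1.22 * mssBytes * 8 / (rttSeconds * targetBitsPerSecond);
        return sqrtL * sqrtL;
    }

    public static void main(String[] args) {
        // Long fat pipe example: 1500-byte segments, 100 ms RTT, 10 Gbps target.
        System.out.printf("W = %.0f segments%n", windowInSegments(10e9, 0.1, 1500));
        System.out.printf("L = %.2e%n", requiredLossRate(10e9, 0.1, 1500));
    }
}
```

The required loss rate comes out at about 2.1 · 10^-10, which the slide rounds to 2 · 10^-10.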
TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link of capacity R, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.
[Figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R.]
Analysis of 2 connections sharing a link
Assumptions:
- a link with transmission rate R;
- each connection has the same MSS and RTT;
- no other TCP connections or UDP datagrams traverse the shared link;
- ignore the slow-start phase of TCP;
- both connections operate in congestion-avoidance mode (linear increase phase).
Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.
Why is TCP fair?
Two competing sessions:
- additive increase gives a slope of 1 as throughput increases;
- multiplicative decrease decreases throughput proportionally.
[Figure: connection 1 throughput versus connection 2 throughput, both axes up to R, with the equal bandwidth share line and the full bandwidth utilisation line. The operating point alternates between congestion-avoidance additive increase and loss (window decreased by a factor of 2), moving toward the equal share. A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.]
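The convergence argument can be sketched in Java under the slide's idealised assumptions, plus one more that is an assumption of this sketch: both connections detect loss in the same RTT whenever their joint rate exceeds R. Additive increase leaves the gap between the two rates unchanged, while every halving cuts it in half, so the rates approach the equal share.

```java
/** Sketch of two AIMD connections sharing a link of capacity R (idealised, synchronized losses). */
final class AimdFairness {
    /** Runs the model for the given number of RTTs and returns {rate1, rate2}. */
    static double[] simulate(double rate1, double rate2, double capacity, int rtts) {
        for (int i = 0; i < rtts; i++) {
            if (rate1 + rate2 > capacity) {
                // Multiplicative decrease: both halve, so the gap between the rates halves.
                rate1 /= 2;
                rate2 /= 2;
            } else {
                // Additive increase (slope 1): the gap between the rates is unchanged.
                rate1 += 1;
                rate2 += 1;
            }
        }
        return new double[] {rate1, rate2};
    }

    public static void main(String[] args) {
        double[] r = simulate(10, 70, 100, 1000); // very unequal start, R = 100
        System.out.printf("connection 1: %.2f, connection 2: %.2f%n", r[0], r[1]);
    }
}
```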
The End
The following slides are for additional reading only.
TCP Latency Modeling
Loopholes in TCP:
- In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free. Therefore they have higher throughputs.
- Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (multithreading implementation).
[Figure: multiple end systems sharing a link of R bps (the link's transmission rate); three hosts use 1 TCP connection each, while one host uses 3 TCP connections.]
TCP latency modeling
Q: How long does it take to receive an object from a Web server? Latency is the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. It consists of:
- TCP connection establishment time;
- data transfer delay;
- actual data transmission time.
Two cases to consider:
1. $WS/R > \text{RTT} + S/R$: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent (no data transfer delay).
2. $WS/R < \text{RTT} + S/R$: the sender has to wait for an ACK after a window's worth of data is sent (there is data transfer delay).
TCP Latency Modeling
Assumptions:
- the network is uncongested, with one link between the end systems (client and server) of rate R bps, the link's transmission rate;
- CongWin (fixed) determines the amount of data that can be sent;
- no packet loss, no packet corruption, no retransmissions required;
- header overheads are negligible;
- the file to send is an integer number of segments of size MSS;
- connection-establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times;
- the initial threshold of the TCP congestion mechanism is very big.
Notation:
- O: size of the object, in bits;
- S: number of bits of the MSS (maximum segment size).
TCP latency Modeling. Case Analysis: STATIC CONGESTION WINDOW
Case 1: $WS/R > \text{RTT} + S/R$. An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.
$$\text{latency} = 2\,\text{RTT} + \frac{O}{R}$$
K = number of windows of data that cover the object: $K = O/(WS)$, rounded up to the nearest integer.
Example (assume W = 4 segments): O = 256 bits, S = 32 bits, W = 4 gives K = 256/(4 · 32) = 2.
TCP latency Modeling. Case Analysis: STATIC CONGESTION WINDOW
Case 2: $WS/R < \text{RTT} + S/R$. The sender has to wait for an ACK after a window's worth of data is sent (stalled period).
$$\text{latency} = 2\,\text{RTT} + \frac{O}{R} + (K-1)\left[\frac{S}{R} + \text{RTT} - \frac{WS}{R}\right]$$
where $K = O/(WS)$ is the number of windows of data that cover the object. If there are K windows, the sender will be stalled (K-1) times.
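Both static-window cases can be combined in one Java function, since Case 1 is exactly the situation where the per-window stall time is not positive. Units are an assumption of this sketch (O and S in bits, R in bits/s, RTT in seconds, W in segments), and the link rate R = 32 bits/s in `main` is chosen only to make S/R = 1 s.

```java
/** Sketch of the static-window latency model (Case 1 and Case 2). */
final class StaticWindowLatency {
    static double latency(double o, double s, double r, double rtt, int w) {
        int k = (int) Math.ceil(o / (w * s));        // windows covering the object, rounded up
        double base = 2 * rtt + o / r;               // 2 RTT plus the object's transmission time
        double stallPerWindow = s / r + rtt - w * s / r;
        if (stallPerWindow <= 0) {
            return base;                             // Case 1: WS/R >= RTT + S/R, never stalls
        }
        return base + (k - 1) * stallPerWindow;      // Case 2: stalls after each of the first K-1 windows
    }

    public static void main(String[] args) {
        // Slide example O = 256 bits, S = 32 bits, W = 4 (so K = 2), with R = 32 bits/s.
        System.out.println(latency(256, 32, 32, 1, 4));  // Case 1 (RTT = 1 s)
        System.out.println(latency(256, 32, 32, 10, 4)); // Case 2 (RTT = 10 s)
    }
}
```

With RTT = 1 s the window outlasts the round trip (4 s > 2 s), giving 2 + 8 = 10 s. With RTT = 10 s the sender stalls once for 1 + 10 - 4 = 7 s, giving 35 s.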
Case Analysis: DYNAMIC CONGESTION WINDOW
[Figure: slow-start transmission of an object with O/S = 15 segments in 4 windows, with stalled periods between windows.]
Case Analysis: DYNAMIC CONGESTION WINDOW
- Let K be the number of windows that cover the object.
- We can express K in terms of the number of segments in the object as follows:
$$K = \min\{k : 2^0 S + 2^1 S + \cdots + 2^{k-1} S \ge O\} = \min\left\{k : 2^k - 1 \ge \frac{O}{S}\right\} = \min\left\{k : k \ge \log_2\left(\frac{O}{S} + 1\right)\right\} = \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil$$
Note: $2^0 + 2^1 + \cdots + 2^{k-1} = 2^k - 1$.
Case Analysis: DYNAMIC CONGESTION WINDOW
- From the time the server begins to transmit the kth window until the time the server receives an ACK for the first segment in the window: $\frac{S}{R} + \text{RTT}$
- Transmission of the kth window: $2^{k-1}\frac{S}{R}$
- Stall time: $\left[\frac{S}{R} + \text{RTT} - 2^{k-1}\frac{S}{R}\right]^+$
- Latency:
$$\text{Latency} = 2\,\text{RTT} + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + \text{RTT} - 2^{k-1}\frac{S}{R}\right]^+$$
where $[x]^+ = \max(x, 0)$.
Case Analysis: DYNAMIC CONGESTION WINDOW
- Let Q be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\left(1 + \frac{\text{RTT}}{S/R}\right) \right\rfloor + 1$$
- The actual number of times that the server stalls is $P = \min\{Q, K-1\}$.
Case Analysis: DYNAMIC CONGESTION WINDOW
Closed-form expression for the latency, where $P = \min\{Q, K-1\}$ is the actual number of times the server stalls:
$$\text{Latency} = 2\,\text{RTT} + \frac{O}{R} + P\left[\text{RTT} + \frac{S}{R}\right] - \left(2^P - 1\right)\frac{S}{R}$$
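The dynamic-window formulas can be cross-checked in Java: the closed form should agree with the direct sum of per-window stall times. This sketch assumes O > 0 and the slides' "very big" initial threshold (the window keeps doubling), computes K and Q by counting rather than with a floating-point log2, and picks S/R = 1 s and RTT = 2 s for `main` purely for illustration.

```java
/** Sketch of the slow-start latency model: K, Q, P, the closed form and the direct sum. */
final class SlowStartLatency {
    /** K = min{k : S(2^k - 1) >= O}: number of windows that cover the object. */
    static int windowsToCover(double o, double s) {
        int k = 0;
        while (s * (Math.pow(2, k) - 1) < o) k++;
        return k;
    }

    /** Q = floor(log2(1 + RTT/(S/R))) + 1, i.e. the number of k >= 1 with 2^(k-1) <= 1 + RTT/(S/R). */
    static int stallsIfInfinite(double s, double r, double rtt) {
        double limit = 1 + rtt / (s / r);
        int q = 0;
        while (Math.pow(2, q) <= limit) q++;
        return q;
    }

    /** Latency = 2RTT + O/R + P(RTT + S/R) - (2^P - 1)S/R, with P = min(Q, K - 1). */
    static double closedForm(double o, double s, double r, double rtt) {
        int p = Math.min(stallsIfInfinite(s, r, rtt), windowsToCover(o, s) - 1);
        return 2 * rtt + o / r + p * (rtt + s / r) - (Math.pow(2, p) - 1) * s / r;
    }

    /** Latency = 2RTT + O/R + sum over k = 1..K-1 of [S/R + RTT - 2^(k-1) S/R]^+. */
    static double bySum(double o, double s, double r, double rtt) {
        double latency = 2 * rtt + o / r;
        int k = windowsToCover(o, s);
        for (int i = 1; i <= k - 1; i++) {
            latency += Math.max(0, s / r + rtt - Math.pow(2, i - 1) * s / r);
        }
        return latency;
    }

    public static void main(String[] args) {
        // O/S = 15 (four windows, as in the figure); S/R = 1 s and RTT = 2 s are illustrative.
        double o = 15_000, s = 1_000, r = 1_000, rtt = 2;
        System.out.println("K = " + windowsToCover(o, s));
        System.out.println("closed form = " + closedForm(o, s, r, rtt) + ", sum = " + bySum(o, s, r, rtt));
    }
}
```

For this example K = 4, Q = 2 and P = 2, and both methods give 4 + 15 + 2 + 1 = 22 s: the first two windows stall for 2 s and 1 s, and the third window no longer stalls.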
Case Analysis: DYNAMIC CONGESTION WINDOW
$$\frac{\text{Latency}}{\text{MinimumLatency}} \le 1 + \frac{P}{\frac{O/R}{\text{RTT}} + 2}$$
where MinimumLatency = 2 RTT + O/R. Slow start will not significantly increase latency if RTT << O/R.
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
Transport Layer ndash TCP
11
BB
Yet another server echo Yet another server echo exampleexample
Host A Host B
Seq=42 ACK=79 data = lsquoHellorsquo
Seq=79 ACK=47 data = lsquoHellorsquo
Seq=47 ACK=84 data = lsquo200rsquo
UsertypesHello
host ACKsreceipt
of echoedHello
send something else
host ACKsreceipt ofHello
echoes back Hello
time
Host Aseq=42ack=79
seq=47ack=84
Host B
seq=79
ack=47
seq=84
ack=50
Seq=84 ACK=50 data = lsquo200rsquo
ACK tells about up to what byte ACK tells about up to what byte has been receivedhas been received and what is the and what is the
next startingnext starting byte the host is byte the host is expecting to receiveexpecting to receive
Transport Layer ndash TCP
12
BB
TCP Round Trip Time and Timeout
Q how to set TCP how to set TCP timeout valuetimeout value
longer than RTT note RTT will vary
too short premature timeout unnecessary
retransmissions too long slow
reaction to segment loss
RTT = round trip time
Q how to estimate RTThow to estimate RTT SampleRTT measured time
from segment transmission until ACK receipt ignore retransmissions
cumulatively ACKed segments
SampleRTT will vary we would want estimated RTT to be smoother use several recent use several recent
measurementsmeasurements not just current SampleRTT
Main IssueMain Issue How long is the sender willing to wait before re-transmitting the packet
Transport Layer ndash TCP
13
BB
TCP Round Trip Time and Timeout
EstimatedRTT = (1-x) EstimatedRTT + x SampleRTTEstimatedRTT = (1-x) EstimatedRTT + x SampleRTT
Exponential weighted moving average influence of given sample decreases exponentially
fast typical value of x 0125 (RFC 2988)
Setting the timeoutSetting the timeout EstimatedRTT plus safety marginsafety margin large variation in EstimatedRTT -gt larger safety margin recommended value of x 025x 025
Timeout = EstimatedRTT + (4 Deviation)Timeout = EstimatedRTT + (4 Deviation)
DeviationDeviation = (1-x) Deviation + x |SampleRTT-EstimatedRTT|
Transport Layer ndash TCP
14
BB
EstimatedRTT = 0875 EstimatedRTT + 0125 SampleRTT
EstimatedRTT after the receipt of the ACK of segment 1EstimatedRTT = RTT for Segment 1 = 002746 second
EstimatedRTT after the receipt of the ACK of segment 2EstimatedRTT = 0875 002746 + 0125 0035557 = 00285
EstimatedRTT after the receipt of the ACK of segment 3EstimatedRTT = 0875 00285 + 0125 0070059 = 00337
EstimatedRTT after the receipt of the ACK of segment 4EstimatedRTT = 0875 00337+ 0125 011443 = 00438
EstimatedRTT after the receipt of the ACK of segment 5EstimatedRTT = 0875 00438 + 0125 013989 = 00558
EstimatedRTT after the receipt of the ACK of segment 6EstimatedRTT = 0875 00558 + 0125 018964 = 00725
Sample CalculationsSample Calculations
Transport Layer ndash TCP
15
BB
RTT Samples and RTT estimatesRTT Samples and RTT estimates
300
250
200
150
100 time
Estimated RTT
Sample RTT
RT
T (
mse
c)
The variations in the The variations in the SampleRTT SampleRTT are are smoothed out in the computation of thesmoothed out in the computation of the
EstimatedRTTEstimatedRTT
Transport Layer ndash TCP
16
BB
An Actual RTT estimationAn Actual RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (m
illis
eco
nds)
SampleRTT Estimated RTT
FSM of TCP for Reliable Data Transfer
Simplified TCP sender, assuming:
  - one-way data transfer
  - no flow or congestion control
The sender waits for an event and reacts:
  - event: data received from application above -> create and send segment
  - event: timer timeout for segment with seq number y -> retransmit segment
  - event: ACK received, with ACK number y -> process ACK
SIMPLIFIED TCP SENDER
Assumptions:
  - the sender is not constrained by TCP flow or congestion control
  - data from above is less than one MSS in size
  - data transfer is in one direction only

  sendbase = initial_sequence_number
  nextseqnum = initial_sequence_number

  loop (forever) {
    switch (event)
      event: data received from application above
        create TCP segment with sequence number nextseqnum
        if (timer is currently not running)
          start timer for segment nextseqnum   /* the single timer is associated with the oldest unACKed segment */
        pass segment to IP
        nextseqnum = nextseqnum + length(data)
      event: timer timeout
        retransmit not-yet-ACKed segment with smallest sequence number
        start timer
      event: ACK received, with ACK field value of y
        if (y > sendbase) {   /* cumulative ACK of all data up to y */
          sendbase = y
          if (there are currently any not-yet-ACKed segments)
            start timer
        }
  } /* end of loop forever */
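The simplified sender above can be sketched in C as three event handlers sharing one piece of state. This is an illustration under the slide's assumptions, plus one more: sequence-number wraparound is ignored.

Segment construction, IP and the timer hardware itself are not modelled. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sender state: one timer, cumulative ACKs, no wraparound. */
typedef struct {
    uint32_t sendbase;      /* oldest unACKed byte */
    uint32_t nextseqnum;    /* sequence number for the next new segment */
    int      timer_running; /* the single timer tracks the oldest unACKed segment */
} simple_sender;

void sender_init(simple_sender *s, uint32_t isn)
{
    s->sendbase = isn;
    s->nextseqnum = isn;
    s->timer_running = 0;
}

/* Event: len bytes (len <= MSS) from the application above.
 * Returns the sequence number placed in the new segment. */
uint32_t on_app_data(simple_sender *s, uint32_t len)
{
    uint32_t seq = s->nextseqnum;
    assert(len > 0);
    if (!s->timer_running)
        s->timer_running = 1;   /* start timer */
    /* passing the segment to IP is not modelled */
    s->nextseqnum += len;
    return seq;
}

/* Event: timer timeout. Returns the sequence number to retransmit:
 * the not-yet-ACKed segment with the smallest sequence number. */
uint32_t on_timeout(simple_sender *s)
{
    s->timer_running = 1;       /* restart timer */
    return s->sendbase;
}

/* Event: ACK received with (cumulative) ACK field value y. */
void on_ack(simple_sender *s, uint32_t y)
{
    if (y > s->sendbase) {
        s->sendbase = y;
        /* keep the timer running only while unACKed data remains */
        s->timer_running = (s->sendbase < s->nextseqnum);
    }
}
```

Start with an initial sequence number of 92 and send 8 and then 20 bytes. This reproduces the premature-timeout scenario: the timeout retransmits Seq=92, and a single ACK=120 then acknowledges both segments.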
TCP SENDER with MODIFICATIONS: Fast Retransmit
Why wait for the timeout to expire when consecutive duplicate ACKs can already indicate a lost segment?

  sendbase = initial_sequence_number
  nextseqnum = initial_sequence_number

  loop (forever) {
    switch (event)
      event: data received from application above
        create TCP segment with sequence number nextseqnum
        start timer for segment nextseqnum
        pass segment to IP
        nextseqnum = nextseqnum + length(data)
      event: timer timeout for segment with sequence number y
        retransmit segment with sequence number y
        compute new timeout interval for segment y
        restart timer for sequence number y
      event: ACK received, with ACK field value of y
        if (y > sendbase) {   /* cumulative ACK of all data up to y */
          cancel all timers for segments with sequence numbers < y
          sendbase = y
        }
        else {                /* a duplicate ACK for an already ACKed segment */
          increment number of duplicate ACKs received for y
          if (number of duplicate ACKs received for y == 3) {
            /* perform TCP fast retransmission */
            resend segment with sequence number y
            restart timer for segment y
          }
        }
  } /* end of loop forever */
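The duplicate-ACK branch is what distinguishes this sender from the simplified one. Here is a hedged C sketch of just that logic. It assumes the sender counts duplicates only for its current sendbase; names are hypothetical and wraparound is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ACK-side state for fast retransmit. */
typedef struct {
    uint32_t sendbase;  /* highest cumulative ACK value seen so far */
    int      dup_acks;  /* duplicate ACKs received for sendbase */
} fr_state;

/* Process an ACK with ACK field value y. Returns 1 when the segment that
 * starts at y should be resent immediately (fast retransmit), else 0. */
int fr_on_ack(fr_state *s, uint32_t y)
{
    assert(s->dup_acks >= 0);
    if (y > s->sendbase) {      /* new cumulative ACK: older timers are cancelled */
        s->sendbase = y;
        s->dup_acks = 0;
        return 0;
    }
    if (y == s->sendbase) {     /* duplicate ACK for already ACKed data */
        s->dup_acks++;
        if (s->dup_acks == 3)
            return 1;           /* resend segment y and restart its timer */
    }
    return 0;                   /* y < sendbase: stale ACK, ignored */
}
```

Only the third duplicate triggers the resend. A fourth duplicate does not, which matches the `== 3` test in the pseudocode.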
TCP ACK Generation [RFC 1122, RFC 2581]
Event -> TCP receiver action
1. In-order segment arrival, no gaps, everything else already ACKed -> Delay sending the ACK: wait up to 500 ms for the next segment. If the next segment does not arrive in this interval, send the ACK.
2. In-order segment arrival, no gaps, one delayed ACK pending (due to action 1) -> Immediately send a single cumulative ACK.
3. Out-of-order segment arrival with a higher-than-expected sequence number (a gap is detected) -> Send a duplicate ACK indicating the sequence number of the next expected byte.
4. Arrival of a segment that partially or completely fills the gap -> Immediately send an ACK if the segment starts at the lower end of the gap.
The receiver does not discard out-of-order segments.
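The four rows of the table can be read as a decision function. In this sketch the inputs (hypothetical names) describe the arriving segment and the receiver's buffer.

A segment below the expected sequence number is not covered by the table. Answering it with an immediate ACK is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    ACK_DELAYED,          /* row 1: wait up to 500 ms for the next segment */
    ACK_CUMULATIVE_NOW,   /* row 2: a delayed ACK is pending, ACK both now */
    ACK_DUPLICATE,        /* row 3: gap detected, re-ACK the next expected byte */
    ACK_IMMEDIATE         /* row 4: segment fills the gap from its lower end */
} ack_action;

/* seq: first byte of the arriving segment; expected: next in-order byte;
 * gap_buffered: out-of-order data is buffered beyond `expected`;
 * delayed_pending: an ACK is currently being delayed (row 1). */
ack_action choose_ack(uint32_t seq, uint32_t expected,
                      int gap_buffered, int delayed_pending)
{
    if (seq > expected)
        return ACK_DUPLICATE;
    if (seq == expected) {
        if (gap_buffered)
            return ACK_IMMEDIATE;
        return delayed_pending ? ACK_CUMULATIVE_NOW : ACK_DELAYED;
    }
    /* seq < expected: not in the table; assumed to get an immediate ACK */
    return ACK_IMMEDIATE;
}
```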
TCP: Interesting Scenarios (simplified TCP version)
Lost ACK scenario:
  - Host A sends Seq=92 with 8 bytes of data.
  - Host B's ACK=100 is lost (X).
  - When A's timer expires, A retransmits Seq=92 (8 bytes), and B again replies with ACK=100.
  - This is a retransmission due to a lost ACK.
Premature timeout, cumulative ACKs:
  - Host A sends Seq=92 (8 bytes) and Seq=100 (20 bytes).
  - B returns ACK=100 and ACK=120, but the Seq=92 timer expires before they arrive.
  - A retransmits Seq=92 (8 bytes), and the timer is restarted for Seq=92.
  - B responds with ACK=120.
  - The segment with Seq=100 is not retransmitted.
TCP Retransmission Scenario
Host A sends Seq=92 (8 bytes) and Seq=100 (20 bytes). Host B's ACK=100 is lost (X), but ACK=120 reaches A before the Seq=92 timeout. The cumulative ACK avoids retransmission of the first segment.
TCP Modifications: Doubling the Timeout Interval
Provides a limited form of congestion control:
  - Timer expiration is more likely to be caused by congestion in the network, and congestion may get worse if sources continue to retransmit packets persistently.
  - On each timeout: TimeoutInterval = 2 * TimeoutIntervalPrevious
  - TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals.
After an ACK is received, TimeoutInterval is again derived from the most recent EstimatedRTT and DevRTT.
Others: check RFC 2018 (selective ACK).
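Here is a minimal sketch of the two rules above, with hypothetical names. It has no upper bound on the doubled interval, because the slide does not give one; real implementations usually cap it.

```c
#include <assert.h>

/* Next TimeoutInterval (seconds).
 * Timer expired: double the previous value.
 * Fresh ACK: recompute as EstimatedRTT + 4 * DevRTT. */
double next_timeout_interval(double previous, int timer_expired,
                             double estimated_rtt, double dev_rtt)
{
    assert(previous > 0.0);
    if (timer_expired)
        return 2.0 * previous;
    return estimated_rtt + 4.0 * dev_rtt;
}
```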
TCP Flow Control
Flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.
Receiver buffering:
  - RcvBuffer = size of the TCP receive buffer
  - RcvWindow = amount of spare room in the buffer
The receiver explicitly informs the sender of the (dynamically changing) amount of free buffer space via the RcvWindow field in the TCP segment.
The sender keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow.
FLOW CONTROL: Receiver
EXAMPLE: HOST A sends a large file to HOST B.
RECEIVER HOST B uses RcvWindow, LastByteRcvd and LastByteRead:
  - LastByteRead: the last byte in the data stream read from the buffer by the application process
  - LastByteRcvd: the last byte that has arrived from IP and been placed in the buffer (RcvBuffer)
  - RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
Initially RcvWindow = RcvBuffer, and the application reads from the buffer. HOST B tells HOST A how much spare room it has in the connection buffer. It does this by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.
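The receiver's bookkeeping reduces to one subtraction. Here is a sketch in C, assuming byte numbers fit in unsigned 32-bit values (wraparound is not handled). The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead], all in bytes. */
uint32_t rcv_window(uint32_t rcv_buffer, uint32_t last_byte_rcvd,
                    uint32_t last_byte_read)
{
    uint32_t buffered = last_byte_rcvd - last_byte_read; /* data not yet read */
    assert(buffered <= rcv_buffer); /* flow control keeps the buffer from overflowing */
    return rcv_buffer - buffered;
}
```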
FLOW CONTROL: Sender
EXAMPLE: HOST A sends a large file to HOST B.
SENDER HOST A uses the RcvWindow of HOST B (carried in the ACKs from HOST B), LastByteSent and LastByteACKed.
To ensure that HOST B's receive buffer does not overflow, HOST A maintains throughout the connection's life that
  LastByteSent - LastByteACKed <= RcvWindow
FLOW CONTROL: Some issues to consider
RcvWindow is used by the connection to provide the flow control service.
What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?
TCP sends a segment only when there is data or an ACK to send. If HOST B has nothing to send to HOST A, it would never tell A that buffer space has been freed, and A would stay blocked. Therefore the sender must keep the connection "alive".
TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space, and the ACKs will contain RcvWindow > 0.
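The sender's side of flow control, including the one-byte segment sent when the window is zero, can be sketched as a single function. It returns how many new bytes may be sent now.

This is a simplification. Real stacks pace zero-window probes with a persist timer, which is not modelled here, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* New bytes HOST A may send now, keeping LastByteSent - LastByteACKed <= RcvWindow.
 * With RcvWindow == 0, one 1-byte segment is allowed when nothing is outstanding,
 * so HOST B keeps returning ACKs that will eventually carry RcvWindow > 0. */
uint32_t usable_window(uint32_t last_byte_sent, uint32_t last_byte_acked,
                       uint32_t rcv_window)
{
    assert(last_byte_sent >= last_byte_acked);
    uint32_t in_flight = last_byte_sent - last_byte_acked;
    if (rcv_window == 0)
        return in_flight == 0 ? 1u : 0u;
    return in_flight >= rcv_window ? 0u : rcv_window - in_flight;
}
```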
TCP Connection Management
Recall: the TCP sender and receiver establish a "connection" before exchanging data segments, to initialise TCP variables:
  - sequence numbers
  - buffers and flow control info (e.g. RcvWindow)
Client: the connection initiator.
  In Java:
    Socket clientSocket = new Socket(hostname, port);   // connect
  In C (Winsock):
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
        printf("connect failed\n");
        WSACleanup();
        exit(1);
    }
Server: contacted by the client.
  In Java: ServerSocket.accept() returns the connected Socket.
  In C (Winsock):
    ns = accept(s, (struct sockaddr *)&remoteaddr, &addrlen);
TCP Connection Management: Establishing a connection (three-way handshake)
This is what happens when we create a socket for connection to a server.
Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself). It specifies the client's initial sequence number (client_isn).
  Connect: SYN=1, seq=client_isn
Step 2: the server end system receives the SYN and replies with a SYNACK control segment. It ACKs the received SYN, allocates buffers, and specifies the server's initial sequence number.
  Accept: SYN=1, seq=server_isn, ack=client_isn+1
Step 3: the client ACKs the connection with ACK=server_isn+1, allocates buffers, and sends SYN=0.
  ACK: SYN=0, seq=client_isn+1, ack=server_isn+1
Connection established. From then on, the client and server exchange segments carrying application-generated data with SYN=0.
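The sequence and acknowledgment numbers exchanged in the three steps follow mechanically from the two initial sequence numbers. Here is a small C sketch; the struct and function are hypothetical, and a real SYN carries more header fields than shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only the fields the handshake slide talks about (hypothetical layout). */
typedef struct {
    int      syn;        /* SYN flag */
    uint32_t seq;        /* sequence number */
    uint32_t ack;        /* acknowledgment number, meaningful if ack_valid */
    int      ack_valid;  /* ACK flag */
} hs_segment;

void three_way_handshake(uint32_t client_isn, uint32_t server_isn,
                         hs_segment out[3])
{
    assert(out != NULL);
    /* Step 1, client -> server: SYN=1, seq=client_isn */
    out[0] = (hs_segment){1, client_isn, 0, 0};
    /* Step 2, server -> client (SYNACK): SYN=1, seq=server_isn, ack=client_isn+1 */
    out[1] = (hs_segment){1, server_isn, client_isn + 1, 1};
    /* Step 3, client -> server: SYN=0, seq=client_isn+1, ack=server_isn+1 */
    out[2] = (hs_segment){0, client_isn + 1, server_isn + 1, 1};
}
```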
TCP Connection Management (cont.): Closing a connection
The client closes its socket:
  C (Winsock): closesocket(s);
  Java: clientSocket.close();
Step 1: the client end system sends a TCP FIN control segment to the server.
Step 2: the server receives the FIN and replies with an ACK. It then closes the connection and sends its own FIN.
[Figure: How a TCP connection is torn down. Client sends FIN; server sends ACK, then FIN; client sends ACK, enters timed wait, then closed.]
TCP Connection Management (cont.)
Step 3: the client receives the FIN and replies with an ACK. It enters "timed wait", during which it will respond with an ACK to any received FINs.
Step 4: the server receives the ACK, and the connection is closed.
Note: with a small modification, this can handle simultaneous FINs.
[Figure: client and server exchange FIN and ACK; both pass through "closing", the client through "timed wait", then both are closed.]
TCP Connection Management (cont.)
[Figures: TCP client lifecycle and TCP server lifecycle, with numbered transitions 1 to 12]
Timed wait is used in case the final ACK gets lost. Its length is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).
Afterwards the connection formally closes, and all resources (e.g. port numbers) are released.
End of Flow Control and Error Control
Flow Control vs. Congestion Control
Similar actions are taken, but for very different reasons.
Flow control:
  - point-to-point traffic between sender and receiver
  - speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
  - prevents the receiver buffer from overflowing
Congestion happens when too many sources attempt to send data at too high a rate for the routers along the path.
Congestion control:
  - a service that makes sure the routers between end systems are able to carry the offered traffic
  - prevents routers from overflowing
Same course of action: throttling of the sender.
Principles of Congestion Control
Congestion, informally: "too many sources sending too much data too fast for the network to handle".
  - different from flow control!
  - manifestations:
      - lost packets (buffer overflow at routers)
      - long delays (queuing in router buffers)
  - a top-10 problem!
Approaches towards congestion control
Two broad approaches towards congestion control:
1. End-to-end congestion control:
   - no explicit feedback from the network
   - congestion inferred by end systems from observed packet loss and delay
   - the approach taken by TCP
2. Network-assisted congestion control: routers provide feedback to end systems in the form of either
   - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR), or
   - an explicit transmission rate the sender should send at
TCP Congestion Control
How the TCP sender limits the rate at which it sends traffic into its connection, using a new variable, the congestion window (CongWin):
  Amount of unACKed data at the sender = LastByteSent - LastByteACKed <= min(CongWin, RcvWindow)
This indirectly limits the sender's send rate.
Assumptions:
  - the TCP receive buffer is very large, so there is no RcvWindow constraint and the amount of unACKed data at the sender is limited solely by CongWin
  - packet loss delay and packet transmission delay are negligible
Sending rate ≈ CongWin / RTT
By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
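Both quantities on this slide are one-liners. Here is a sketch with hypothetical names, taking window sizes in bytes and RTT in seconds.

```c
#include <assert.h>
#include <stdint.h>

/* Maximum amount of unACKed data allowed: min(CongWin, RcvWindow), in bytes. */
uint32_t max_unacked(uint32_t cong_win, uint32_t rcv_window)
{
    return cong_win < rcv_window ? cong_win : rcv_window;
}

/* Approximate sending rate in bytes/second: CongWin / RTT. */
double approx_send_rate(uint32_t cong_win, double rtt_seconds)
{
    assert(rtt_seconds > 0.0);
    return (double)cong_win / rtt_seconds;
}
```

With a 14,600-byte window and a 100 ms RTT, the sender pushes roughly 146,000 bytes per second, whatever the link could carry.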
TCP Congestion Control
TCP uses ACKs to trigger ("clock") its increase in congestion window size, which makes it "self-clocking".
The arrival of ACKs is an indication to the sender that all is well:
  1. ACKs arrive at a slow rate: the congestion window is increased at a relatively slow rate.
  2. ACKs arrive at a high rate: the congestion window is increased more quickly.
TCP Congestion Control
How TCP perceives that there is congestion on the path:
"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped. This in turn results in a "loss event" at the sender, which is one of:
  1. Timeout: no ACK is received after a segment is lost.
  2. Receipt of three duplicate ACKs: the segment loss is followed by three duplicate ACKs received at the sender.
TCP Congestion Control: details
The sender limits transmission:
  LastByteSent - LastByteAcked <= cwnd
Roughly, rate = cwnd / RTT bytes/sec.
cwnd is a dynamic function of perceived network congestion.
How does the sender perceive congestion?
  - loss event = timeout or 3 duplicate ACKs
  - the TCP sender reduces its rate (cwnd) after a loss event
Three mechanisms:
  1. AIMD
  2. slow start
  3. conservative after timeout events
TCP congestion avoidance: additive increase, multiplicative decrease (AIMD)
Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
  - additive increase: increase cwnd by 1 MSS every RTT until loss is detected
  - multiplicative decrease: cut cwnd in half after loss
[Figure: congestion window size (8, 16, 24 Kbytes) versus time, showing saw-tooth behavior: probing for bandwidth]
TCP Slow Start
When a connection begins, increase the rate exponentially until the first loss event:
  - initially cwnd = 1 MSS
  - double cwnd every RTT
  - done by incrementing cwnd by 1 MSS for every ACK received
Summary: the initial rate is slow but ramps up exponentially fast (the sending rate doubles every RTT).
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart]
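The per-ACK rule above is what produces doubling per RTT. This sketch uses a hypothetical name and counts the ACKs for each round's window.

It assumes no loss and one ACK per segment (no delayed ACKs).

```c
#include <assert.h>
#include <stdint.h>

/* cwnd (bytes) after `rtts` round trips of slow start, starting at 1 MSS. */
uint32_t slow_start_cwnd(uint32_t mss, int rtts)
{
    assert(mss > 0 && rtts >= 0);
    uint32_t cwnd = mss;                       /* initially cwnd = 1 MSS */
    for (int r = 0; r < rtts; r++) {
        uint32_t acks = cwnd / mss;            /* one ACK per segment in this window */
        for (uint32_t a = 0; a < acks; a++)
            cwnd += mss;                       /* +1 MSS per ACK received */
    }
    return cwnd;
}
```

One, two, then four segments per RTT, as in the figure.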
Refinement: inferring loss
  - After 3 duplicate ACKs: cwnd is cut in half, and the window then grows linearly.
  - But after a timeout event: cwnd is set to 1 MSS, and the window then grows exponentially up to a threshold, then grows linearly.
Philosophy:
  - 3 duplicate ACKs indicate that the network is capable of delivering some segments.
  - A timeout indicates a "more alarming" congestion scenario.
Refinement
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.
Implementation:
  - variable ssthresh (slow-start threshold)
  - on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event
TCP Sender Congestion Control
State | Event | TCP sender congestion-control action | Commentary
Slow Start (SS) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Results in a doubling of CongWin every RTT
Congestion Avoidance (CA) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS * (MSS / CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin / 2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
SS or CA | Timeout | Threshold = CongWin / 2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
SS or CA | Duplicate ACK | Increment duplicate ACK count for the segment being ACKed | CongWin and Threshold not changed
Summary: TCP Congestion Control
Initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; state = slow start.
slow start
  - new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
  - cwnd > ssthresh: move to congestion avoidance
  - duplicate ACK: dupACKcount++
  - dupACKcount == 3: ssthresh = cwnd / 2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
  - timeout: ssthresh = cwnd / 2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; stay in slow start
congestion avoidance
  - new ACK: cwnd = cwnd + MSS * (MSS / cwnd); dupACKcount = 0; transmit new segment(s) as allowed
  - duplicate ACK: dupACKcount++
  - dupACKcount == 3: ssthresh = cwnd / 2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
  - timeout: ssthresh = cwnd / 2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start
fast recovery
  - duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
  - new ACK: cwnd = ssthresh; dupACKcount = 0; move to congestion avoidance
  - timeout: ssthresh = cwnd / 2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start
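The FSM above can be sketched in C as one event handler that follows the transitions listed on the slide. Points to note:
- Windows are kept in bytes as doubles, so that the congestion-avoidance increase MSS * (MSS / cwnd) can be fractional.
- Names are hypothetical.
- Actually sending or retransmitting segments is left to the caller, signalled by the return value.

```c
#include <assert.h>

typedef enum { SLOW_START, CONGESTION_AVOIDANCE, FAST_RECOVERY } cc_state;
typedef enum { EV_NEW_ACK, EV_DUP_ACK, EV_TIMEOUT } cc_event;

typedef struct {
    cc_state state;
    double   cwnd;      /* bytes */
    double   ssthresh;  /* bytes */
    double   mss;       /* bytes */
    int      dup_acks;  /* dupACKcount */
} tcp_cc;

void cc_init(tcp_cc *c, double mss)
{
    assert(mss > 0.0);
    c->state = SLOW_START;
    c->mss = mss;
    c->cwnd = mss;               /* cwnd = 1 MSS */
    c->ssthresh = 64.0 * 1024.0; /* ssthresh = 64 KB */
    c->dup_acks = 0;
}

/* Returns 1 when the event requires retransmitting the missing segment. */
int cc_on_event(tcp_cc *c, cc_event ev)
{
    switch (ev) {
    case EV_TIMEOUT:                         /* same action in every state */
        c->ssthresh = c->cwnd / 2.0;
        c->cwnd = c->mss;
        c->dup_acks = 0;
        c->state = SLOW_START;
        return 1;
    case EV_NEW_ACK:
        c->dup_acks = 0;
        if (c->state == FAST_RECOVERY) {
            c->cwnd = c->ssthresh;           /* deflate the window */
            c->state = CONGESTION_AVOIDANCE;
        } else if (c->state == SLOW_START) {
            c->cwnd += c->mss;
            if (c->cwnd > c->ssthresh)
                c->state = CONGESTION_AVOIDANCE;
        } else {
            c->cwnd += c->mss * (c->mss / c->cwnd); /* about +1 MSS per RTT */
        }
        return 0;
    case EV_DUP_ACK:
        if (c->state == FAST_RECOVERY) {
            c->cwnd += c->mss;               /* inflate for each extra duplicate */
            return 0;
        }
        if (++c->dup_acks == 3) {
            c->ssthresh = c->cwnd / 2.0;
            c->cwnd = c->ssthresh + 3.0 * c->mss;
            c->state = FAST_RECOVERY;
            return 1;
        }
        return 0;
    }
    return 0;
}
```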
Congestion control: TCP's Congestion Control Service
[Figure: CLIENT and SERVER connected through congested routers]
Problem: gridlock sets in when there is packet loss due to router congestion.
TCP's congestion control service forces the end systems to decrease the rate at which packets are sent during periods of congestion.
When the sending system's packets are lost due to congestion, it is alerted by no longer receiving ACKs for the packets it sent.
Macroscopic Description of TCP Throughput
What is the average throughput of TCP as a function of window size and RTT? Ignore slow start (typically very short phases).
Let W be the window size when loss occurs:
  - when the window is W, throughput is W/RTT
  - just after a loss, the window drops to W/2, and throughput to W/(2 RTT)
  - throughput then increases linearly (by MSS/RTT every RTT)
Average throughput: 0.75 W/RTT. Throughput rises linearly from W/(2 RTT) back to W/RTT between losses, so its average is the midpoint of that range.
(Based on an idealised model for the steady-state dynamics of TCP.)
TCP Futures: TCP over "long, fat pipes"
Example: a GRID computing application uses 1500-byte segments over a 100 ms RTT and wants 10 Gbps throughput. This requires a window size of W = 83,333 in-flight segments.
Throughput in terms of loss rate L:
$$\text{Throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$$
Reaching 10 Gbps requires L = 2 * 10^-10, a very small loss rate (1 loss event every 5 billion segments).
New versions of TCP are needed for high-speed environments.
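The numbers in the example follow directly from two steps:
- W = throughput * RTT / segment size;
- solving the throughput formula for L.

Here is a small check in C, with hypothetical names. 1500-byte segments are taken as 12,000 bits.

```c
#include <assert.h>

/* Segments that must be in flight: throughput * RTT / (segment size in bits). */
double window_segments(double throughput_bps, double rtt_s, double mss_bytes)
{
    assert(mss_bytes > 0.0);
    return throughput_bps * rtt_s / (mss_bytes * 8.0);
}

/* Loss rate L allowed by Throughput = 1.22 * MSS / (RTT * sqrt(L)) at a target
 * throughput; solved for L directly, so no sqrt() is needed. */
double max_loss_rate(double throughput_bps, double rtt_s, double mss_bytes)
{
    assert(throughput_bps > 0.0 && rtt_s > 0.0);
    double ratio = 1.22 * mss_bytes * 8.0 / (rtt_s * throughput_bps);
    return ratio * ratio;
}
```

For 10 Gbps, 100 ms and 1500 bytes, these give about 83,333 segments and about 2.1 * 10^-10.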
TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Analysis of 2 connections sharing a link
Assumptions:
  - a link with transmission rate R
  - each connection has the same MSS and RTT
  - no other TCP connections or UDP datagrams traverse the shared link
  - ignore the slow-start phase of TCP
  - both connections operate in congestion-avoidance mode (linear increase phase)
Goal: adjust the sending rate of the two connections to allow for equal bandwidth sharing.
Why is TCP fair?
Two competing sessions:
  - additive increase gives a slope of 1 as throughput increases
  - multiplicative decrease decreases throughput proportionally
[Figure: connection 1 throughput versus connection 2 throughput, each axis up to R, showing the "equal bandwidth share" line and the "full bandwidth utilisation" line. The trajectory alternates congestion-avoidance additive increase with "loss: decrease window by factor of 2".]
A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.
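The convergence argument can be simulated directly. The sketch uses the slides' idealised assumptions plus two of its own:
- losses are synchronised: both connections detect a loss whenever their combined rate exceeds R;
- rates grow by a fixed step per RTT.

The function name is hypothetical.

```c
#include <assert.h>

/* Idealised AIMD for two connections on a link of capacity R (same MSS, RTT). */
void aimd_two_flows(double *x1, double *x2, double R, double step, int rtts)
{
    assert(x1 && x2 && R > 0.0 && step > 0.0);
    for (int i = 0; i < rtts; i++) {
        *x1 += step;          /* additive increase: slope-1 move on the graph */
        *x2 += step;
        if (*x1 + *x2 > R) {  /* both connections see a loss */
            *x1 /= 2.0;       /* multiplicative decrease: move toward the origin */
            *x2 /= 2.0;
        }
    }
}
```

Start from a very unequal split, 80 and 10 on a link of 100. Every halving halves the gap between the two rates, while additive increase leaves it unchanged. The rates therefore approach the equal-share line.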
The End
The following slides are for additional reading only.
TCP Latency Modeling: Loopholes in TCP
Multiple end systems share a link; R bps is the link's transmission rate.
  - In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free. Therefore they have higher throughputs.
  - Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (multithreading implementation).
[Figure: hosts with 1 TCP connection each and one host with 3 TCP connections share a link of rate R bps]
TCP Latency Modeling
Q: How long does it take to receive an object from a Web server?
Latency is the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. It comprises:
  - TCP connection establishment time
  - data transfer delay
  - actual data transmission time
Two cases to consider (W = window size in segments, S = MSS in bits, R = link rate in bps):
  - WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. There is no data transfer delay.
  - WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent. There is data transfer delay.
TCP Latency Modeling
Assumptions:
  - the network is uncongested, with one link of rate R between the end systems (CLIENT and SERVER)
  - CongWin (fixed) determines the amount of data that can be sent
  - no packet loss, no packet corruption, no retransmissions required
  - header overheads are negligible
  - the file to send is an integer number of segments of size MSS
  - connection-establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times
  - the initial Threshold of the TCP congestion mechanism is very big
Notation:
  - O: size of the object in bits
  - S: number of bits of the MSS (maximum segment size)
  - R: the link's transmission rate in bps
TCP Latency Modeling: Case Analysis, STATIC CONGESTION WINDOW
Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.
  latency = 2 RTT + O/R
K = number of windows of data that cover the object = O/(WS), rounded up to the nearest integer.
Example: assume W = 4 segments, e.g. O = 256 bits, S = 32 bits, W = 4, so K = 256/(4 * 32) = 2.
TCP Latency Modeling: Case Analysis, STATIC CONGESTION WINDOW
Case 2: WS/R < RTT + S/R
The sender has to wait for an ACK after a window's worth of data is sent (a STALLED PERIOD).
K = O/(WS) = number of windows of data that cover the object (rounded up).
If there are K windows, the sender will be stalled (K - 1) times:
  latency = 2 RTT + O/R + (K - 1)[S/R + RTT - WS/R]
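Cases 1 and 2 combine into one function, because the stall term S/R + RTT - WS/R is negative exactly in Case 1. Here is a sketch under the slide's assumptions (O an exact multiple of S); the name is hypothetical.

```c
#include <assert.h>

/* Latency to fetch an object of O bits over one link of R bps with a fixed
 * window of W segments of S bits; rtt in seconds. */
double static_window_latency(double O, double S, double R, double rtt, unsigned W)
{
    assert(O >= S && S > 0.0 && R > 0.0 && W > 0);
    unsigned long segments = (unsigned long)(O / S);  /* whole segments, as assumed */
    unsigned long K = (segments + W - 1) / W;         /* windows, rounded up */
    double stall = S / R + rtt - W * S / R;           /* idle time after each window */
    double latency = 2.0 * rtt + O / R;               /* Case 1 */
    if (stall > 0.0)                                  /* Case 2: WS/R < RTT + S/R */
        latency += (double)(K - 1) * stall;
    return latency;
}
```

Take the slide's example (O = 256, S = 32, W = 4) with R = 32 bps, so S/R = 1 s. An RTT of 2 s falls in Case 1, giving 12 s. An RTT of 6 s falls in Case 2: one stall of 3 s, giving 23 s.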
Case Analysis: DYNAMIC CONGESTION WINDOW
[Figure: slow start sending an object of O/S = 15 segments in 4 windows, with STALLED PERIODs between windows]
Case Analysis: DYNAMIC CONGESTION WINDOW
Let K be the number of windows that cover the object. We can express K in terms of the number of segments in the object as follows:
$$K = \min\{k : 2^0 S + 2^1 S + \cdots + 2^{k-1} S \ge O\} = \min\{k : 2^k - 1 \ge O/S\} = \min\left\{k : k \ge \log_2\left(\frac{O}{S} + 1\right)\right\} = \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil$$
Case Analysis: DYNAMIC CONGESTION WINDOW
From the time the server begins to transmit the kth window until it receives an ACK for the first segment in that window takes S/R + RTT.
Transmission of the kth window:
$$2^{k-1}\frac{S}{R}$$
Stall time, where [x]^+ = max(x, 0):
$$\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$$
Latency:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$$
Case Analysis: DYNAMIC CONGESTION WINDOW
Let Q be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$
The actual number of times that the server stalls is P = min{Q, K - 1}.
Case Analysis: DYNAMIC CONGESTION WINDOW
Closed-form expression for the latency:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$
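The closed form can be evaluated without floating-point logarithms by finding K and Q with doubling loops. Here is a C sketch under the same assumptions as the slides; the name is hypothetical.

```c
#include <assert.h>

/* Latency with slow start: 2 RTT + O/R + P(RTT + S/R) - (2^P - 1) S/R. */
double slow_start_latency(double O, double S, double R, double rtt)
{
    assert(O > 0.0 && S > 0.0 && R > 0.0 && rtt >= 0.0);

    /* K = min{k : (2^k - 1) S >= O} = ceil(log2(O/S + 1)) */
    unsigned K = 0;
    double pow2K = 1.0;                  /* 2^K */
    while ((pow2K - 1.0) * S < O) {
        pow2K *= 2.0;
        K++;
    }

    /* Q = floor(log2(1 + RTT/(S/R))) + 1 */
    double ratio = 1.0 + rtt / (S / R);
    unsigned Q = 1;
    double pow2Q = 2.0;                  /* 2^Q */
    while (pow2Q <= ratio) {
        pow2Q *= 2.0;
        Q++;
    }

    unsigned P = (Q < K - 1) ? Q : K - 1; /* K >= 1 because O > 0 */
    double pow2P = 1.0;                   /* 2^P */
    for (unsigned i = 0; i < P; i++)
        pow2P *= 2.0;

    return 2.0 * rtt + O / R + P * (rtt + S / R) - (pow2P - 1.0) * (S / R);
}
```

As an example, take the figure's O/S = 15 with S = R = 1 and RTT = 1. Then K = 4 and Q = 2, so P = 2, and the latency is 18. This matches the direct sum of stall times, since only the first window stalls, for 1 time unit.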
Case Analysis: DYNAMIC CONGESTION WINDOW
Since 2^P - 1 >= P, the closed form gives Latency <= 2 RTT + O/R + P * RTT. Dividing by the minimum latency, 2 RTT + O/R:
$$\frac{\text{Latency}}{\text{Minimum latency}} \le 1 + \frac{P}{(O/R)/RTT + 2}$$
Slow start will not significantly increase latency if RTT << O/R.
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
Transport Layer ndash TCP
13
BB
TCP Round Trip Time and Timeout
EstimatedRTT = (1-x) EstimatedRTT + x SampleRTTEstimatedRTT = (1-x) EstimatedRTT + x SampleRTT
Exponential weighted moving average influence of given sample decreases exponentially
fast typical value of x 0125 (RFC 2988)
Setting the timeoutSetting the timeout EstimatedRTT plus safety marginsafety margin large variation in EstimatedRTT -gt larger safety margin recommended value of x 025x 025
Timeout = EstimatedRTT + (4 Deviation)Timeout = EstimatedRTT + (4 Deviation)
DeviationDeviation = (1-x) Deviation + x |SampleRTT-EstimatedRTT|
Transport Layer ndash TCP
14
BB
EstimatedRTT = 0875 EstimatedRTT + 0125 SampleRTT
EstimatedRTT after the receipt of the ACK of segment 1EstimatedRTT = RTT for Segment 1 = 002746 second
EstimatedRTT after the receipt of the ACK of segment 2EstimatedRTT = 0875 002746 + 0125 0035557 = 00285
EstimatedRTT after the receipt of the ACK of segment 3EstimatedRTT = 0875 00285 + 0125 0070059 = 00337
EstimatedRTT after the receipt of the ACK of segment 4EstimatedRTT = 0875 00337+ 0125 011443 = 00438
EstimatedRTT after the receipt of the ACK of segment 5EstimatedRTT = 0875 00438 + 0125 013989 = 00558
EstimatedRTT after the receipt of the ACK of segment 6EstimatedRTT = 0875 00558 + 0125 018964 = 00725
Sample CalculationsSample Calculations
Transport Layer ndash TCP
15
BB
RTT Samples and RTT estimatesRTT Samples and RTT estimates
300
250
200
150
100 time
Estimated RTT
Sample RTT
RT
T (
mse
c)
The variations in the The variations in the SampleRTT SampleRTT are are smoothed out in the computation of thesmoothed out in the computation of the
EstimatedRTTEstimatedRTT
Transport Layer ndash TCP
16
BB
An Actual RTT estimationAn Actual RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (m
illis
eco
nds)
SampleRTT Estimated RTT
Transport Layer ndash TCP
17
BB
Simplified TCP sender assuming
waitfor
event
waitfor
event
event data received from application above
event timer timeout for segment with seq number y
event ACK receivedwith ACK number y
create send segment
retransmit segment
process ACK
- one way data transfer- no flow congestion control
FSM of TCP for Reliable Data Transfer FSM of TCP for Reliable Data Transfer
Transport Layer ndash TCP
18
BB
00 sendbase = initial_sequence number 01 nextseqnum = initial_sequence number 0203 loop (forever) 04 switch(event) 05 event data received from application above 06 create TCP segment with sequence number nextseqnum 07 If (timer is currently not running) start timer for segment
nextseqnum 08 pass segment to IP 09 nextseqnum = nextseqnum + length(data) 10 event timer timeout 11 retransmit not-yet-ACKed segment with smallest Seq 12 Start timer13 event ACK received with ACK field value of y 15 if (y gt sendbase) cumulative ACK of all data up to y 16 sendbase = y 17 If (there are currently any not-yet-ACKed segments)18 start timer19 20 end of loop forever
Associated with the oldest unACKed
segment
SIMPLIFIED TCPSIMPLIFIED TCP SENDER
AssumptionsAssumptions bull sender is not constrained by TCP flow or congestion controlbull that data from above is less than MSS in sizebull that data transfer is in one direction only
Transport Layer ndash TCP
20
BB
00 sendbase = initial_sequence number 01 nextseqnum = initial_sequence number 0203 loop (forever) 04 switch(event) 05 event data received from application above 06 create TCP segment with sequence number nextseqnum 07 start timer for segment nextseqnum 08 pass segment to IP 09 nextseqnum = nextseqnum + length(data) 10 event timer timeout for segment with sequence number y 11 retransmit segment with sequence number y 12 compute new timeout interval for segment y 13 restart timer for sequence number y 14 event ACK received with ACK field value of y 15 if (y gt sendbase) cumulative ACK of all data up to y 16 cancel all timers for segments with sequence numbers lt y 17 sendbase = y 18 19 else a duplicate ACK for already ACKed segment 20 increment number of duplicate ACKs received for y 21 if (number of duplicate ACKS received for y is 3) 22 perform TCP fast retransmission 23 resend segment with sequence number y 24 restart timer for segment y 25 26 end of loop forever
TCPTCP with MODIFICATIONSwith MODIFICATIONS SENDER
Why wait for the timeout to expire when consecutive
ACKs can be used to indicate a lost segment
With Fast Retransmit
Transport Layer ndash TCP
21
BB
TCP ACK generation [RFC 1122 RFC 2581]
Event
in-order segment arrival no gapseverything else already ACKed
in-order segment arrival no gaps one delayed ACK pending (due to action 1)
out-of-order segment arrivalwith higher than expect seq - a gap is detected
arrival of segment that partially or completely fills gap
TCP Receiver action
Delay sending the ACK Wait up to 500msfor next segment If next segment does not arrive in this interval send ACK
immediately send a singlecumulative ACK
send duplicate ACK indicating seq of next expected byte
Immediately send an ACK if segment startsat lower end of gap
1
2
3
4
Receiver does not discard out-of-order segmentsReceiver does not discard out-of-order segments
Transport Layer ndash TCP
22
BB
TCP Interesting Scenarios
Host A
Seq=92 8 bytes data
ACK=100
loss
tim
eout
time lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
Host A
Seq=100 20 bytes data
ACK=100
Seq=
92
tim
eout
time premature timeoutcumulative ACKs
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
ACK=120
Retransmission due to lost ACKSegment with Seq=100 not retransmitted
Timer is restarted here for Seq=92
Simplified TCP versionSimplified TCP version
Transport Layer ndash TCP
23
BB
Host A
Seq=100 20 bytes data
ACK=100Seq=
92
tim
eout
time
Host B
ACK=120
Seq=92 8 bytes data
Xloss
Cumulative ACK avoids retransmission of the first segment
TCP Retransmission ScenarioTCP Retransmission Scenario
TCP Modifications: Doubling the Timeout Interval
- Provides a limited form of congestion control: a timer expiration is more likely caused by congestion in the network.
- On each timeout: TimeoutInterval = 2 × TimeoutInterval_previous
- Congestion may get worse if sources continue to retransmit packets persistently; by increasing the TimeoutInterval, TCP acts more politely, causing the sender to retransmit after longer and longer intervals.
- After an ACK is received, the TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT.
- Others: check RFC 2018 (selective ACK).
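A minimal sketch of the doubling rule, assuming a hypothetical `RetransmitTimer` class: each timeout doubles the interval, and an ACK resets it to EstimatedRTT + 4 × DevRTT (the timeout formula from the RTT estimation slide). The estimates themselves are held fixed here to keep the example short.

```python
class RetransmitTimer:
    """Hypothetical timeout bookkeeping: double on each timeout, recompute on ACK."""

    def __init__(self, estimated_rtt: float, dev_rtt: float) -> None:
        self.estimated_rtt = estimated_rtt
        self.dev_rtt = dev_rtt
        self.timeout_interval = self._from_estimates()

    def _from_estimates(self) -> float:
        # TimeoutInterval = EstimatedRTT + 4 * DevRTT
        return self.estimated_rtt + 4 * self.dev_rtt

    def on_timeout(self) -> float:
        # TimeoutInterval = 2 * TimeoutInterval_previous
        self.timeout_interval *= 2
        return self.timeout_interval

    def on_ack(self) -> float:
        # After an ACK, derive the interval from the latest estimates again.
        self.timeout_interval = self._from_estimates()
        return self.timeout_interval


timer = RetransmitTimer(estimated_rtt=0.5, dev_rtt=0.125)
print(timer.timeout_interval)                  # 1.0
print(timer.on_timeout(), timer.on_timeout())  # 2.0 4.0
print(timer.on_ack())                          # 1.0
```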
TCP Flow Control
flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.
- receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space via the RcvWindow field in the TCP segment
- sender: keeps the amount of transmitted, unACKed data no greater than the most recently received RcvWindow
receiver buffering:
- RcvBuffer = size of the TCP receive buffer
- RcvWindow = amount of spare room in the buffer
FLOW CONTROL: Receiver
EXAMPLE: HOST A sends a large file to HOST B.
RECEIVER (HOST B) uses RcvWindow, LastByteRcvd, LastByteRead.
[Figure: HOST B's RcvBuffer, filled with data from IP up to LastByteRcvd and drained by the application process up to LastByteRead.]
RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
Initially RcvWindow = RcvBuffer; the application reads from the buffer. HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.
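The receiver-side bookkeeping is a single subtraction. A sketch with a hypothetical 4096-byte RcvBuffer; byte counters are plain integers here rather than wrapping 32-bit sequence numbers.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead], in bytes."""
    buffered = last_byte_rcvd - last_byte_read   # received but not yet read by the app
    if not 0 <= buffered <= rcv_buffer:
        # Flow control should make this impossible; treat it as a bug.
        raise ValueError("buffer accounting is inconsistent")
    return rcv_buffer - buffered


print(rcv_window(4096, 0, 0))        # 4096: initially RcvWindow = RcvBuffer
print(rcv_window(4096, 3000, 1000))  # 2096
print(rcv_window(4096, 5096, 1000))  # 0: buffer full
```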
FLOW CONTROL: Sender
EXAMPLE: HOST A sends a large file to HOST B.
SENDER (HOST A) uses the RcvWindow of HOST B, LastByteSent, LastByteACKed.
[Figure: HOST A's data, sent up to LastByteSent, with ACKs from HOST B advancing LastByteACKed.]
To ensure that HOST B's receive buffer does not overflow, HOST A maintains throughout the connection's life that [LastByteSent - LastByteACKed] <= RcvWindow.
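On the sender side, the same invariant determines how many more bytes HOST A may put in flight. A sketch with hypothetical helper names; it does not model the one-byte segments TCP keeps sending when RcvWindow = 0.

```python
def usable_window(last_byte_sent: int, last_byte_acked: int, rcv_window: int) -> int:
    """Bytes HOST A may still send while keeping
    LastByteSent - LastByteACKed <= RcvWindow."""
    in_flight = last_byte_sent - last_byte_acked   # sent but not yet ACKed
    return max(rcv_window - in_flight, 0)


def can_send(segment_len: int, last_byte_sent: int,
             last_byte_acked: int, rcv_window: int) -> bool:
    return segment_len <= usable_window(last_byte_sent, last_byte_acked, rcv_window)


print(usable_window(5000, 3000, 2096))   # 96
print(can_send(96, 5000, 3000, 2096))    # True
print(can_send(97, 5000, 3000, 2096))    # False
```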
FLOW CONTROL: Some issues to consider
RcvWindow is used by the connection to provide the flow control service.
What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?
TCP sends a segment only when there is data or an ACK to send. If HOST B has nothing to send, HOST A would never learn that buffer space has opened up; therefore the sender must keep the connection "alive".
TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space, and the ACKs will contain RcvWindow > 0.
TCP Connection Management
Recall: the TCP sender and receiver establish a "connection" before exchanging data segments, which initializes TCP variables:
- sequence numbers
- buffers, flow control info (e.g., RcvWindow)
Client: the connection initiator.
- In Java: Socket clientSocket = new Socket(hostname, portNumber);   (connect)
- In C (Winsock):
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) { printf("connect failed\n"); WSACleanup(); exit(1); }
Server: contacted by the client.
- In Java: Socket accept()   (a method of ServerSocket)
- In C:
    ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);
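The same client/server roles can be sketched with Python's standard `socket` module, swapped in here for the Java and Winsock calls above. The operating system's TCP runs the handshake inside the client's connect, and `accept()` returns the already-established connection. Binding to port 0 on the loopback interface (an assumption so the example never collides with a real service) lets the OS pick a free port; the function name `loopback_exchange` is hypothetical.

```python
import socket


def loopback_exchange(message: bytes) -> bytes:
    """Connect a client to a local server, send `message`, return what the server read."""
    # Server side: the "welcoming" socket.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))   # port 0: the OS picks a free port
        listener.listen(1)
        host, port = listener.getsockname()
        # Client side: the kernel performs the three-way handshake during connect.
        with socket.create_connection((host, port)) as client:
            conn, _addr = listener.accept()   # the established connection
            with conn:
                client.sendall(message)
                client.shutdown(socket.SHUT_WR)   # client sends its FIN
                data = b""
                while True:
                    chunk = conn.recv(1024)
                    if not chunk:                 # FIN received: no more data
                        break
                    data += chunk
                return data


print(loopback_exchange(b"hello"))  # b'hello'
```

A single thread is enough here because the kernel completes the handshake for a listening socket before `accept()` is called.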
TCP Connection Management: Establishing a connection
Three-way handshake (this is what happens when we create a socket for connection to a server):
Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself)
- specifies the client's initial sequence number (client_isn)
Step 2: the server end system receives the SYN and replies with a SYNACK control segment
- ACKs the received SYN
- allocates buffers
- specifies the server's initial sequence number (server_isn)
Step 3: the client ACKs the connection with ACK = server_isn + 1
- allocates buffers
- sends SYN = 0
Connection established.
[Figure: client/server timeline. Connect: the client sends (SYN=1, seq=client_isn). Accept: the server replies (SYN=1, seq=server_isn, ack=client_isn+1). The client then sends ACK (SYN=0, seq=client_isn+1, ack=server_isn+1).]
After the connection is established, segments carrying application-generated data are exchanged with SYN = 0.
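The sequence and acknowledgment numbers in the three handshake segments follow directly from the two initial sequence numbers. A sketch that builds the segments as plain dictionaries (an illustrative representation, not a real header layout); the modulo reflects TCP's 32-bit sequence-number space.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits and wrap around


def three_way_handshake(client_isn: int, server_isn: int) -> list[dict]:
    """Return the SYN, SYNACK and ACK segments as dicts (illustrative only)."""
    syn = {"from": "client", "SYN": 1, "seq": client_isn}
    synack = {"from": "server", "SYN": 1, "seq": server_isn,
              "ack": (client_isn + 1) % SEQ_SPACE}
    ack = {"from": "client", "SYN": 0, "seq": (client_isn + 1) % SEQ_SPACE,
           "ack": (server_isn + 1) % SEQ_SPACE}
    return [syn, synack, ack]


for segment in three_way_handshake(client_isn=42, server_isn=7000):
    print(segment)
```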
TCP Connection Management (cont.): Closing a connection
The client closes its socket:
- Winsock: closesocket(s);
- Java: clientSocket.close();
Step 1: the client end system sends a TCP FIN control segment to the server.
Step 2: the server receives the FIN and replies with an ACK; it then closes the connection and sends its own FIN.
[Figure: client/server timeline. Client close: FIN; server: ACK; server close: FIN; client: ACK; the client waits in timed wait, then the connection is closed.]
How a TCP connection is established and torn down.
TCP Connection Management (cont.)
Step 3: the client receives the FIN and replies with an ACK.
- It enters a timed wait, during which it will respond with an ACK to any received FINs.
Step 4: the server receives the ACK. Connection closed.
Note: with a small modification, this can handle simultaneous FINs.
[Figure: client/server timeline. FIN, ACK, FIN, ACK; both sides closing; the client waits in timed wait; both sides closed.]
TCP Connection Management (cont.)
[Figure: TCP client lifecycle and TCP server lifecycle state diagrams.]
- Timed wait: used in case the final ACK gets lost. Its length is implementation-dependent (e.g., 30 seconds, 1 minute, 2 minutes).
- The connection then formally closes: all resources (e.g., port numbers) are released.
End of Flow Control and Error Control
Flow Control vs. Congestion Control
Similar actions are taken, but for very different reasons.
Flow Control
• point-to-point traffic between sender and receiver
• speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the receiver's buffer from overflowing
Congestion: happens when there are too many sources attempting to send data at too high a rate for the routers along the path.
Congestion Control
• service that makes sure that the routers between end systems are able to carry the offered traffic
• prevents routers from overflowing
Same course of action: throttling of the sender.
Principles of Congestion Control
Congestion, informally: "too many sources sending too much data too fast for the network to handle"
- different from flow control!
- manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queuing in router buffers)
- a top-10 problem!
Approaches towards congestion control
Two broad approaches towards congestion control:
1. End-to-end congestion control
- no explicit feedback from the network
- congestion inferred by end systems from observed packet loss and delay
- approach taken by TCP
2. Network-assisted congestion control
- routers provide feedback to end systems in the form of:
  - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR), or
  - an explicit transmission rate the sender should send at
TCP Congestion Control: how the TCP sender limits the rate at which it sends traffic into its connection
New variable: Congestion Window (CongWin)
SENDER: (amount of unACKed data) = LastByteSent - LastByteACKed <= min(CongWin, RcvWindow)
This indirectly limits the sender's send rate.
Assumptions:
• The TCP receive buffer is very large, so there is no RcvWindow constraint: the amount of unACKed data at the sender is limited solely by CongWin.
• Packet loss delay and packet transmission delay are negligible.
Sending rate (approx.) = CongWin / RTT
By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
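A sketch of the sender's limit and the resulting approximate rate under the slide's assumptions; the 1460-byte MSS and the window sizes in the usage lines are hypothetical values.

```python
def effective_window(cong_win: int, rcv_window: int) -> int:
    """Maximum unACKed bytes allowed: min(CongWin, RcvWindow)."""
    return min(cong_win, rcv_window)


def approx_send_rate(cong_win: int, rtt: float) -> float:
    """Sending rate ~ CongWin / RTT, in bytes/sec (idealized, per the slide)."""
    return cong_win / rtt


MSS = 1460            # hypothetical MSS, in bytes
cwnd = 10 * MSS       # 14600 bytes
print(effective_window(cwnd, rcv_window=65535))   # 14600: CongWin is the binding limit
print(round(approx_send_rate(cwnd, rtt=0.1)))     # 146000 bytes/sec
```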
TCP Congestion Control: TCP uses ACKs to trigger ("clock") its increase in congestion window size, hence "self-clocking".
The arrival of ACKs is an indication to the sender that all is well.
1. ACKs arrive at a slow rate
• the congestion window will be increased at a relatively slow rate
2. ACKs arrive at a high rate
• the congestion window will be increased more quickly
TCP Congestion Control: how TCP perceives that there is congestion on the path
"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender. A loss event is either:
1. Timeout
• no ACK is received after segment loss
2. Receipt of three duplicate ACKs
• segment loss is followed by three duplicate ACKs received at the sender
TCP Congestion Control details
- the sender limits transmission: LastByteSent - LastByteAcked <= cwnd
- roughly, rate = cwnd / RTT bytes/sec
- cwnd is a dynamic function of perceived network congestion
How does the sender perceive congestion?
- loss event = timeout or 3 duplicate ACKs
- the TCP sender reduces its rate (cwnd) after a loss event
Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events
TCP congestion avoidance: additive increase, multiplicative decrease (AIMD)
- approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs
  - additive increase: increase cwnd by 1 MSS every RTT until loss is detected
  - multiplicative decrease: cut cwnd in half after loss
[Figure: congestion window size (8, 16, 24 Kbytes) vs. time, showing saw-tooth behavior: probing for bandwidth.]
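The saw-tooth can be reproduced by stepping the window once per RTT. A sketch in which cwnd is measured in MSS and the RTTs at which loss occurs are injected by hand (hypothetical inputs); the 1 MSS floor follows the sender congestion-control table later in these slides.

```python
def aimd_trace(cwnd: float, rtts: int, loss_rtts: set[int]) -> list[float]:
    """cwnd (in MSS) at the start of each RTT under AIMD."""
    trace = []
    for t in range(rtts):
        trace.append(cwnd)
        if t in loss_rtts:
            cwnd = max(cwnd / 2, 1.0)   # multiplicative decrease, never below 1 MSS
        else:
            cwnd += 1.0                 # additive increase: +1 MSS per RTT
    return trace


print(aimd_trace(8.0, rtts=6, loss_rtts={3}))  # [8.0, 9.0, 10.0, 11.0, 5.5, 6.5]
```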
TCP Slow Start
- when the connection begins, increase the rate exponentially until the first loss event:
  - initially cwnd = 1 MSS
  - double cwnd every RTT
  - done by incrementing cwnd by 1 MSS for every ACK received
- summary: the initial rate is slow, but it ramps up exponentially fast (doubling of the sending rate every RTT)
[Figure: Host A / Host B timeline: one segment in the first RTT, two segments in the next, then four segments.]
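Incrementing cwnd by 1 MSS per ACK doubles it once per RTT because each segment sent in a round produces one ACK. A sketch that assumes no losses and no delayed ACKs, with cwnd measured in MSS.

```python
def slow_start(rounds: int, cwnd: int = 1) -> list[int]:
    """cwnd (in MSS) at the start of each RTT during slow start."""
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        acks_this_rtt = cwnd          # one ACK per segment; no loss, no delayed ACKs
        for _ in range(acks_this_rtt):
            cwnd += 1                 # +1 MSS for every ACK received
    return history


print(slow_start(4))  # [1, 2, 4, 8]
```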
Refinement: inferring loss
- after 3 dup ACKs:
  - cwnd is cut in half
  - the window then grows linearly
- but after a timeout event:
  - cwnd is set to 1 MSS
  - the window then grows exponentially up to a threshold, then grows linearly
Philosophy:
- 3 dup ACKs indicate that the network is capable of delivering some segments
- a timeout indicates a "more alarming" congestion scenario
Refinement
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.
Implementation:
- variable ssthresh (slow-start threshold)
- on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event
TCP Sender Congestion Control
STATE | EVENT | TCP SENDER congestion-control action | Commentary
Slow Start (SS) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Results in a doubling of CongWin every RTT
Congestion Avoidance (CA) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS × (MSS / CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin / 2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
SS or CA | Timeout | Threshold = CongWin / 2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
SS or CA | Duplicate ACK | Increment duplicate ACK count for the segment being ACKed | CongWin and Threshold not changed
Summary: TCP Congestion Control
[Figure: finite-state machine with three states: slow start, congestion avoidance, fast recovery.]
Initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start.
Slow start:
- new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
- duplicate ACK: dupACKcount++
- cwnd > ssthresh: move to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
Congestion avoidance:
- new ACK: cwnd = cwnd + MSS × (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
- duplicate ACK: dupACKcount++
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
Fast recovery:
- duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
- new ACK: cwnd = ssthresh; dupACKcount = 0; move to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start
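The FSM above can be transcribed into a small Python class. This is a hypothetical sketch (class and method names are illustrative): cwnd and ssthresh are measured in MSS, the initial ssthresh of 64 assumes a 1 KB MSS to stand in for 64 KB, and retransmitting or sending segments is only noted in comments.

```python
class CongestionControlFSM:
    """Hypothetical sketch of the three-state FSM; cwnd and ssthresh in MSS."""

    def __init__(self, ssthresh: float = 64.0) -> None:
        # Initially: cwnd = 1 MSS, dupACKcount = 0, state = slow start.
        self.cwnd = 1.0
        self.ssthresh = ssthresh          # 64 ~ "64 KB" if MSS = 1 KB (assumption)
        self.dup_ack_count = 0
        self.state = "slow_start"

    def new_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh     # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += 1                # +1 MSS per ACK
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += 1 / self.cwnd    # MSS * (MSS / cwnd)
        self.dup_ack_count = 0            # (transmit new segments as allowed)

    def duplicate_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd += 1                # +1 MSS for each further duplicate
            return
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 # ssthresh + 3 MSS
            self.state = "fast_recovery"  # (retransmit missing segment)

    def timeout(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0
        self.dup_ack_count = 0
        self.state = "slow_start"         # (retransmit missing segment)


fsm = CongestionControlFSM()
for _ in range(7):
    fsm.new_ack()            # slow start: cwnd 1 -> 8
fsm.timeout()                # ssthresh = 4, cwnd = 1
for _ in range(4):
    fsm.new_ack()            # cwnd 2, 3, 4, 5; 5 > 4 -> congestion avoidance
for _ in range(3):
    fsm.duplicate_ack()      # ssthresh = 2.5, cwnd = 5.5 -> fast recovery
fsm.new_ack()                # cwnd = ssthresh -> congestion avoidance
print(fsm.state, fsm.cwnd)   # congestion_avoidance 2.5
```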
Congestion control: TCP's Congestion Control Service
[Figure: CLIENT and SERVER.]
Problem: gridlock sets in when there is packet loss due to router congestion.
- TCP's congestion control service forces the end systems to decrease the rate at which packets are sent during periods of congestion.
- When the sending system's packet is lost due to congestion, the sender is alerted by no longer receiving ACKs for the packets it sent.
Macroscopic Description of TCP Throughput
- What's the average throughput of TCP as a function of window size and RTT?
  - ignore slow start (typically very short phases)
- Let W be the window size when loss occurs.
  - when the window is W, throughput is W/RTT
  - just after loss, the window drops to W/2 and throughput to W/(2 RTT)
  - throughput then increases linearly (by MSS/RTT every RTT)
- Average throughput: 0.75 W/RTT
(Based on an idealised model for the steady-state dynamics of TCP.)
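The 0.75 factor comes from averaging the idealised saw-tooth: between losses the window grows linearly from W/2 to W, so its time-average is the midpoint of that range.

```latex
\bar{W} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3W}{4}
\qquad\Longrightarrow\qquad
\text{Average throughput} = \frac{\bar{W}}{RTT} = 0.75\,\frac{W}{RTT}
```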
TCP Futures
- TCP over "long, fat pipes"
- Example: a GRID computing application using 1500-byte segments with a 100 ms RTT; a desired throughput of 10 Gbps requires a window size of W = 83,333 in-flight segments.
- Throughput in terms of the loss rate L:
$$\text{Throughput} = \frac{1.22 \cdot MSS}{RTT\,\sqrt{L}}$$
  Reaching 10 Gbps therefore requires $L = 2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments).
- New versions of TCP are needed for high-speed environments.
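Both numbers in the example can be checked directly. A sketch that takes MSS to be the 1500-byte segment size and measures it in bits, so that throughput comes out in bits per second; the function names are hypothetical.

```python
def window_for_throughput(throughput_bps: float, rtt_s: float, segment_bytes: int) -> float:
    """In-flight segments needed: W = throughput * RTT / segment size (in bits)."""
    return throughput_bps * rtt_s / (segment_bytes * 8)


def loss_rate_for_throughput(throughput_bps: float, rtt_s: float, segment_bytes: int) -> float:
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) for L, with MSS in bits."""
    mss_bits = segment_bytes * 8
    return (1.22 * mss_bits / (rtt_s * throughput_bps)) ** 2


W = window_for_throughput(10e9, 0.100, 1500)
L = loss_rate_for_throughput(10e9, 0.100, 1500)
print(round(W))     # 83333
print(f"{L:.2e}")   # 2.14e-10, i.e. about 2 * 10^-10
```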
TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link of capacity R, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R.]
Analysis of 2 connections sharing a link
Assumptions:
- a link with transmission rate R
- each connection has the same MSS and RTT
- no other TCP connections or UDP datagrams traverse the shared link
- ignore the slow-start phase of TCP
- both connections operate in congestion-avoidance mode (linear increase phase)
Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.
Why is TCP fair?
Two competing sessions:
- additive increase gives a slope of 1 as throughput increases
- multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput vs. Connection 1 throughput, each axis up to R. Congestion avoidance: additive increase; on loss: decrease the window by a factor of 2. The trajectory moves toward the equal bandwidth share line. A point on the graph depicts the amount of link bandwidth jointly consumed by the connections; the full bandwidth utilisation line is where the two throughputs sum to R.]
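Under the stated assumptions, the convergence argument can be checked numerically. A sketch assuming both flows use the same rate units, both detect a loss in the same round whenever their combined rate exceeds the capacity R (a synchronized-loss simplification), and each round stands for one RTT; `aimd_two_flows` is a hypothetical name.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, rounds: int) -> tuple[float, float]:
    """Two AIMD flows sharing a link; both halve whenever x1 + x2 > capacity."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # multiplicative decrease halves the gap
        else:
            x1, x2 = x1 + 1, x2 + 1   # additive increase keeps the gap constant
    return x1, x2


x1, x2 = aimd_two_flows(1.0, 50.0, capacity=100.0, rounds=200)
print(abs(x1 - x2))  # 0.3828125: the initial gap of 49 has been halved 7 times
```

Additive increase moves both flows parallel to the equal-share line, while each multiplicative decrease halves the difference between them, so the gap shrinks toward zero.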
The End
The following slides are for additional reading only.
TCP Latency Modeling
Loopholes in TCP:
- In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free; therefore they have higher throughputs.
- Multiple parallel TCP connections (a multithreading implementation) allow one application to get a bigger share of the bandwidth.
[Figure: multiple end systems sharing a link of R bps (the link's transmission rate); several hosts use 1 TCP connection each, while one uses 3 TCP connections.]
TCP latency modeling
Q: How long does it take to receive an object from a Web server? That is, the time from when the client initiates a TCP connection until the client receives the requested object in its entirety.
Components: TCP connection establishment time, data transfer delay, actual data transmission time.
Two cases to consider:
- WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent (no data transfer delay).
- WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent (there is data transfer delay).
TCP Latency Modeling
Assumptions:
- The network is uncongested, with one link of rate R between the end systems (CLIENT and SERVER).
- CongWin (fixed) determines the amount of data that can be sent.
- No packet loss, no packet corruption, no retransmissions required.
- Header overheads are negligible.
- The file to send (FILE) is an integer number of segments of size MSS.
- Connection-establishment request messages, ACKs, and TCP connection-establishment segments have negligible transmission times.
- The initial threshold of the TCP congestion mechanism is very big.
Notation:
- O: size of the object, in bits
- S: number of bits in the MSS (maximum segment size)
- R: the link's transmission rate, in bps
TCP latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.
latency = 2RTT + O/R
K = number of windows of data that cover the object: K = O/(WS), rounded up to the nearest integer (O/S is the number of segments).
Assume a window of W = 4 segments, e.g., O = 256 bits, S = 32 bits, W = 4 (so O/S = 8 segments and K = 2).
TCP latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 2: WS/R < RTT + S/R
The sender has to wait for an ACK after a window's worth of data is sent (STALLED PERIOD).
latency = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]
where K = O/(WS) is the number of windows of data that cover the object (rounded up).
If there are K windows, the sender will be stalled (K-1) times.
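The two static-window cases can be folded into one function: the stall term is only added when it is positive. A sketch under the slides' assumptions (O and S in bits, R in bits per second, times in seconds); the link rate R = 32 bps in the usage lines is a hypothetical value chosen so that S/R = 1 s.

```python
import math


def static_window_latency(O: float, S: float, W: int, R: float, rtt: float) -> float:
    """Latency with a fixed window of W segments (slides' assumptions)."""
    K = math.ceil(O / (W * S))            # windows needed to cover the object
    latency = 2 * rtt + O / R              # 2 RTT plus the time to transmit O bits
    stall = S / R + rtt - W * S / R        # idle time after each full window
    if stall > 0:                          # Case 2: WS/R < RTT + S/R
        latency += (K - 1) * stall
    return latency                         # Case 1 otherwise


# The slide's O = 256 bits, S = 32 bits, W = 4, with a hypothetical R = 32 bps.
print(static_window_latency(256, 32, 4, R=32, rtt=1))  # 10.0 (case 1)
print(static_window_latency(256, 32, 4, R=32, rtt=5))  # 20.0 (case 2)
```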
Case Analysis: DYNAMIC CONGESTION WINDOW
[Figure: slow-start transmission of an object with O/S = 15 segments in 4 windows, showing the STALLED PERIODs between windows.]
Case Analysis: DYNAMIC CONGESTION WINDOW
• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object as follows:
$$
\begin{aligned}
K &= \min\{k : 2^0 S + 2^1 S + \cdots + 2^{k-1} S \ge O\} \\
  &= \min\{k : 2^k - 1 \ge O/S\} \\
  &= \min\{k : k \ge \log_2(O/S + 1)\} \\
  &= \left\lceil \log_2(O/S + 1) \right\rceil
\end{aligned}
$$
Note: the second step uses $2^0 + 2^1 + \cdots + 2^{k-1} = 2^k - 1$.
Case Analysis: DYNAMIC CONGESTION WINDOW
• Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window: $\frac{S}{R} + RTT$
• Transmission time of the kth window: $2^{k-1}\frac{S}{R}$
• Stall time after the kth window: $\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$, where $[x]^+ = \max(x, 0)$
• Latency:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$$
Case Analysis: DYNAMIC CONGESTION WINDOW
• Let Q be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$
• The actual number of times that the server stalls is $P = \min\{Q, K-1\}$.
• Closed-form expression for the latency:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - \left(2^{P} - 1\right)\frac{S}{R}$$
Case Analysis: DYNAMIC CONGESTION WINDOW
$$\frac{\text{Latency}}{\text{Minimum latency}} \le 1 + \frac{P}{\frac{O/R}{RTT} + 2}$$
where Minimum latency = 2RTT + O/R, the latency when the server never stalls.
Slow start will not significantly increase latency if RTT << O/R.
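The closed-form latency can be cross-checked against the per-window stall sum it was derived from. A sketch under the slides' assumptions, with the hypothetical values O/S = 15 segments and S/R = 1 s; the window count K is computed with integer arithmetic to avoid floating-point error in log2, and the function names are illustrative.

```python
import math


def windows_to_cover(O: float, S: float) -> int:
    """K = ceil(log2(O/S + 1)): windows of 1, 2, 4, ... segments needed."""
    segments = math.ceil(O / S)
    K = 0
    while 2 ** K - 1 < segments:   # 2^0 + ... + 2^(K-1) = 2^K - 1 segments sent
        K += 1
    return K


def slow_start_latency(O: float, S: float, R: float, rtt: float) -> float:
    """Closed form: 2RTT + O/R + P[RTT + S/R] - (2^P - 1) S/R."""
    K = windows_to_cover(O, S)
    Q = math.floor(math.log2(1 + rtt / (S / R))) + 1
    P = min(Q, K - 1)
    return 2 * rtt + O / R + P * (rtt + S / R) - (2 ** P - 1) * S / R


def slow_start_latency_by_sum(O: float, S: float, R: float, rtt: float) -> float:
    """Same latency from the stall sum over windows k = 1 .. K-1."""
    K = windows_to_cover(O, S)
    stalls = sum(max(S / R + rtt - 2 ** (k - 1) * S / R, 0.0) for k in range(1, K))
    return 2 * rtt + O / R + stalls


# O/S = 15 segments (4 windows), hypothetical S/R = 1 s, three RTT values.
for rtt in (0.5, 3.0, 100.0):
    print(rtt, slow_start_latency(15, 1, 1, rtt), slow_start_latency_by_sum(15, 1, 1, rtt))
```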
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
TCP Round Trip Time and Timeout
EstimatedRTT = (1-x) × EstimatedRTT + x × SampleRTT
- exponential weighted moving average: the influence of a given sample decreases exponentially fast
- typical value of x: 0.125 (RFC 2988)
Setting the timeout:
- EstimatedRTT plus a "safety margin"; a large variation in EstimatedRTT calls for a larger safety margin
- Timeout = EstimatedRTT + (4 × Deviation)
- Deviation = (1-y) × Deviation + y × |SampleRTT - EstimatedRTT|, with recommended value y = 0.25
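One update step of the estimator and timeout can be sketched as follows, with x and y as defined above. This sketch updates the deviation against the previous EstimatedRTT before updating EstimatedRTT itself, which is the order RFC 2988 specifies; `update_rtt` is a hypothetical name and the inputs in the usage line are hypothetical values.

```python
def update_rtt(estimated_rtt: float, deviation: float, sample_rtt: float,
               x: float = 0.125, y: float = 0.25) -> tuple[float, float, float]:
    """One RTT-sample update; returns (EstimatedRTT, Deviation, Timeout)."""
    # Deviation first, measured against the previous EstimatedRTT.
    deviation = (1 - y) * deviation + y * abs(sample_rtt - estimated_rtt)
    estimated_rtt = (1 - x) * estimated_rtt + x * sample_rtt
    timeout = estimated_rtt + 4 * deviation
    return estimated_rtt, deviation, timeout


print(update_rtt(1.0, 0.0, 2.0))  # (1.125, 0.25, 2.125)
```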
Sample Calculations
EstimatedRTT = 0.875 × EstimatedRTT + 0.125 × SampleRTT
- EstimatedRTT after the receipt of the ACK of segment 1: EstimatedRTT = RTT for segment 1 = 0.02746 second
- EstimatedRTT after the receipt of the ACK of segment 2: EstimatedRTT = 0.875 × 0.02746 + 0.125 × 0.035557 = 0.0285
- EstimatedRTT after the receipt of the ACK of segment 3: EstimatedRTT = 0.875 × 0.0285 + 0.125 × 0.070059 = 0.0337
- EstimatedRTT after the receipt of the ACK of segment 4: EstimatedRTT = 0.875 × 0.0337 + 0.125 × 0.11443 = 0.0438
- EstimatedRTT after the receipt of the ACK of segment 5: EstimatedRTT = 0.875 × 0.0438 + 0.125 × 0.13989 = 0.0558
- EstimatedRTT after the receipt of the ACK of segment 6: EstimatedRTT = 0.875 × 0.0558 + 0.125 × 0.18964 = 0.0725
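The sample calculation above can be reproduced as follows. The script keeps full precision between steps, whereas the slide rounds each intermediate EstimatedRTT to four decimal places; for these samples both agree to four decimals. The function name `ewma_rtt` is hypothetical.

```python
def ewma_rtt(samples: list[float], x: float = 0.125) -> list[float]:
    """EstimatedRTT after each ACK; the first estimate is the first sample."""
    estimates = [samples[0]]
    for sample in samples[1:]:
        estimates.append((1 - x) * estimates[-1] + x * sample)
    return estimates


sample_rtts = [0.02746, 0.035557, 0.070059, 0.11443, 0.13989, 0.18964]
print([round(e, 4) for e in ewma_rtt(sample_rtts)])
# [0.0275, 0.0285, 0.0337, 0.0438, 0.0558, 0.0725]
```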
RTT Samples and RTT estimates
[Figure: Sample RTT and Estimated RTT, RTT (msec) from 100 to 300, vs. time.]
The variations in the SampleRTT are smoothed out in the computation of the EstimatedRTT.
An Actual RTT estimation
[Figure: SampleRTT and Estimated RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds) from 100 to 350 vs. time (seconds) from 1 to 106.]
FSM of TCP for Reliable Data Transfer
Simplified TCP sender, assuming:
- one-way data transfer
- no flow or congestion control
[Figure: a single "wait for event" state with three self-transitions:
- event: data received from application above; action: create and send segment
- event: timer timeout for segment with seq number y; action: retransmit segment
- event: ACK received, with ACK number y; action: process ACK]
SIMPLIFIED TCP SENDER
Assumptions:
• the sender is not constrained by TCP flow or congestion control
• data from above is less than MSS in size
• data transfer is in one direction only

sendbase = initial_sequence_number
nextseqnum = initial_sequence_number

loop (forever) {
  switch (event)
    event: data received from application above
      create TCP segment with sequence number nextseqnum
      if (timer is currently not running)
        start timer for segment nextseqnum   /* the single timer is associated with the oldest unACKed segment */
      pass segment to IP
      nextseqnum = nextseqnum + length(data)
    event: timer timeout
      retransmit not-yet-ACKed segment with smallest sequence number
      start timer
    event: ACK received, with ACK field value of y
      if (y > sendbase) {   /* cumulative ACK of all data up to y */
        sendbase = y
        if (there are currently any not-yet-ACKed segments)
          start timer
      }
} /* end of loop forever */
Transport Layer ndash TCP
20
BB
00 sendbase = initial_sequence number 01 nextseqnum = initial_sequence number 0203 loop (forever) 04 switch(event) 05 event data received from application above 06 create TCP segment with sequence number nextseqnum 07 start timer for segment nextseqnum 08 pass segment to IP 09 nextseqnum = nextseqnum + length(data) 10 event timer timeout for segment with sequence number y 11 retransmit segment with sequence number y 12 compute new timeout interval for segment y 13 restart timer for sequence number y 14 event ACK received with ACK field value of y 15 if (y gt sendbase) cumulative ACK of all data up to y 16 cancel all timers for segments with sequence numbers lt y 17 sendbase = y 18 19 else a duplicate ACK for already ACKed segment 20 increment number of duplicate ACKs received for y 21 if (number of duplicate ACKS received for y is 3) 22 perform TCP fast retransmission 23 resend segment with sequence number y 24 restart timer for segment y 25 26 end of loop forever
TCPTCP with MODIFICATIONSwith MODIFICATIONS SENDER
Why wait for the timeout to expire when consecutive
ACKs can be used to indicate a lost segment
With Fast Retransmit
Transport Layer ndash TCP
21
BB
TCP ACK generation [RFC 1122 RFC 2581]
Event
in-order segment arrival no gapseverything else already ACKed
in-order segment arrival no gaps one delayed ACK pending (due to action 1)
out-of-order segment arrivalwith higher than expect seq - a gap is detected
arrival of segment that partially or completely fills gap
TCP Receiver action
Delay sending the ACK Wait up to 500msfor next segment If next segment does not arrive in this interval send ACK
immediately send a singlecumulative ACK
send duplicate ACK indicating seq of next expected byte
Immediately send an ACK if segment startsat lower end of gap
1
2
3
4
Receiver does not discard out-of-order segmentsReceiver does not discard out-of-order segments
Transport Layer ndash TCP
22
BB
TCP Interesting Scenarios
Host A
Seq=92 8 bytes data
ACK=100
loss
tim
eout
time lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
Host A
Seq=100 20 bytes data
ACK=100
Seq=
92
tim
eout
time premature timeoutcumulative ACKs
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
ACK=120
Retransmission due to lost ACKSegment with Seq=100 not retransmitted
Timer is restarted here for Seq=92
Simplified TCP versionSimplified TCP version
Transport Layer ndash TCP
23
BB
Host A
Seq=100 20 bytes data
ACK=100Seq=
92
tim
eout
time
Host B
ACK=120
Seq=92 8 bytes data
Xloss
Cumulative ACK avoids retransmission of the first segment
TCP Retransmission ScenarioTCP Retransmission Scenario
Transport Layer ndash TCP
24
BB
TCP Modifications Doubling the Timeout IntervalDoubling the Timeout Interval
Provides a limited form of congestion control
Timer expiration is more likely caused by congestion in the network
TimeoutInterval = 2 TimeoutIntervalPrevious
TCP acts more politely by increasing the TimeoutInterval causing the sender to retransmit after longer and longer intervals
Congestion may get worse if sources
continue to retransmit packets
persistently
After ACK is received After ACK is received TimeoutInterval is derived TimeoutInterval is derived from most recent EstimatedRTT from most recent EstimatedRTT and DevRTTand DevRTT
Others check RFC 2018 ndash selective ACK
Transport Layer ndash TCP
25
BB
receiverreceiver explicitly informs sender of (dynamically changing) amount of free buffer space RcvWindowRcvWindow field
in TCP segment
sendersender keeps the amount of transmitted unACKed data less thanless than most recently received RcvWindowRcvWindow
sender wont overrun
receivers buffer bytransmitting too
much too fast
flow controlflow control
receiver buffering
RcvBufferRcvBuffer = size of TCP Receive Buffer
RcvWindowRcvWindow = amount of spare room in Buffer
TCP Flow ControlTCP Flow Control
Transport Layer ndash TCP
26
BB
FLOW CONTROL FLOW CONTROL ReceiverReceiver
04060 50100
LastByteRcvd
EXAMPLE HOST A sends a large file to HOST B
LastByteRead
Application Process
Data from IP
RcvBuffer
RcvWindow=RcvBuffer-[LastByteRcvd-LastByteRead]
RECEIVER HOST B ndash uses RcvWindow LastByteRcvd LastByteReadRcvWindow LastByteRcvd LastByteRead
Initially RcvWindow = RcvBuffer Application reads from the bufferHOST BHOST B tells HOST AHOST A how much spare roomhow much spare room it has in the connection buffer by placing its current value of RcvWindowRcvWindow in the receive window field of every segment it sends to HOST AHOST A
Transport Layer ndash TCP
27
BB
FLOW CONTROL SenderSender
04060 50100
LastByteSent
EXAMPLE HOST A sends a large file to HOST B
LastByteACKed
Data
To ensure that HOST B does not overflow HOST A maintains throughout the connectionrsquos life that [LastByteSent-LastByteACKed] lt= RcvWindow
SENDER HOST A
ACKs ACKs from from Host Host BB
SENDER HOST A ndash uses RcvWindow RcvWindow of HostBof HostB LastByteSent LastByteACKed LastByteSent LastByteACKed
Transport Layer ndash TCP
28
BB
FLOW CONTROLFLOW CONTROLSome issue to consider
RcvWindowRcvWindow ndash used by the connection to provide the flow control serviceflow control service
TCPTCP requires that requires that HOST AHOST A continue to sendcontinue to send segments with one segments with one data byte when HOST Brsquos receive data byte when HOST Brsquos receive window is window is 00 Such segments will be Such segments will be ACKedACKed by HOST B by HOST B Eventually the buffer will have Eventually the buffer will have some space and the some space and the ACKsACKs will will contain contain RcvWindow gt 0RcvWindow gt 0
What happens when the receive
buffer of HOST B is full full (that is when (that is when
RcvWindow=0)RcvWindow=0)
TCP sends a segment only when there is datadata or ACKACK to send Therefore the sender must maintain the connection lsquoalivealiversquo
Transport Layer ndash TCP
29
BB
TCP Connection ManagementTCP Connection Management
Recall TCP sender receiver establish ldquoconnectionconnectionrdquo before exchanging data segments
Initialize TCP variables sequence numbers buffers flow control info (eg RcvWindow)
ClientClient is the connection initiator
In Java Socket clientSocket = new Socket(hostnameport number) connect ServerServer is contacted by client
In JavaSocket accept()
if (connectconnect(s (struct sockaddr )ampsin sizeof(sin)) = 0) printf(connect failedn) WSACleanup() exit(1)
ns = acceptaccept(s(struct sockaddr )(ampremoteaddr)ampaddrlen)
Transport Layer ndash TCP
30
BB
TCP Connection Management
Three-way handshake:
Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself). It specifies the client's initial sequence number (client_isn).
Step 2: the server end system receives the SYN and replies with a SYNACK control segment. It ACKs the received SYN, allocates buffers, and specifies the server's initial sequence number (server_isn).
Step 3: the client ACKs the connection with ACK = server_isn + 1, allocates buffers, and sends SYN = 0.
Connection established.
[Figure: Establishing a connection. Client to server: Connect (SYN=1, seq=client_isn). Server to client: Accept (SYN=1, seq=server_isn, ack=client_isn+1). Client to server: ACK (SYN=0, seq=client_isn+1, ack=server_isn+1). Time runs downward.]
This is what happens when we create a socket for connection to a server.
After the connection is established, the client and server can exchange segments carrying app-generated data (SYN=0).
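The sequence and acknowledgement numbers in the three segments follow directly from the two initial sequence numbers. A sketch, assuming only the SYN flag, the ACK flag, seq and ack are modelled, and that sequence numbers wrap modulo 2^32; `ThreeWayHandshake` and `Segment` are illustrative names.

```java
import java.util.List;

/** Illustrative model of the three handshake segments; not a real TCP implementation. */
final class ThreeWayHandshake {

    /** Only the header fields used on the slide. */
    static final class Segment {
        final boolean syn;
        final boolean ackFlag;
        final long seq;
        final long ack; // meaningful only when ackFlag is set

        Segment(boolean syn, boolean ackFlag, long seq, long ack) {
            this.syn = syn;
            this.ackFlag = ackFlag;
            this.seq = seq;
            this.ack = ack;
        }

        @Override
        public String toString() {
            return "SYN=" + (syn ? 1 : 0) + " seq=" + seq + (ackFlag ? " ack=" + ack : "");
        }
    }

    /** Sequence numbers are 32-bit, so "+ 1" wraps around. */
    private static long plusOne(long seq) {
        return (seq + 1) & 0xFFFFFFFFL;
    }

    /** Returns the SYN, SYNACK and ACK segments for the given initial sequence numbers. */
    static List<Segment> handshake(long clientIsn, long serverIsn) {
        Segment syn = new Segment(true, false, clientIsn, 0);                            // Step 1: client -> server
        Segment synAck = new Segment(true, true, serverIsn, plusOne(clientIsn));         // Step 2: server -> client
        Segment ack = new Segment(false, true, plusOne(clientIsn), plusOne(serverIsn));  // Step 3: client -> server
        return List.of(syn, synAck, ack);
    }

    public static void main(String[] args) {
        handshake(100, 300).forEach(System.out::println);
    }
}
```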
TCP Connection Management (cont.)
Closing a connection: the client closes its socket:
closesocket(s);   (Java: clientSocket.close();)
Step 1: the client end system sends a TCP FIN control segment to the server.
Step 2: the server receives the FIN and replies with an ACK. It then closes the connection and sends its own FIN.
[Figure: How a TCP connection is torn down. The client (close) sends FIN; the server replies with ACK; the server (close) sends FIN; the client replies with ACK, waits in timed wait, and is then closed.]
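TCP Connection Management (cont.)
Step 3: the client receives the FIN and replies with an ACK. It enters a timed wait, during which it will respond with an ACK to any received FINs.
Step 4: the server receives the ACK. Connection closed.
Note: with a small modification, this can handle simultaneous FINs.
[Figure: the client (closing) sends FIN; the server replies with ACK; the server (closing) sends FIN; the client replies with ACK and enters timed wait before it is closed; the server is closed when the ACK arrives.]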
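The client side of the teardown can be written as a small state machine. This sketch uses the standard TCP state names from RFC 793 (FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT) and includes the simultaneous-FIN path through CLOSING; the event names are hypothetical, and the length of the timed wait is not modelled.

```java
/** Client-side connection teardown as a state machine (sketch; timers not modelled). */
final class TcpCloseClient {
    enum State { ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT, CLOSED }
    enum Event { APP_CLOSE, RECV_ACK, RECV_FIN, TIMED_WAIT_EXPIRES }

    static State next(State s, Event e) {
        switch (s) {
            case ESTABLISHED:
                if (e == Event.APP_CLOSE) return State.FIN_WAIT_1;  // Step 1: send FIN
                break;
            case FIN_WAIT_1:
                if (e == Event.RECV_ACK) return State.FIN_WAIT_2;   // our FIN was ACKed
                if (e == Event.RECV_FIN) return State.CLOSING;      // simultaneous FINs: send ACK
                break;
            case FIN_WAIT_2:
                if (e == Event.RECV_FIN) return State.TIME_WAIT;    // Step 3: send ACK, start timed wait
                break;
            case CLOSING:
                if (e == Event.RECV_ACK) return State.TIME_WAIT;
                break;
            case TIME_WAIT:
                if (e == Event.RECV_FIN) return State.TIME_WAIT;    // re-ACK a retransmitted FIN
                if (e == Event.TIMED_WAIT_EXPIRES) return State.CLOSED; // resources released
                break;
            default:
                break;
        }
        throw new IllegalStateException(e + " not expected in state " + s);
    }

    /** Applies a sequence of events starting from ESTABLISHED. */
    static State run(Event... events) {
        State s = State.ESTABLISHED;
        for (Event e : events) {
            s = next(s, e);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(run(Event.APP_CLOSE, Event.RECV_ACK, Event.RECV_FIN, Event.TIMED_WAIT_EXPIRES));
    }
}
```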
TCP Connection Management (cont.)
[Figures: TCP client lifecycle; TCP server lifecycle]
Timed wait: used in case the final ACK gets lost. Its length is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).
The connection then formally closes: all resources (e.g. port numbers) are released.
End of Flow Control and Error Control
Flow Control vs Congestion Control: similar actions are taken, but for very different reasons.
Flow Control
• point-to-point traffic between sender and receiver
• speed-matching service: matching the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the receiver buffer from overflowing
Congestion: happens when there are too many sources attempting to send data at too high a rate for the routers along the path.
Congestion Control
• a service that makes sure that the routers between end systems are able to carry the offered traffic
• prevents routers from overflowing
Same course of action: throttling of the sender.
Principles of Congestion Control
Congestion, informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queuing in router buffers)
• a top-10 problem
Approaches towards congestion control
Two broad approaches towards congestion control:
1. End-to-end congestion control
• no explicit feedback from the network
• congestion is inferred by end systems from observed packet loss and delay
• the approach taken by TCP
2. Network-assisted congestion control
• routers provide feedback to end systems, either as
  - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR), or
  - an explicit transmission rate the sender should send at
TCP Congestion Control: how the TCP sender limits the rate at which it sends traffic into its connection
New variable: the congestion window, CongWin.
At the SENDER: amount of unACKed data = LastByteSent - LastByteACKed <= min(CongWin, RcvWindow)
This indirectly limits the sender's send rate.
Assumptions:
• the TCP receive buffer is very large, so there is no RcvWindow constraint: the amount of unACKed data at the sender is limited solely by CongWin
• packet loss delay and packet transmission delay are negligible
Sending rate (approx.) = CongWin / RTT
By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
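The window constraint and the rate approximation amount to two lines of arithmetic. A sketch using the slide's variable names, with values in bytes and seconds; `SendLimit` and its methods are hypothetical.

```java
/** Sender-side window arithmetic from the slide; names are illustrative. */
final class SendLimit {

    /** Bytes the sender may still transmit: min(CongWin, RcvWindow) minus the unACKed data. */
    static long usableWindow(long lastByteSent, long lastByteAcked, long congWin, long rcvWindow) {
        long unacked = lastByteSent - lastByteAcked;
        return Math.max(0, Math.min(congWin, rcvWindow) - unacked);
    }

    /** Approximate sending rate in bytes/sec when RcvWindow is not the binding constraint. */
    static double approxRate(long congWinBytes, double rttSeconds) {
        return congWinBytes / rttSeconds;
    }

    public static void main(String[] args) {
        System.out.println(usableWindow(5000, 2000, 8000, 6000)); // RcvWindow (6000) is the tighter limit
        System.out.println(approxRate(14600, 0.1));               // CongWin of 14600 bytes over a 100 ms RTT
    }
}
```

With 3000 bytes unACKed and min(8000, 6000) = 6000, the first call allows 3000 more bytes; halving RTT or doubling CongWin doubles the approximate rate.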
TCP Congestion Control: TCP uses ACKs to trigger ("clock") increases in its congestion window size, which is why TCP is called "self-clocking".
The arrival of ACKs is an indication to the sender that all is well:
1. ACKs arrive at a slow rate
• the congestion window will be increased at a relatively slow rate
2. ACKs arrive at a high rate
• the congestion window will be increased more quickly
TCP Congestion Control: how TCP perceives that there is congestion on the path
"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender. A loss event is either:
1. a timeout
• no ACK is received after a segment is lost, or
2. the receipt of three duplicate ACKs
• the segment loss is followed by three duplicate ACKs received at the sender
TCP Congestion Control: details
• the sender limits transmission: LastByteSent - LastByteAcked <= cwnd
• roughly, rate = cwnd / RTT bytes/sec
• cwnd is a dynamic function of perceived network congestion
How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• the TCP sender reduces its rate (cwnd) after a loss event
Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events
TCP congestion avoidance: additive increase, multiplicative decrease (AIMD)
Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
• additive increase: increase cwnd by 1 MSS every RTT until loss is detected
• multiplicative decrease: cut cwnd in half after loss
[Figure: congestion window size (8, 16, 24 Kbytes) vs. time, showing saw-tooth behavior: probing for bandwidth]
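An idealised AIMD trace makes the saw tooth concrete. This sketch assumes one update per RTT, cwnd measured in MSS, and losses in rounds chosen by the caller; `Aimd.trace` is an illustrative helper, not part of any TCP implementation.

```java
import java.util.Arrays;
import java.util.Set;

/** Idealised AIMD: one step per RTT, window in MSS. Names are illustrative. */
final class Aimd {

    /** cwnd at the start of each of `rtts` rounds; a loss in round r halves cwnd for round r + 1. */
    static double[] trace(double initialCwnd, int rtts, Set<Integer> lossRounds) {
        double[] cwnd = new double[rtts];
        double w = initialCwnd;
        for (int r = 0; r < rtts; r++) {
            cwnd[r] = w;
            if (lossRounds.contains(r)) {
                w = Math.max(1.0, w / 2); // multiplicative decrease, never below 1 MSS
            } else {
                w = w + 1.0;              // additive increase: +1 MSS per RTT
            }
        }
        return cwnd;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(trace(10, 6, Set.of(3))));
    }
}
```

Starting at 10 MSS with a loss in round 3, the window climbs 10, 11, 12, 13, drops to 6.5, and resumes climbing: one tooth of the saw.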
TCP Slow Start
• When the connection begins, increase the rate exponentially until the first loss event:
  - initially cwnd = 1 MSS
  - double cwnd every RTT
  - done by incrementing cwnd by 1 MSS for every ACK received
• Summary: the initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT).
[Figure: Host A sends one segment to Host B; one RTT later it sends two segments; one RTT after that, four segments.]
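Incrementing cwnd by 1 MSS per ACK is what doubles it every RTT. A sketch assuming one ACK per segment (no delayed ACKs) and no loss; `SlowStart.cwndPerRtt` is an illustrative name.

```java
import java.util.Arrays;

/** Slow start under the slide's idealisation: every segment is ACKed, no loss. */
final class SlowStart {

    /** cwnd (in MSS) at the start of each RTT, beginning from cwnd = 1 MSS. */
    static int[] cwndPerRtt(int rounds) {
        int[] out = new int[rounds];
        int cwnd = 1;
        for (int r = 0; r < rounds; r++) {
            out[r] = cwnd;
            int acksThisRtt = cwnd;  // one ACK returns for each segment sent in this RTT
            for (int i = 0; i < acksThisRtt; i++) {
                cwnd += 1;           // +1 MSS per ACK received
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(cwndPerRtt(4))); // [1, 2, 4, 8]
    }
}
```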
Refinement: inferring loss
• after 3 dup ACKs:
  - cwnd is cut in half
  - the window then grows linearly
• but after a timeout event:
  - cwnd is set to 1 MSS
  - the window then grows exponentially up to a threshold, then grows linearly
Philosophy:
• 3 dup ACKs indicate that the network is capable of delivering some segments
• a timeout indicates a "more alarming" congestion scenario
Refinement
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.
Implementation:
• variable ssthresh (slow-start threshold)
• on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event
TCP Sender Congestion Control
STATE | EVENT | TCP SENDER congestion-control action | Commentary
Slow Start (SS) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Results in a doubling of CongWin every RTT
Congestion Avoidance (CA) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
SS or CA | Duplicate ACK | Increment the duplicate ACK count for the segment being ACKed | CongWin and Threshold not changed
Summary: TCP Congestion Control
[FSM figure with three states: slow start, congestion avoidance, fast recovery]
Initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; the sender starts in slow start.
slow start
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: move to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment (stay in slow start)
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
congestion avoidance
• new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
fast recovery
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; move to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start
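The summary FSM translates into one handler per event. A sketch assuming sizes in bytes, an MSS passed to the constructor, and the initial 64 KB ssthresh read as 65536 bytes; it counts retransmissions instead of sending segments, and `RenoFsm` is a hypothetical name.

```java
/** Sketch of the summary FSM: slow start, congestion avoidance, fast recovery. Sizes in bytes. */
final class RenoFsm {
    enum State { SLOW_START, CONGESTION_AVOIDANCE, FAST_RECOVERY }

    final double mss;
    double cwnd;
    double ssthresh = 64 * 1024; // initial ssthresh = 64 KB (assumed to mean 65536 bytes)
    int dupAckCount = 0;
    int retransmissions = 0;     // counts "retransmit missing segment" actions
    State state = State.SLOW_START;

    RenoFsm(double mss) {
        this.mss = mss;
        this.cwnd = mss;         // cwnd = 1 MSS
    }

    void onNewAck() {
        switch (state) {
            case SLOW_START:
                cwnd += mss;
                dupAckCount = 0;
                if (cwnd > ssthresh) {
                    state = State.CONGESTION_AVOIDANCE;
                }
                break;
            case CONGESTION_AVOIDANCE:
                cwnd += mss * (mss / cwnd); // roughly +1 MSS per RTT
                dupAckCount = 0;
                break;
            case FAST_RECOVERY:
                cwnd = ssthresh;            // deflate the window and resume linear growth
                dupAckCount = 0;
                state = State.CONGESTION_AVOIDANCE;
                break;
        }
    }

    void onDuplicateAck() {
        if (state == State.FAST_RECOVERY) {
            cwnd += mss;                // each further duplicate ACK inflates cwnd by 1 MSS
            return;
        }
        dupAckCount++;
        if (dupAckCount == 3) {         // triple duplicate ACK: loss event
            ssthresh = cwnd / 2;
            cwnd = ssthresh + 3 * mss;
            retransmissions++;          // retransmit missing segment
            state = State.FAST_RECOVERY;
        }
    }

    void onTimeout() {                  // same action from every state
        ssthresh = cwnd / 2;
        cwnd = mss;
        dupAckCount = 0;
        retransmissions++;              // retransmit missing segment
        state = State.SLOW_START;
    }

    public static void main(String[] args) {
        RenoFsm f = new RenoFsm(1000);
        for (int i = 0; i < 9; i++) f.onNewAck();
        for (int i = 0; i < 3; i++) f.onDuplicateAck();
        System.out.println(f.state + " cwnd=" + f.cwnd + " ssthresh=" + f.ssthresh);
    }
}
```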
Congestion control
TCP's Congestion Control Service
[Figure: CLIENT and SERVER communicating across congested routers]
Problem: gridlock sets in when there is packet loss due to router congestion.
TCP's congestion control service forces the end systems to decrease the rate at which packets are sent during periods of congestion.
When a sending system's packet is lost due to congestion, the sender is alerted because it stops receiving ACKs for the packets it sent.
Macroscopic Description of TCP throughput
• What is the average throughput of TCP as a function of window size and RTT? (Ignore slow start; it typically consists of very short phases.)
• Let W be the window size when loss occurs.
  - When the window is W, throughput is W/RTT.
  - Just after loss, the window drops to W/2, and throughput to W/(2 RTT).
  - Throughput then increases linearly (by MSS/RTT every RTT).
• Average throughput: 0.75 W/RTT
(Based on an idealised model of the steady-state dynamics of TCP.)
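The 0.75 factor is the mean of a window that climbs linearly from W/2 to W, spending one RTT at each size. A sketch that checks this numerically, assuming W is an even number of segments; `SawtoothAverage` is an illustrative name. Multiplying the average window by MSS/RTT turns it into a throughput.

```java
/** Checks the 0.75 W/RTT average for the idealised saw tooth; window in segments. */
final class SawtoothAverage {

    /** Average window over one cycle in which cwnd grows by 1 per RTT from W/2 up to W. */
    static double averageWindow(int w) {
        if (w <= 0 || w % 2 != 0) {
            throw new IllegalArgumentException("W must be a positive even number");
        }
        long sum = 0;
        int rtts = 0;
        for (int cwnd = w / 2; cwnd <= w; cwnd++) { // one RTT at each window size
            sum += cwnd;
            rtts++;
        }
        return (double) sum / rtts;
    }

    public static void main(String[] args) {
        int w = 20;
        System.out.println(averageWindow(w) + " segments vs 0.75 * W = " + 0.75 * w);
    }
}
```

For an even W the arithmetic series W/2, ..., W averages exactly (W/2 + W)/2 = 3W/4, which is where 0.75 W/RTT comes from.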
TCP Futures: TCP over "long, fat pipes"
• Example: a GRID computing application with 1500-byte segments, 100 ms RTT, and a desired throughput of 10 Gbps
• requires a window size of W = 83,333 in-flight segments
• throughput in terms of the loss rate L:
  $\text{Throughput} = \dfrac{1.22 \cdot MSS}{RTT \cdot \sqrt{L}}$
• to reach 10 Gbps, this requires $L = 2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments)
• new versions of TCP are needed for high-speed environments
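The slide's numbers follow from W = throughput * RTT / (8 * MSS) and from solving the throughput formula for L. A sketch using the slide's inputs (10 Gbps, 100 ms, 1500-byte segments); `LongFatPipe` is an illustrative name. It yields a window of about 83,333 segments and a loss rate of about 2.1e-10.

```java
/** Reproduces the long-fat-pipe example; units: bits/sec, seconds, bytes. */
final class LongFatPipe {

    /** Window, in segments, needed to keep `throughputBps` in flight for one RTT. */
    static double requiredWindow(double throughputBps, double rttSec, int mssBytes) {
        return throughputBps * rttSec / (mssBytes * 8.0);
    }

    /** Loss rate L allowed by Throughput = 1.22 * MSS / (RTT * sqrt(L)) at the given throughput. */
    static double requiredLossRate(double throughputBps, double rttSec, int mssBytes) {
        double x = 1.22 * mssBytes * 8.0 / (rttSec * throughputBps);
        return x * x;
    }

    public static void main(String[] args) {
        System.out.printf("W = %.0f segments%n", requiredWindow(10e9, 0.1, 1500));
        System.out.printf("L = %.2e%n", requiredLossRate(10e9, 0.1, 1500));
    }
}
```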
TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link of capacity R, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.
[Figure: TCP connection 1 and TCP connection 2 pass through a bottleneck router of capacity R]
Analysis of 2 connections sharing a link
Assumptions:
• a link with a transmission rate of R
• each connection has the same MSS and RTT
• no other TCP connections or UDP datagrams traverse the shared link
• the slow-start phase of TCP is ignored
• both connections operate in congestion-avoidance mode (linear increase phase)
Goal: adjust the sending rate of the two connections to allow for equal bandwidth sharing.
Why is TCP fair?
Two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput vs. Connection 1 throughput, each axis running to R. A point on the graph depicts the amount of link bandwidth jointly consumed by the connections. The full bandwidth utilisation line joins the two R marks, and the equal bandwidth share line is the diagonal. The trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by a factor of 2).]
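The convergence argument can be checked with a small simulation, assuming both connections see every loss at the same time (synchronised losses), each adds one unit per RTT, and both halve whenever their combined rate exceeds R. Additive increase leaves the gap between the two rates unchanged, while each halving cuts it in half, so the rates approach the equal share. `AimdFairness` is an illustrative name.

```java
/** Two AIMD flows on a link of capacity R with synchronised losses. Illustrative only. */
final class AimdFairness {

    /** Returns {x1, x2} after `rounds` RTTs; rates in arbitrary units. */
    static double[] simulate(double x1, double x2, double capacity, int rounds) {
        for (int r = 0; r < rounds; r++) {
            if (x1 + x2 > capacity) {   // loss: both connections halve their window
                x1 /= 2;
                x2 /= 2;
            } else {                    // congestion avoidance: both add 1 unit per RTT
                x1 += 1;
                x2 += 1;
            }
        }
        return new double[] {x1, x2};
    }

    public static void main(String[] args) {
        double[] x = simulate(10, 70, 100, 200);
        System.out.printf("x1 = %.2f, x2 = %.2f%n", x[0], x[1]);
    }
}
```

Starting from a 60-unit gap, eight halvings occur within 200 rounds, leaving a gap below one unit.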
The End
The following slides are for additional reading only.
TCP Latency Modeling
Loopholes in TCP:
• In practice, client/server applications with a smaller RTT grab the available bandwidth more quickly as it becomes free; therefore they have higher throughputs.
• Multiple parallel TCP connections (a multithreading implementation) allow one application to get a bigger share of the bandwidth.
[Figure: multiple end systems sharing a link of R bps (the link's transmission rate); three of them use 1 TCP connection each, while one uses 3 TCP connections]
TCP latency modeling
Q: How long does it take to receive an object from a Web server?
Latency: the time from when the client initiates a TCP connection until when the client receives the requested object in its entirety.
Latency = TCP connection establishment time + data transfer delay + actual data transmission time
Two cases to consider:
• $WS/R > RTT + S/R$: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent (no data transfer delay)
• $WS/R < RTT + S/R$: the sender has to wait for an ACK after a window's worth of data is sent (there is data transfer delay)
TCP Latency Modeling
Assumptions:
• the network is uncongested, with one link of rate R between the end systems (R bps: the link's transmission rate)
• CongWin (fixed) determines the amount of data that can be sent
• no packet loss, no packet corruption, no retransmissions required
• header overheads are negligible
• the file to send is an integer number of segments of size MSS
• connection-establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times
• the initial threshold of the TCP congestion mechanism is very big
• O: size of the object, in bits
• S: number of bits in the MSS (maximum segment size)
[Figure: the CLIENT requests a FILE from the SERVER]
TCP latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 1: $WS/R > RTT + S/R$
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.
$\text{latency} = 2\,RTT + O/R$
K = number of windows of data that cover the object: $K = \lceil O/(WS) \rceil$ (rounded up to the nearest integer)
Example: assume W = 4 segments, e.g. O = 256 bits, S = 32 bits, W = 4, so K = 256/(4 * 32) = 2.
TCP latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 2: $WS/R < RTT + S/R$
The sender has to wait for an ACK after a window's worth of data is sent (STALLED PERIOD).
$\text{latency} = 2\,RTT + O/R + (K-1)\left[S/R + RTT - WS/R\right]$
where $K = \lceil O/(WS) \rceil$ is the number of windows of data that cover the object.
If there are K windows, the sender will be stalled (K-1) times.
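The two static-window cases combine into one function. A sketch assuming O and S in bits, R in bits/sec and RTT in seconds; `StaticWindowLatency` is an illustrative name. With the slide's example O = 256, S = 32, W = 4 and an assumed R = 32 bps (so S/R = 1 s and WS/R = 4 s), an RTT of 1 s falls in Case 1 and an RTT of 5 s in Case 2.

```java
/** Static-window latency from the slides: no loss, one link of rate R, window of W segments. */
final class StaticWindowLatency {

    /** o and s in bits, r in bits/sec, rtt in seconds, w in segments; returns seconds. */
    static double latency(double o, double s, double r, double rtt, int w) {
        long k = (long) Math.ceil(o / (w * s));      // K: windows that cover the object
        double windowTime = w * s / r;                // WS/R
        double base = 2 * rtt + o / r;                // 2 RTT + O/R
        if (windowTime >= rtt + s / r) {
            return base;                              // Case 1: the sender never stalls
        }
        return base + (k - 1) * (s / r + rtt - windowTime); // Case 2: K-1 stalls
    }

    public static void main(String[] args) {
        System.out.println(latency(256, 32, 32, 1, 4)); // Case 1
        System.out.println(latency(256, 32, 32, 5, 4)); // Case 2
    }
}
```

At the boundary WS/R = RTT + S/R the stall term is zero, so both formulas agree there.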
Case Analysis: DYNAMIC CONGESTION WINDOW
[Figure: with slow start, an object of O/S = 15 segments is sent in 4 windows, with a STALLED PERIOD between windows]
Case Analysis: DYNAMIC CONGESTION WINDOW
• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object as follows:
$K = \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} = \min\{k : 2^k - 1 \ge O/S\} = \min\{k : k \ge \log_2(O/S + 1)\} = \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil$
Case Analysis: DYNAMIC CONGESTION WINDOW
• Time from when the server begins to transmit the kth window until the time the server receives an ACK for the first segment in the window: $S/R + RTT$
• Transmission of the kth window: $2^{k-1} S/R$
• Stall time: $\left[S/R + RTT - 2^{k-1} S/R\right]^+$, where $[x]^+ = \max(x, 0)$
• Latency: $2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1} \left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$
Case Analysis: DYNAMIC CONGESTION WINDOW
• Let Q be the number of times the server would stall if the object contained an infinite number of segments:
$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$
• The actual number of times that the server stalls is $P = \min\{Q, K-1\}$.
• Closed-form expression for the latency:
$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$
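$\frac{\text{Latency}}{\text{Minimum latency}} \le 1 + \frac{P}{(O/R)/RTT + 2}$, where the minimum latency is $2\,RTT + O/R$ (the latency with no stalls).
Slow start will not significantly increase latency if $RTT \ll O/R$.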
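The dynamic-window formulas for K, Q, P and the closed-form latency can be sketched as below. K and Q are computed with loops over powers of two instead of floating-point logarithms, to avoid rounding errors at exact powers of two. Units are bits, bits/sec and seconds; `DynamicWindowLatency` is an illustrative name. With O/S = 15 segments, S/R = 1 s and RTT = 3 s this gives K = 4, Q = 3, P = 3.

```java
/** Slow-start (dynamic window) latency model; o and s in bits, r in bits/sec, rtt in seconds. */
final class DynamicWindowLatency {

    /** K = min{k : 2^k - 1 >= O/S}, the number of windows that cover the object. */
    static int windows(double o, double s) {
        double segments = Math.ceil(o / s);
        int k = 0;
        while (Math.pow(2, k) - 1 < segments) {
            k++;
        }
        return k;
    }

    /** Q = floor(log2(1 + RTT/(S/R))) + 1: the largest q with 2^(q-1) <= 1 + RTT/(S/R). */
    static int stallsIfInfinite(double s, double r, double rtt) {
        double x = 1 + rtt / (s / r);
        int q = 1;
        while (Math.pow(2, q) <= x) {
            q++;
        }
        return q;
    }

    /** Latency = 2 RTT + O/R + P[RTT + S/R] - (2^P - 1) S/R with P = min{Q, K-1}. */
    static double latency(double o, double s, double r, double rtt) {
        int p = Math.min(stallsIfInfinite(s, r, rtt), windows(o, s) - 1);
        return 2 * rtt + o / r + p * (rtt + s / r) - (Math.pow(2, p) - 1) * s / r;
    }

    public static void main(String[] args) {
        System.out.println(latency(15000, 1000, 1000, 3));
    }
}
```

Summing the stall times directly (3 s, 2 s and 0 s for the first three windows) gives the same 26 s as the closed form.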
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
Sample Calculations
EstimatedRTT = 0.875 * EstimatedRTT + 0.125 * SampleRTT
After the receipt of the ACK of segment 1: EstimatedRTT = RTT for segment 1 = 0.02746 second
After the receipt of the ACK of segment 2: EstimatedRTT = 0.875 * 0.02746 + 0.125 * 0.035557 = 0.0285
After the receipt of the ACK of segment 3: EstimatedRTT = 0.875 * 0.0285 + 0.125 * 0.070059 = 0.0337
After the receipt of the ACK of segment 4: EstimatedRTT = 0.875 * 0.0337 + 0.125 * 0.11443 = 0.0438
After the receipt of the ACK of segment 5: EstimatedRTT = 0.875 * 0.0438 + 0.125 * 0.13989 = 0.0558
After the receipt of the ACK of segment 6: EstimatedRTT = 0.875 * 0.0558 + 0.125 * 0.18964 = 0.0725
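The sample calculation is an exponentially weighted moving average with alpha = 0.125. This sketch keeps full precision between steps instead of rounding to four decimals as the slide does, so intermediate values can differ from the slide in the fifth decimal place; `RttEstimator` is a hypothetical name.

```java
/** EstimatedRTT as an exponentially weighted moving average (alpha = 0.125). */
final class RttEstimator {
    static final double ALPHA = 0.125;

    /** Returns EstimatedRTT after each sample; the first sample initialises the estimate. */
    static double[] estimates(double[] sampleRtts) {
        double[] est = new double[sampleRtts.length];
        for (int i = 0; i < sampleRtts.length; i++) {
            est[i] = (i == 0)
                    ? sampleRtts[0]
                    : (1 - ALPHA) * est[i - 1] + ALPHA * sampleRtts[i];
        }
        return est;
    }

    public static void main(String[] args) {
        double[] samples = {0.02746, 0.035557, 0.070059, 0.11443, 0.13989, 0.18964};
        for (double e : estimates(samples)) {
            System.out.printf("%.4f%n", e);
        }
    }
}
```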
RTT Samples and RTT estimates
[Figure: Sample RTT and Estimated RTT, RTT (msec) from 100 to 300, vs. time]
The variations in the SampleRTT are smoothed out in the computation of the EstimatedRTT.
An Actual RTT estimation: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr
[Figure: SampleRTT and Estimated RTT, RTT (milliseconds) from 100 to 350, vs. time (seconds) from 1 to 106]
FSM of TCP for Reliable Data Transfer
Simplified TCP sender, assuming:
- one-way data transfer
- no flow or congestion control
[FSM: a single state, "wait for event", with three transitions:
• event: data received from application above → create and send segment
• event: timer timeout for segment with seq number y → retransmit segment
• event: ACK received, with ACK number y → process ACK]
SIMPLIFIED TCP SENDER
Assumptions:
• the sender is not constrained by TCP flow or congestion control
• data from above is less than MSS in size
• data transfer is in one direction only

sendbase = initial_sequence_number
nextseqnum = initial_sequence_number

loop (forever) {
    switch(event)
    event: data received from application above
        create TCP segment with sequence number nextseqnum
        if (timer is currently not running)
            start timer for segment nextseqnum   /* the timer is associated with the oldest unACKed segment */
        pass segment to IP
        nextseqnum = nextseqnum + length(data)
    event: timer timeout
        retransmit not-yet-ACKed segment with smallest sequence number
        start timer
    event: ACK received, with ACK field value of y
        if (y > sendbase) {   /* cumulative ACK of all data up to y */
            sendbase = y
            if (there are currently any not-yet-ACKed segments)
                start timer
        }
}   /* end of loop forever */
TCP with MODIFICATIONS: SENDER (with Fast Retransmit)
Why wait for the timeout to expire when consecutive duplicate ACKs can be used to indicate a lost segment?

sendbase = initial_sequence_number
nextseqnum = initial_sequence_number

loop (forever) {
    switch(event)
    event: data received from application above
        create TCP segment with sequence number nextseqnum
        start timer for segment nextseqnum
        pass segment to IP
        nextseqnum = nextseqnum + length(data)
    event: timer timeout for segment with sequence number y
        retransmit segment with sequence number y
        compute new timeout interval for segment y
        restart timer for sequence number y
    event: ACK received, with ACK field value of y
        if (y > sendbase) {   /* cumulative ACK of all data up to y */
            cancel all timers for segments with sequence numbers < y
            sendbase = y
        }
        else {   /* a duplicate ACK for an already ACKed segment */
            increment number of duplicate ACKs received for y
            if (number of duplicate ACKs received for y == 3) {
                /* perform TCP fast retransmission */
                resend segment with sequence number y
                restart timer for segment y
            }
        }
}   /* end of loop forever */
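A sketch of the fast-retransmit sender's ACK handling in Java, assuming ACKs always fall on segment boundaries and leaving timers out; instead of transmitting, `onAck` returns the sequence number to retransmit (or -1). The class and method names are hypothetical.

```java
import java.util.TreeMap;

/** Sender with cumulative ACKs and fast retransmit on the third duplicate ACK (timers omitted). */
final class FastRetransmitSender {
    private final TreeMap<Long, Integer> unacked = new TreeMap<>(); // seq -> length of unACKed segments
    private int dupAckCount = 0;
    long sendBase;
    long nextSeqNum;

    FastRetransmitSender(long initialSeq) {
        sendBase = initialSeq;
        nextSeqNum = initialSeq;
    }

    /** Data from the application: create a segment with sequence number nextSeqNum. */
    long send(int length) {
        long seq = nextSeqNum;
        unacked.put(seq, length);
        nextSeqNum += length;
        return seq;
    }

    /** ACK with value y; returns the sequence number to fast-retransmit, or -1. */
    long onAck(long y) {
        if (y > sendBase) {                // cumulative ACK of all data up to y
            unacked.headMap(y).clear();    // segments with seq < y are ACKed (their timers would be cancelled)
            sendBase = y;
            dupAckCount = 0;
            return -1;
        }
        if (y == sendBase) {               // duplicate ACK for already ACKed data
            dupAckCount++;
            if (dupAckCount == 3 && unacked.containsKey(y)) {
                return y;                  // resend the segment starting at y
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        FastRetransmitSender s = new FastRetransmitSender(92);
        s.send(8);   // seq 92
        s.send(20);  // seq 100, assumed lost
        s.send(10);  // seq 120
        s.send(10);  // seq 130
        s.onAck(100);                      // new cumulative ACK for seq 92
        for (int i = 0; i < 3; i++) {
            long r = s.onAck(100);         // duplicate ACKs triggered by 120 and 130 (and one more)
            if (r >= 0) System.out.println("fast retransmit seq " + r);
        }
    }
}
```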
TCP ACK generation [RFC 1122, RFC 2581]
Event | TCP Receiver action
1. Arrival of an in-order segment with no gaps; everything else already ACKed | Delay sending the ACK: wait up to 500 ms for the next segment. If the next segment does not arrive in this interval, send the ACK.
2. Arrival of an in-order segment with no gaps; one delayed ACK pending (due to action 1) | Immediately send a single cumulative ACK.
3. Arrival of an out-of-order segment with a higher-than-expected sequence number: a gap is detected | Send a duplicate ACK, indicating the sequence number of the next expected byte.
4. Arrival of a segment that partially or completely fills the gap | Immediately send an ACK if the segment starts at the lower end of the gap.
The receiver does not discard out-of-order segments.
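The four rows can be read as a decision function over the arriving segment. A sketch with hypothetical names; it models only the events in the table and rejects segments that carry already-received data.

```java
/** Receiver ACK policy from the table, as a pure decision function. Names are illustrative. */
final class AckPolicy {
    enum Action { DELAY_ACK, SEND_CUMULATIVE_ACK_NOW, SEND_DUPLICATE_ACK, SEND_ACK_NOW }

    /**
     * expectedSeq: next in-order byte the receiver is waiting for.
     * segSeq: first byte of the arriving segment.
     * gapPending: out-of-order data is buffered above expectedSeq.
     * delayedAckPending: an in-order segment is already waiting for its delayed ACK.
     */
    static Action onSegment(long expectedSeq, long segSeq, boolean gapPending, boolean delayedAckPending) {
        if (segSeq > expectedSeq) {
            return Action.SEND_DUPLICATE_ACK;         // row 3: gap, re-ACK the next expected byte
        }
        if (segSeq == expectedSeq && gapPending) {
            return Action.SEND_ACK_NOW;               // row 4: segment starts at the lower end of the gap
        }
        if (segSeq == expectedSeq) {
            return delayedAckPending
                    ? Action.SEND_CUMULATIVE_ACK_NOW  // row 2: one ACK covers both in-order segments
                    : Action.DELAY_ACK;               // row 1: wait up to 500 ms for the next segment
        }
        throw new IllegalArgumentException("already-received data is outside this table");
    }

    public static void main(String[] args) {
        System.out.println(onSegment(100, 120, false, false));
    }
}
```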
TCP Interesting Scenarios (simplified TCP version)
[Figure, lost ACK scenario: Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X); after the timeout, Host A retransmits Seq=92, 8 bytes data, and Host B again replies ACK=100. Retransmission due to lost ACK.]
[Figure, premature timeout / cumulative ACKs: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B replies ACK=100 and ACK=120; the Seq=92 timer expires before these ACKs arrive, so Host A retransmits Seq=92, 8 bytes data, and Host B replies ACK=120. The segment with Seq=100 is not retransmitted; the timer is restarted here for Seq=92.]
TCP Retransmission Scenario
[Figure: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B's ACK=100 is lost (X), but ACK=120 arrives before the Seq=92 timeout, so nothing is retransmitted.]
Cumulative ACK avoids retransmission of the first segment.
TCP Modifications: Doubling the Timeout Interval
• Provides a limited form of congestion control: a timer expiration is more likely to be caused by congestion in the network.
• TimeoutInterval = 2 * TimeoutIntervalPrevious
• Congestion may get worse if sources continue to retransmit packets persistently, so TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals.
• After an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT.
Others: check RFC 2018 (selective ACK).
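The backoff rule, sketched in Java. On an ACK the sketch assumes the usual rule TimeoutInterval = EstimatedRTT + 4 * DevRTT (as in RFC 6298), which the slide only describes as "derived from" the two estimators; `TimeoutBackoff` is an illustrative name.

```java
/** Timeout backoff: double on every expiry, recompute from the RTT estimators after an ACK. */
final class TimeoutBackoff {
    private double timeoutInterval; // seconds

    TimeoutBackoff(double initialTimeout) {
        timeoutInterval = initialTimeout;
    }

    /** Timer expired: TimeoutInterval = 2 * TimeoutIntervalPrevious. */
    double onTimeout() {
        timeoutInterval *= 2;
        return timeoutInterval;
    }

    /** ACK received: assumed rule TimeoutInterval = EstimatedRTT + 4 * DevRTT. */
    double onAck(double estimatedRtt, double devRtt) {
        timeoutInterval = estimatedRtt + 4 * devRtt;
        return timeoutInterval;
    }

    public static void main(String[] args) {
        TimeoutBackoff t = new TimeoutBackoff(1.0);
        for (int i = 0; i < 3; i++) {
            System.out.println("after timeout " + (i + 1) + ": " + t.onTimeout() + " s");
        }
        System.out.println("after ACK: " + t.onAck(0.1, 0.025) + " s");
    }
}
```

Three consecutive expiries stretch a 1 s timeout to 8 s; the first ACK snaps it back to a value based on the current RTT estimate.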
TCP Flow Control
flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast
• receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space, via the RcvWindow field in the TCP segment
• sender: keeps the amount of transmitted, unACKed data no greater than the most recently received RcvWindow
receiver buffering:
• RcvBuffer = size of the TCP receive buffer
• RcvWindow = amount of spare room in the buffer
FLOW CONTROL: Receiver
EXAMPLE: HOST A sends a large file to HOST B.
RECEIVER HOST B uses RcvWindow, LastByteRcvd and LastByteRead.
[Figure: HOST B's RcvBuffer. Data from IP enters the buffer up to LastByteRcvd; the application process reads from the buffer up to LastByteRead; the remaining spare room is RcvWindow.]
RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
Initially, RcvWindow = RcvBuffer. The application reads from the buffer.
HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.
FLOW CONTROL: Sender
EXAMPLE: HOST A sends a large file to HOST B.
SENDER HOST A uses HOST B's RcvWindow, LastByteSent and LastByteACKed.
[Figure: HOST A's send buffer. Data up to LastByteSent has been sent; ACKs from HOST B advance LastByteACKed.]
To ensure that HOST B's buffer does not overflow, HOST A maintains throughout the connection's life that [LastByteSent - LastByteACKed] <= RcvWindow.
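Both sides of the flow-control example reduce to two checks. A sketch using the slides' variable names, with all values in bytes; `FlowControl` and `maySend` are hypothetical names.

```java
/** Receiver and sender sides of TCP flow control, with the slides' variable names. */
final class FlowControl {

    /** Receiver: RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]. */
    static long rcvWindow(long rcvBuffer, long lastByteRcvd, long lastByteRead) {
        return rcvBuffer - (lastByteRcvd - lastByteRead);
    }

    /** Sender: may transmit `len` more bytes only if LastByteSent - LastByteACKed stays <= RcvWindow. */
    static boolean maySend(long lastByteSent, long lastByteAcked, long rcvWindow, long len) {
        return (lastByteSent + len) - lastByteAcked <= rcvWindow;
    }

    public static void main(String[] args) {
        long win = rcvWindow(4096, 3000, 1000);              // 2000 bytes buffered, unread
        System.out.println(win);
        System.out.println(maySend(5000, 4000, win, 1000));  // 2000 bytes would be unACKed
        System.out.println(maySend(5000, 4000, win, 1500));  // 2500 bytes would be unACKed
    }
}
```

The first send keeps the unACKed data (2000 bytes) within the advertised 2096 bytes; the second would exceed it, so HOST A must wait for ACKs or a larger RcvWindow.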
Transport Layer ndash TCP
28
BB
FLOW CONTROLFLOW CONTROLSome issue to consider
RcvWindowRcvWindow ndash used by the connection to provide the flow control serviceflow control service
TCPTCP requires that requires that HOST AHOST A continue to sendcontinue to send segments with one segments with one data byte when HOST Brsquos receive data byte when HOST Brsquos receive window is window is 00 Such segments will be Such segments will be ACKedACKed by HOST B by HOST B Eventually the buffer will have Eventually the buffer will have some space and the some space and the ACKsACKs will will contain contain RcvWindow gt 0RcvWindow gt 0
What happens when the receive
buffer of HOST B is full full (that is when (that is when
RcvWindow=0)RcvWindow=0)
TCP sends a segment only when there is datadata or ACKACK to send Therefore the sender must maintain the connection lsquoalivealiversquo
Transport Layer ndash TCP
29
BB
TCP Connection ManagementTCP Connection Management
Recall TCP sender receiver establish ldquoconnectionconnectionrdquo before exchanging data segments
Initialize TCP variables sequence numbers buffers flow control info (eg RcvWindow)
ClientClient is the connection initiator
In Java Socket clientSocket = new Socket(hostnameport number) connect ServerServer is contacted by client
In JavaSocket accept()
if (connectconnect(s (struct sockaddr )ampsin sizeof(sin)) = 0) printf(connect failedn) WSACleanup() exit(1)
ns = acceptaccept(s(struct sockaddr )(ampremoteaddr)ampaddrlen)
Transport Layer ndash TCP
30
BB
TCP Connection Management
Three way handshakeStep 1 clientclient end system sends TCP
SYNSYN control segment to server (executed by TCP itself)
specifies initial seq number (isn)
Step 2 serverserver end system receives SYNSYN replies with SYNACKSYNACK control segment
ACKs received SYN allocates buffers specifies serverrsquos initial seq
number
Step 3 clientclient ACKsACKs the connection with
ACK=server_isn +1ACK=server_isn +1
allocates buffers sends SYN=0SYN=0
Connection established
Client
AcceptAccept (SYN=1
seq=server_isnack=client_isn+1)
time
Server
ConnectConnect (SYN=1 seq=client_isn)
ACK (SYN=0
seq=client_isn+1ack=server_isn+1)
Establishing a connection
This is what happens when we create a socket for
connection to a server
After establishing the connection the client can receive segments with app-generated data (SYN=0)(SYN=0)
Transport Layer ndash TCP
31
BB
TCP Connection Management (cont)
Closing a connection
client closes socket
closesocket(s)closesocket(s)
Java clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
How TCP connection is established and torn down
Transport Layer ndash TCP
32
BB
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters timed wait - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer ndash TCP
33
BB
TCP Connection Management (cont)
TCP client lifecycle
TCP server lifecycle
Used in case ACK gets lost It is implementation-dependent (eg 30
seconds 1 minute 2 minutes
Connection formally closes ndash all resources (eg port numbers) are
released
1
2
3
4
5
6
7
8
9
10
11
12
Transport Layer ndash TCP
34
BB
End of Flow Control and Error Control
Transport Layer ndash TCP
35
BB
Flow Control vs Congestion ControlFlow Control vs Congestion ControlSimilar actions are taken but for very different reasons
Flow ControlFlow Controlbull point-to-point traffic between sender and receiverbull speed matching service matching the rate at which the sender is sending against the rate at which the receiving application is readingbull prevents Receiver Buffer from overflowing
CongestionCongestion ndash happens when there are too many sources attempting to send data at too high a rate for the routers along the path
Congestion ControlCongestion Controlbull service that makes sure that the routers between End Systems are able to carry the offered trafficbull prevents routers from overflowing
Same course of action Throttling of the sender
Transport Layer ndash TCP
36
BB
CongestionCongestion Informally too many sources sending too
much data too fast for network to handle different from flow control Manifestations
lost packets (buffer overflow at routers) long delays (queuing in router buffers)
a top-10 problem
Principles of Congestion ControlPrinciples of Congestion Control
Transport Layer ndash TCP
37
BB
Approaches towards congestion control
End-to-end congestion control
no explicit feedback from network
congestion inferred by end-systems from observed packet loss amp delay
approach taken by TCP
Network-assisted congestion control
routers provide feedback to End Systems in the form of single bit indicating
link congestion (SNA DECbit TCPIP ECN ATM ABR)
explicit transmission rate the sender should send at
1 2
Two broad approaches towards congestion control
Transport Layer ndash TCP
38
BB
TCP Congestion ControlTCP Congestion ControlHow TCP sender limits the rate at which it sends traffic into its connection
SENDER
(Amount of unACKed data)SENDER lt min(CongWin RcvWindow)
LastByteSent - LastByteACKed
Indirectly limits the senderrsquos send rateAssumptions
bull TCP receive buffer is very large ndash no RcvWindow constraint Amt of unACKed data at sender is solely limited by CongWin
bull Packet loss delay amp packet transmission delay are negligible
Sending rate (approx) CongWinRTT
By adjusting CongWin sender can therefore adjust the
rate at which it sends data into its connection
New variable ndash Congestion
Window
Transport Layer ndash TCP
39
BB
TCP Congestion ControlTCP Congestion ControlTCP uses ACKs to trigger (ldquoclockrdquo) its increase in congestion window size ndash ldquoself-clockingrdquo
Arrival of ACKs ndash indication to the sender that all is well
1 Slow Rate
bull Congestion window will be increased at a relatively slow rate
2 High rate
bull Congestion window will be increased more quickly
Transport Layer ndash TCP
40
BB
TCP Congestion ControlTCP Congestion ControlHow TCP perceives that there is congestion on the path
ldquoLoss Eventrdquo ndash when there is excessive congestion router buffers along the path overflows causing datagrams to be dropped which in turn results in a ldquoloss eventrdquo at the sender
1 Timeout
bull no ACK is received after segment loss
2 Receipt of three duplicate ACKs
bull segment loss is followed by three ACKs received at the sender
TCP Congestion Control: details
• The sender limits transmission: LastByteSent - LastByteAcked <= cwnd
• Roughly, rate ≈ cwnd/RTT bytes/sec
• cwnd is a dynamic function of perceived network congestion
How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• the TCP sender reduces its rate (cwnd) after a loss event
Three mechanisms:
1. AIMD
2. slow start
3. conservative behavior after timeout events
TCP congestion avoidance: additive increase, multiplicative decrease (AIMD)
Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase cwnd by 1 MSS every RTT until loss is detected
• multiplicative decrease: cut cwnd in half after loss
[Figure: congestion window size (8, 16, 24 Kbytes) vs. time, showing saw-tooth behavior: probing for bandwidth]
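The saw-tooth can be reproduced with a few lines of Python. This is a sketch under a hypothetical loss model: a loss is detected whenever cwnd reaches a fixed `loss_threshold` (standing in for the bottleneck capacity), and windows are counted in MSS.

```python
def aimd_trace(start_cwnd: int, loss_threshold: int, rtts: int) -> list[int]:
    """cwnd (in MSS) at the start of each RTT under AIMD.

    Hypothetical loss model: a loss is detected whenever cwnd reaches
    loss_threshold (e.g. the bottleneck's capacity in segments).
    """
    cwnd = start_cwnd
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd >= loss_threshold:
            cwnd = max(1, cwnd // 2)   # multiplicative decrease, never below 1 MSS
        else:
            cwnd += 1                  # additive increase: +1 MSS per RTT
    return trace


print(aimd_trace(8, 12, 10))  # [8, 9, 10, 11, 12, 6, 7, 8, 9, 10]
```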
TCP Slow Start
When the connection begins, increase the rate exponentially until the first loss event:
• initially cwnd = 1 MSS
• double cwnd every RTT
• done by incrementing cwnd by 1 MSS for every ACK received
Summary: the initial rate is slow but ramps up exponentially fast (the sending rate doubles every RTT).
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart]
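The "+1 MSS per ACK gives doubling per RTT" claim can be checked with a short Python sketch, assuming no loss and one ACK per segment (no delayed ACKs); the function name is hypothetical.

```python
def slow_start_windows(rounds: int) -> list[int]:
    """cwnd (in MSS) at the start of each RTT during slow start,
    assuming no loss and one ACK per segment."""
    cwnd = 1                      # initially cwnd = 1 MSS
    windows = []
    for _ in range(rounds):
        windows.append(cwnd)
        acks_this_rtt = cwnd      # every segment in the window is ACKed
        for _ in range(acks_this_rtt):
            cwnd += 1             # +1 MSS per ACK received
    return windows


print(slow_start_windows(5))  # [1, 2, 4, 8, 16]
```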
Refinement: inferring loss
• after 3 duplicate ACKs: cwnd is cut in half; the window then grows linearly
• but after a timeout event: cwnd is set to 1 MSS; the window then grows exponentially up to a threshold, then grows linearly
Philosophy:
• 3 duplicate ACKs indicate that the network is capable of delivering some segments
• a timeout indicates a "more alarming" congestion scenario
Refinement
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.
Implementation:
• variable ssthresh (slow-start threshold)
• on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event
TCP Sender Congestion Control
| State | Event | TCP Sender Congestion-Control Action | Commentary |
|---|---|---|---|
| SLOW START (SS) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Results in a doubling of CongWin every RTT |
| Congestion Avoidance (CA) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in CongWin increasing by 1 MSS every RTT |
| SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS |
| SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start |
| SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed |
Summary: TCP Congestion Control
Initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start.
slow start
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: move to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
congestion avoidance
• new ACK: cwnd = cwnd + MSS · (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
fast recovery
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; move to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start
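The state machine above translates fairly directly into code. The following is a minimal Python sketch of it, not a real TCP stack: it assumes an MSS of 1460 bytes (a hypothetical value), ignores actual segment transmission, and only tracks cwnd, ssthresh, dupACKcount, the current state, and a count of "retransmit missing segment" actions.

```python
from dataclasses import dataclass

MSS = 1460  # assumed segment size in bytes (hypothetical value)


@dataclass
class RenoSender:
    cwnd: float = MSS             # initially cwnd = 1 MSS
    ssthresh: float = 64 * 1024   # initially ssthresh = 64 KB
    dup_ack_count: int = 0
    state: str = "slow_start"
    retransmissions: int = 0      # counts "retransmit missing segment" actions

    def on_new_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh                 # deflate window, leave fast recovery
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += MSS                          # +1 MSS per ACK: doubles every RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += MSS * MSS / self.cwnd        # about +1 MSS per RTT
        self.dup_ack_count = 0

    def on_dup_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd += MSS                          # inflate for each extra duplicate ACK
            return
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.retransmissions += 1                 # retransmit missing segment
            self.state = "fast_recovery"

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = MSS
        self.dup_ack_count = 0
        self.retransmissions += 1                     # retransmit missing segment
        self.state = "slow_start"
```

Usage: three new ACKs grow cwnd from 1 to 4 MSS in slow start; three duplicate ACKs then halve ssthresh to 2 MSS and move the sender into fast recovery with cwnd = 5 MSS.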
Congestion control
TCP's Congestion Control Service
[Figure: CLIENT and SERVER]
Problem: gridlock sets in when there is packet loss due to router congestion.
TCP's congestion control service forces the end systems to decrease the rate at which packets are sent during periods of congestion.
When the sending system's packet is lost due to congestion, the sender is alerted when it stops receiving ACKs for the packets it sent.
Macroscopic Description of TCP throughput
What is the average throughput of TCP as a function of window size and RTT? (Ignore slow start; it is typically a very short phase.)
Let W be the window size when loss occurs:
• when the window is W, throughput is W/RTT
• just after loss, the window drops to W/2 and throughput to W/(2RTT)
• throughput then increases linearly (by MSS/RTT every RTT)
Average throughput: 0.75 W/RTT
(Based on an idealised model for the steady-state dynamics of TCP)
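Where the 0.75 factor comes from can be checked numerically: over one idealised cycle the window climbs one segment per RTT from W/2 to W, so its average is the midpoint, 3W/4. A small Python sketch (function names hypothetical):

```python
def sawtooth_average(w_segments: int) -> float:
    """Average window (segments) over one idealised AIMD cycle: the window
    climbs by 1 segment per RTT from W/2 to W, then halves after a loss."""
    cycle = range(w_segments // 2, w_segments + 1)
    return sum(cycle) / len(cycle)


def average_throughput(w_bytes: float, rtt_s: float) -> float:
    """Idealised steady-state TCP throughput: 0.75 * W / RTT (bytes/sec)."""
    return 0.75 * w_bytes / rtt_s


print(sawtooth_average(100))  # 75.0, i.e. 0.75 * W
```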
TCP Futures: TCP over "long, fat pipes"
Example: a GRID computing application with 1500-byte segments, a 100 ms RTT, and a desired throughput of 10 Gbps requires a window size of W = 83,333 in-flight segments.
Throughput in terms of loss rate L:
$$\text{throughput} = \frac{1.22 \cdot MSS}{RTT\,\sqrt{L}}$$
Achieving 10 Gbps requires L = 2 × 10^-10, a very small loss rate (1 loss event every 5 billion segments).
New versions of TCP are needed for high-speed environments.
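Both numbers in the long-fat-pipe example follow from simple arithmetic. This Python sketch inverts the throughput formula for L; the function names are hypothetical, and MSS is converted to bits so that it matches a throughput given in bits per second.

```python
def window_in_segments(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """In-flight segments needed: W = throughput * RTT / MSS (MSS in bits)."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def loss_rate_for(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


print(round(window_in_segments(10e9, 0.1, 1500)))  # 83333
print(loss_rate_for(10e9, 0.1, 1500))               # about 2.14e-10
```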
TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link of capacity R, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Analysis of 2 connections sharing a link
Assumptions:
• a link with transmission rate R
• each connection has the same MSS and RTT
• no other TCP connections or UDP datagrams traverse the shared link
• ignore the slow start phase of TCP
• both connections operate in congestion-avoidance mode (linear increase phase)
Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.
Why is TCP fair?
Two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: connection 1 throughput (x-axis, up to R) vs. connection 2 throughput (y-axis, up to R), showing the equal bandwidth share line and the full bandwidth utilisation line; the trajectory alternates between congestion avoidance (additive increase) and loss (window decreased by a factor of 2). A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.]
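The convergence argument can be simulated in Python. This is a sketch under the slide's assumptions plus one extra, hypothetical one: both flows detect a loss in the same RTT whenever their combined rate exceeds R (synchronized losses). Additive increase leaves the gap between the flows unchanged, while every halving cuts the gap in half, so the two rates converge.

```python
def two_flow_aimd(x1: float, x2: float, capacity: float, rtts: int) -> tuple[float, float]:
    """Throughputs (in MSS per RTT) of two AIMD flows after `rtts` rounds.

    Assumes synchronized losses: both flows halve whenever their
    combined rate exceeds the link capacity R.
    """
    for _ in range(rtts):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2      # multiplicative decrease for both flows
        else:
            x1, x2 = x1 + 1, x2 + 1      # additive increase: slope of 1
    return x1, x2


x1, x2 = two_flow_aimd(80, 10, 100, 1000)
print(round(x1, 3), round(x2, 3))  # the two rates end up (nearly) equal
```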
The End
The following slides are for additional reading only.
TCP Latency Modeling
Loopholes in TCP
• In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free; therefore they have higher throughputs.
• Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (multithreading implementation).
[Figure: multiple end systems sharing a link of R bps (the link's transmission rate); three hosts each use 1 TCP connection while one host uses 3 TCP connections]
TCP latency modeling
Q: How long does it take to receive an object from a Web server?
Latency: the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. It comprises the TCP connection establishment time, the data transfer delay, and the actual data transmission time.
Two cases to consider:
• WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent (no data transfer delay)
• WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent (there is data transfer delay)
TCP Latency Modeling
Assumptions:
• the network is uncongested, with one link of rate R between the end systems
• CongWin (fixed) determines the amount of data that can be sent
• no packet loss, no packet corruption, no retransmissions required
• header overheads are negligible
• the file to send is an integer number of segments of size MSS
• connection-establishment request messages, ACKs, and TCP connection-establishment segments have negligible transmission times
• the initial Threshold of the TCP congestion mechanism is very big
Notation:
• R: the link's transmission rate, in bps
• O: size of the object, in bits
• S: number of bits in the MSS (maximum segment size)
[Figure: CLIENT retrieving a FILE from a SERVER]
TCP latency modeling
Case analysis: STATIC congestion window
Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.
$$\text{latency} = 2\,RTT + \frac{O}{R}$$
K = number of windows of data that cover the object (number of segments, rounded up to the nearest integer):
$$K = \left\lceil \frac{O}{WS} \right\rceil$$
Example: assume W = 4 segments, e.g., O = 256 bits, S = 32 bits, W = 4, so K = ⌈256/128⌉ = 2.
TCP latency modeling
Case analysis: STATIC congestion window
Case 2: WS/R < RTT + S/R
The sender has to wait for an ACK after a window's worth of data is sent (STALLED PERIOD).
$$\text{latency} = 2\,RTT + \frac{O}{R} + (K-1)\left[\frac{S}{R} + RTT - \frac{WS}{R}\right], \qquad K = \left\lceil \frac{O}{WS} \right\rceil$$
If there are K windows, the sender will be stalled (K-1) times.
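The two static-window cases can be folded into one Python function: the per-window stall S/R + RTT - WS/R is non-positive exactly in Case 1. The slides give O, S and W for the example but not R or RTT, so R = 32 bps and RTT = 2 s or 10 s in the usage lines are hypothetical values chosen to land in each case.

```python
import math


def static_window_latency(O: float, S: float, W: int, R: float, RTT: float) -> float:
    """Latency (seconds) to fetch an O-bit object with a fixed window of
    W segments of S bits over an R bps link (static-window model)."""
    K = math.ceil(O / (W * S))          # windows needed to cover the object
    latency = 2 * RTT + O / R           # two RTTs plus object transmission time
    stall = S / R + RTT - W * S / R     # idle time after each window in Case 2
    if stall > 0:                       # Case 2: WS/R < RTT + S/R
        latency += (K - 1) * stall
    return latency


print(static_window_latency(256, 32, 4, 32, 2))   # Case 1: 12.0
print(static_window_latency(256, 32, 4, 32, 10))  # Case 2: 35.0
```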
Case analysis: DYNAMIC congestion window
[Figure: slow start delivering an object of O/S = 15 segments in 4 windows, with STALLED PERIODs between windows]
Case analysis: DYNAMIC congestion window
• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object as follows:
$$\begin{aligned}
K &= \min\{k : 2^0 S + 2^1 S + \cdots + 2^{k-1} S \ge O\} \\
  &= \min\{k : 2^k - 1 \ge O/S\} \\
  &= \min\{k : k \ge \log_2(O/S + 1)\} \\
  &= \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil
\end{aligned}$$
Case analysis: DYNAMIC congestion window
• From the time the server begins to transmit the kth window until the time the server receives an ACK for the first segment in the window: $\frac{S}{R} + RTT$
• Transmission of the kth window $= 2^{k-1}\frac{S}{R}$
• Stall time $= \left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^{+}$, where $[x]^{+} = \max(x, 0)$
• Latency $= 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^{+}$
Case analysis: DYNAMIC congestion window
• Let Q be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$
• The actual number of times that the server stalls is P = min{Q, K-1}.
• Closed-form expression for the latency:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - \left(2^{P} - 1\right)\frac{S}{R}$$
• Comparing with the minimum latency 2RTT + O/R (no stalls):
$$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{\frac{O/R}{RTT} + 2}$$
Slow start will not significantly increase latency if RTT << O/R.
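The closed form can be cross-checked against the stall-time sum in Python. This sketch implements both K/Q/P formulas and the summation exactly as written above; the function names and the test inputs (S = 1, R = 1, so S/R = 1 s) are hypothetical. With O/S = 15 it reproduces the 4-window example.

```python
import math


def slow_start_latency(O: float, S: float, R: float, RTT: float) -> float:
    """Closed form: 2RTT + O/R + P[RTT + S/R] - (2^P - 1) S/R."""
    K = math.ceil(math.log2(O / S + 1))               # windows covering the object
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1  # stalls for an infinite object
    P = min(Q, K - 1)                                 # actual number of stalls
    return 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R


def slow_start_latency_sum(O: float, S: float, R: float, RTT: float) -> float:
    """Same latency, summing the positive part of each window's stall time."""
    K = math.ceil(math.log2(O / S + 1))
    stalls = sum(max(0.0, S / R + RTT - 2 ** (k - 1) * S / R) for k in range(1, K))
    return 2 * RTT + O / R + stalls


print(slow_start_latency(15, 1, 1, 2), slow_start_latency_sum(15, 1, 1, 2))  # 22.0 22.0
```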
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
RTT Samples and RTT estimates
[Figure: SampleRTT and EstimatedRTT (RTT in msec, 100-300) vs. time]
The variations in SampleRTT are smoothed out in the computation of EstimatedRTT.
An Actual RTT estimation: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr
[Figure: SampleRTT and EstimatedRTT (RTT in milliseconds, 100-350) vs. time (seconds, 1-106)]
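The smoothing in these plots comes from an exponentially weighted moving average. The Python sketch below assumes the commonly recommended constants α = 0.125 and β = 0.25 and TimeoutInterval = EstimatedRTT + 4·DevRTT (the RFC 6298 values; they are not given on the slides in this section). It follows RFC 6298 in updating the deviation with the previous estimate; the sample values are hypothetical.

```python
def update_rtt(estimated_rtt: float, dev_rtt: float, sample_rtt: float,
               alpha: float = 0.125, beta: float = 0.25) -> tuple[float, float, float]:
    """One EWMA update; returns (EstimatedRTT, DevRTT, TimeoutInterval)."""
    # RFC 6298 order: update the deviation using the previous estimate first.
    dev_rtt = (1 - beta) * dev_rtt + beta * abs(sample_rtt - estimated_rtt)
    estimated_rtt = (1 - alpha) * estimated_rtt + alpha * sample_rtt
    timeout_interval = estimated_rtt + 4 * dev_rtt
    return estimated_rtt, dev_rtt, timeout_interval


est, dev = 100.0, 0.0
for sample in [180.0, 90.0, 110.0, 300.0, 105.0]:   # hypothetical SampleRTTs in ms
    est, dev, timeout = update_rtt(est, dev, sample)
    print(f"sample={sample:6.1f}  estimated={est:6.1f}  timeout={timeout:6.1f}")
```

A single 180 ms sample only moves the estimate from 100 ms to 110 ms, which is the smoothing visible in the plots.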
FSM of TCP for Reliable Data Transfer
Simplified TCP sender, assuming:
• one-way data transfer
• no flow or congestion control
A single "wait for event" state with three events:
• event: data received from application above → create and send segment
• event: timer timeout for segment with seq number y → retransmit segment
• event: ACK received, with ACK number y → process ACK
SIMPLIFIED TCP SENDER
Assumptions:
• the sender is not constrained by TCP flow or congestion control
• data from above is less than MSS in size
• data transfer is in one direction only

    sendbase = initial_sequence_number
    nextseqnum = initial_sequence_number
    loop (forever) {
      switch(event)
        event: data received from application above
          create TCP segment with sequence number nextseqnum
          if (timer is currently not running)
            start timer for segment nextseqnum   /* the timer is associated with the oldest unACKed segment */
          pass segment to IP
          nextseqnum = nextseqnum + length(data)
        event: timer timeout
          retransmit not-yet-ACKed segment with smallest sequence number
          start timer
        event: ACK received, with ACK field value of y
          if (y > sendbase) {   /* cumulative ACK of all data up to y */
            sendbase = y
            if (there are currently any not-yet-ACKed segments)
              start timer
          }
    } /* end of loop forever */
TCP SENDER with MODIFICATIONS (with Fast Retransmit)
Why wait for the timeout to expire when consecutive duplicate ACKs can be used to indicate a lost segment?

    sendbase = initial_sequence_number
    nextseqnum = initial_sequence_number
    loop (forever) {
      switch(event)
        event: data received from application above
          create TCP segment with sequence number nextseqnum
          start timer for segment nextseqnum
          pass segment to IP
          nextseqnum = nextseqnum + length(data)
        event: timer timeout for segment with sequence number y
          retransmit segment with sequence number y
          compute new timeout interval for segment y
          restart timer for sequence number y
        event: ACK received, with ACK field value of y
          if (y > sendbase) {   /* cumulative ACK of all data up to y */
            cancel all timers for segments with sequence numbers < y
            sendbase = y
          }
          else {   /* a duplicate ACK for an already ACKed segment */
            increment number of duplicate ACKs received for y
            if (number of duplicate ACKs received for y == 3) {
              /* perform TCP fast retransmission */
              resend segment with sequence number y
              restart timer for segment y
            }
          }
    } /* end of loop forever */
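The two sender listings can be combined into a small runnable Python model. It is a sketch, not the slides' exact algorithm: it keeps the single timer of the simplified sender (tracked only as a flag), adds the triple-duplicate-ACK fast retransmit of the modified sender, and records retransmitted sequence numbers instead of sending anything. The class and attribute names are hypothetical.

```python
class SimplifiedSender:
    """Simplified TCP sender plus fast retransmit (one-way data,
    no flow or congestion control, one timer for the oldest unACKed segment)."""

    def __init__(self, isn: int):
        self.sendbase = isn            # oldest unACKed byte
        self.nextseqnum = isn          # next sequence number to use
        self.unacked = {}              # seq -> payload of not-yet-ACKed segments
        self.timer_running = False
        self.dup_acks = 0
        self.retransmitted = []        # seq numbers resent (timeout or fast retransmit)

    def on_app_data(self, data: bytes) -> int:
        seq = self.nextseqnum
        self.unacked[seq] = data       # "pass segment to IP" would happen here
        if not self.timer_running:
            self.timer_running = True  # start timer for the oldest unACKed segment
        self.nextseqnum += len(data)
        return seq

    def on_timeout(self) -> None:
        if self.unacked:
            self.retransmitted.append(min(self.unacked))  # smallest unACKed seq
            self.timer_running = True                     # restart timer

    def on_ack(self, y: int) -> None:
        if y > self.sendbase:          # cumulative ACK of all data up to y
            self.sendbase = y
            self.unacked = {s: d for s, d in self.unacked.items() if s + len(d) > y}
            self.dup_acks = 0
            self.timer_running = bool(self.unacked)
        else:                          # duplicate ACK for already-ACKed data
            self.dup_acks += 1
            if self.dup_acks == 3:     # fast retransmit: resend segment y
                self.retransmitted.append(y)
```

Usage: after segments at Seq=92 and Seq=100 are sent and ACK=100 arrives, three further ACK=100s trigger a retransmission of Seq=100 without waiting for the timer.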
TCP ACK generation [RFC 1122, RFC 2581]
| # | Event | TCP Receiver action |
|---|---|---|
| 1 | In-order segment arrival, no gaps, everything else already ACKed | Delay sending the ACK: wait up to 500 ms for the next segment; if the next segment does not arrive in this interval, send the ACK |
| 2 | In-order segment arrival, no gaps, one delayed ACK pending (due to action 1) | Immediately send a single cumulative ACK |
| 3 | Out-of-order segment arrival with higher-than-expected seq #; a gap is detected | Send a duplicate ACK, indicating the seq # of the next expected byte |
| 4 | Arrival of a segment that partially or completely fills the gap | Immediately send an ACK if the segment starts at the lower end of the gap |
The receiver does not discard out-of-order segments.
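The four receiver rules can be sketched as a small Python class. Assumptions (not from the table): segments never partially overlap already-received data, the 500 ms delayed-ACK timer is not modelled (a "delay" result just means the ACK is held), and segments older than the expected byte are simply re-ACKed. The class name and return values are hypothetical.

```python
class AckPolicyReceiver:
    """Receiver-side ACK decisions for the four events in the table above."""

    def __init__(self, expected: int):
        self.expected = expected       # next in-order byte expected
        self.buffer = {}               # out-of-order segments kept: seq -> length
        self.delayed_pending = False   # an in-order segment is awaiting its ACK

    def on_segment(self, seq: int, length: int) -> tuple[str, int]:
        if seq > self.expected:                    # rule 3: gap detected
            self.buffer[seq] = length              # do not discard it
            return ("dup_ack", self.expected)
        if seq < self.expected:                    # old data (assumption: re-ACK)
            return ("ack", self.expected)
        filled_gap = bool(self.buffer)
        self.expected += length
        while self.expected in self.buffer:        # absorb buffered segments
            self.expected += self.buffer.pop(self.expected)
        if filled_gap or self.delayed_pending:     # rules 4 and 2: ACK immediately
            self.delayed_pending = False
            return ("ack", self.expected)
        self.delayed_pending = True                # rule 1: delay the ACK
        return ("delay", self.expected)
```

Usage: with 100 as the expected byte, a segment at 100 is held for a delayed ACK, the next in-order segment at 110 triggers an immediate cumulative ACK for 120, and an out-of-order segment at 130 produces a duplicate ACK for 120.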
TCP: Interesting Scenarios (simplified TCP version)
[Figure, lost ACK scenario: Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X). After the timeout, Host A retransmits Seq=92, 8 bytes data, and Host B again replies ACK=100: a retransmission due to a lost ACK.]
[Figure, premature timeout with cumulative ACKs: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B replies ACK=100 and ACK=120. The Seq=92 timer expires before these ACKs arrive, so Host A retransmits Seq=92, 8 bytes data, and Host B replies ACK=120. The segment with Seq=100 is not retransmitted; the timer is restarted for Seq=92.]
TCP Retransmission Scenario
[Figure: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B's ACK=100 is lost (X), but ACK=120 arrives before the Seq=92 timeout. The cumulative ACK avoids retransmission of the first segment.]
TCP Modifications: Doubling the Timeout Interval
• Provides a limited form of congestion control: a timer expiration is more likely caused by congestion in the network.
• TimeoutInterval = 2 × TimeoutInterval_previous
• TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals; congestion may get worse if sources continue to retransmit packets persistently.
• After an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT.
Others: check RFC 2018 (selective ACK).
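The doubling rule gives an exponential back-off sequence, sketched below in Python. The 60-second ceiling is an assumption for illustration (RFC 6298 permits an upper bound of at least 60 s); the slide itself states no cap, and the function name is hypothetical.

```python
def timeout_sequence(initial_timeout: float, retries: int,
                     max_timeout: float = 60.0) -> list[float]:
    """TimeoutInterval used for each successive retransmission of the same
    segment: each expiry doubles it, up to an assumed ceiling."""
    intervals = []
    timeout = initial_timeout
    for _ in range(retries):
        intervals.append(timeout)
        timeout = min(2 * timeout, max_timeout)
    return intervals


print(timeout_sequence(1.0, 5))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Once an ACK arrives, the sender would stop using this sequence and recompute TimeoutInterval from EstimatedRTT and DevRTT.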
TCP Flow Control
flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast
• the receiver explicitly informs the sender of the (dynamically changing) amount of free buffer space in the RcvWindow field of the TCP segment
• the sender keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow
Receiver buffering:
• RcvBuffer = size of the TCP receive buffer
• RcvWindow = amount of spare room in the buffer
FLOW CONTROL: Receiver
EXAMPLE: HOST A sends a large file to HOST B.
RECEIVER HOST B uses RcvWindow, LastByteRcvd, and LastByteRead.
[Figure: HOST B's RcvBuffer, filled with data from IP up to LastByteRcvd and read by the application process up to LastByteRead]
RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
Initially RcvWindow = RcvBuffer, and the application reads from the buffer. HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.
FLOW CONTROL: Sender
EXAMPLE: HOST A sends a large file to HOST B.
SENDER HOST A uses the RcvWindow of HOST B, LastByteSent, and LastByteACKed.
[Figure: HOST A's data between LastByteACKed and LastByteSent, with ACKs arriving from HOST B]
To ensure that it does not overflow HOST B's buffer, HOST A maintains throughout the connection's life that
LastByteSent - LastByteACKed <= RcvWindow
FLOW CONTROL: Some issues to consider
RcvWindow is used by the connection to provide the flow control service.
What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?
TCP sends a segment only when there is data or an ACK to send, so the sender must keep the connection 'alive'. TCP therefore requires HOST A to continue sending segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B; eventually the buffer will have some space, and the ACKs will contain RcvWindow > 0.
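Both sides of flow control reduce to a little arithmetic, sketched here in Python. The function names and byte counts are hypothetical, and the zero-window case is simplified: a real sender paces its one-byte probes with a persist timer rather than sending one on every call.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room HOST B advertises: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def bytes_host_a_may_send(last_byte_sent: int, last_byte_acked: int, rwnd: int) -> int:
    """New bytes HOST A may send while keeping
    LastByteSent - LastByteACKed <= RcvWindow."""
    if rwnd == 0:
        return 1   # zero window: one-byte segment so HOST B's ACKs can reopen it
    return max(0, rwnd - (last_byte_sent - last_byte_acked))


rwnd = rcv_window(4096, 3000, 1000)               # 2096 bytes of spare room
print(rwnd, bytes_host_a_may_send(5000, 4000, rwnd))  # 2096 1096
```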
TCP Connection Management
Recall: the TCP sender and receiver establish a "connection" before exchanging data segments, to initialize TCP variables: sequence numbers, buffers, and flow control info (e.g., RcvWindow).
• Client: the connection initiator
  In Java: Socket clientSocket = new Socket(hostname, portNumber);
  In C: if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) { printf("connect failed\n"); WSACleanup(); exit(1); }
• Server: contacted by the client
  In Java: Socket accept() (a ServerSocket method)
  In C: ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);
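For a runnable counterpart to the Java and C calls above, here is a Python sketch using the standard `socket` module on the loopback interface. Port 0 lets the OS pick a free port; the function names are hypothetical. The kernel completes the three-way handshake during `create_connection()`, so `accept()` simply hands back an already-established connection, which is why this works in a single thread for a small message.

```python
import socket


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes (recv may return fewer bytes than requested)."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf += chunk
    return buf


def echo_once(message: bytes) -> bytes:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))     # server: port 0 = any free port
        listener.listen(1)
        # client: connect() triggers SYN / SYNACK / ACK
        with socket.create_connection(listener.getsockname()) as client:
            conn, _addr = listener.accept()  # server: the established connection
            with conn:
                client.sendall(message)
                conn.sendall(recv_exact(conn, len(message)))  # echo it back
                return recv_exact(client, len(message))


print(echo_once(b"hello"))
```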
TCP Connection Management
Three-way handshake
Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself)
• specifies the initial sequence number (isn)
Step 2: the server end system receives the SYN and replies with a SYNACK control segment
• ACKs the received SYN, allocates buffers, and specifies the server's initial sequence number
Step 3: the client ACKs the connection with ACK = server_isn + 1
• allocates buffers and sends SYN = 0
Connection established.
[Figure, establishing a connection: the client's Connect sends (SYN=1, seq=client_isn); the server's Accept replies (SYN=1, seq=server_isn, ack=client_isn+1); the client then sends ACK (SYN=0, seq=client_isn+1, ack=server_isn+1)]
This is what happens when we create a socket for connection to a server.
After establishing the connection, the client can receive segments with app-generated data (SYN=0).
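The sequence and acknowledgment numbers in the handshake figure can be generated mechanically. A minimal Python sketch (function name and dictionary fields hypothetical) that also applies the 32-bit wrap-around of TCP sequence numbers:

```python
SEQ_SPACE = 2 ** 32   # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn: int, server_isn: int) -> list[dict]:
    """Header fields of the SYN, SYNACK and ACK segments."""
    syn = {"SYN": 1, "seq": client_isn}
    synack = {"SYN": 1, "seq": server_isn, "ack": (client_isn + 1) % SEQ_SPACE}
    ack = {"SYN": 0, "seq": (client_isn + 1) % SEQ_SPACE,
           "ack": (server_isn + 1) % SEQ_SPACE}
    return [syn, synack, ack]


for segment in three_way_handshake(100, 500):
    print(segment)
```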
TCP Connection Management (cont.)
Closing a connection: the client closes its socket:
closesocket(s);  (in Java: clientSocket.close();)
Step 1: the client end system sends a TCP FIN control segment to the server.
Step 2: the server receives the FIN, replies with an ACK, closes the connection, and sends a FIN.
[Figure: client sends FIN (close); server replies ACK, then sends FIN (close); client replies ACK, waits in timed wait, then is closed]
How a TCP connection is established and torn down
TCP Connection Management (cont.)
Step 3: the client receives the FIN and replies with an ACK.
• It enters a timed wait, during which it will respond with an ACK to any received FINs.
Step 4: the server receives the ACK, and the connection is closed.
Note: with a small modification, this can handle simultaneous FINs.
[Figure: client (closing, timed wait, closed) and server (closing, closed) exchanging FIN, ACK, FIN, ACK]
TCP Connection Management (cont.)
[Figure: TCP client lifecycle and TCP server lifecycle state diagrams]
• The timed wait is used in case the final ACK gets lost. Its length is implementation-dependent (e.g., 30 seconds, 1 minute, 2 minutes).
• The connection then formally closes: all resources (e.g., port numbers) are released.
End of Flow Control and Error Control
Flow Control vs Congestion Control
Similar actions are taken, but for very different reasons.
Flow Control
• point-to-point traffic between sender and receiver
• speed-matching service: matching the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the receiver buffer from overflowing
Congestion: happens when too many sources attempt to send data at too high a rate for the routers along the path.
Congestion Control
• a service that makes sure the routers between end systems are able to carry the offered traffic
• prevents routers from overflowing
Same course of action: throttling of the sender.
Principles of Congestion Control
Congestion: informally, "too many sources sending too much data too fast for the network to handle"
• different from flow control
• manifestations:
  • lost packets (buffer overflow at routers)
  • long delays (queuing in router buffers)
• a top-10 problem
Approaches towards congestion control
1. End-to-end congestion control:
• no explicit feedback from the network
• congestion inferred by end systems from observed packet loss and delay
• approach taken by TCP
2. Network-assisted congestion control:
• routers provide feedback to end systems in the form of either a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR), or an explicit transmission rate at which the sender should send
Two broad approaches towards congestion control
Transport Layer ndash TCP
38
BB
TCP Congestion ControlTCP Congestion ControlHow TCP sender limits the rate at which it sends traffic into its connection
SENDER
(Amount of unACKed data)SENDER lt min(CongWin RcvWindow)
LastByteSent - LastByteACKed
Indirectly limits the senderrsquos send rateAssumptions
bull TCP receive buffer is very large ndash no RcvWindow constraint Amt of unACKed data at sender is solely limited by CongWin
bull Packet loss delay amp packet transmission delay are negligible
Sending rate (approx) CongWinRTT
By adjusting CongWin sender can therefore adjust the
rate at which it sends data into its connection
New variable ndash Congestion
Window
Transport Layer ndash TCP
39
BB
TCP Congestion ControlTCP Congestion ControlTCP uses ACKs to trigger (ldquoclockrdquo) its increase in congestion window size ndash ldquoself-clockingrdquo
Arrival of ACKs ndash indication to the sender that all is well
1 Slow Rate
bull Congestion window will be increased at a relatively slow rate
2 High rate
bull Congestion window will be increased more quickly
Transport Layer ndash TCP
40
BB
TCP Congestion ControlTCP Congestion ControlHow TCP perceives that there is congestion on the path
ldquoLoss Eventrdquo ndash when there is excessive congestion router buffers along the path overflows causing datagrams to be dropped which in turn results in a ldquoloss eventrdquo at the sender
1 Timeout
bull no ACK is received after segment loss
2 Receipt of three duplicate ACKs
bull segment loss is followed by three ACKs received at the sender
Transport Layer ndash TCP
41
BB
TCP Congestion Control details
sender limits transmission LastByteSent-LastByteAcked
cwnd
roughly
cwnd is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or 3 duplicate acks
TCP sender reduces rate (cwnd) after loss event
Three mechanisms1 AIMD
2 slow start
3 conservative after timeout events
rate = cwnd
RTT Bytessec
Transport Layer ndash TCP
42
BB
TCP congestion avoidance congestion avoidance additive increase multiplicative decreaseadditive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1 MSS every RTT until
loss is detected multiplicative decrease cut cwnd in half after loss
timecwnd
co
nge
stio
n w
indo
w s
ize
saw toothbehavior probingfor bandwidth
Transport Layer ndash TCP
43
BB
TCP Slow Start Slow Start when connection begins
increase rate exponentially until first loss event initially cwndcwnd = 1 MSS double cwndcwnd every RTT done by incrementing cwndcwnd by 1 MSS by 1 MSS for every ACK received
summary initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transport Layer ndash TCP
44
BB
Refinement inferring lossRefinement inferring loss
after 3 dup ACKsafter 3 dup ACKs cwndcwnd is cut in halfhalf window then grows
linearlylinearly butbut after timeout after timeout
eventevent cwndcwnd is is set to 1 MSS1 MSS window then grows
exponentiallyexponentially Up to a thresholdUp to a threshold
then grows linearlylinearly
3 dup ACKs 3 dup ACKs indicates network capable of delivering some segments
timeouttimeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer ndash TCP
45
BB
Refinement
Q when should the exponential increase switch to linear
A when cwndcwnd gets to 1212 of its value before timeout
Implementation variable ssthresh (slow-start threshold)ssthresh (slow-start threshold) on loss eventon loss event ssthreshssthresh is set to 1212 of cwndcwnd just
before loss event
Transport Layer ndash TCP
46
BB
TCP Sender Congestion ControlSTATE EVENT TCP SENDER Congestion-
Control ActionCommentary
SLOW START (SS)
ACK receipt for previously unACKed data
CongWin = CongWin + MSSIf(CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo
Resulting in a doubling of CongWin every RTT
Congestion Avoidance (CA)
ACK receipt for previously unACKed data
CongWin = CongWin + MSS (MSSCongWin)
Additive increase resulting in increasing of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate ACK
Threshold = CongWin 2CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
SS or CA Timeout Threshold = CongWin 2CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter Slow Start
SS or CA Duplicate ACK Increment duplicate ACK count for segment being ACKed
CongWin and Threshold not changed
Transport Layer ndash TCP
47
BB
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++
duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3 MSSretransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++
duplicate ACK
cwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
Transport Layer ndash TCP
48
BB
Congestion control
TCPrsquos Congestion Control Service
CLIENTSERVER
Problem Gridlock sets-in when there is packet loss due to router congestion
forces the End Systems to decrease the rate at which packets are sentdecrease the rate at which packets are sent during periods of congestion
The sending systemrsquos packet is lost due to congestion and is alerted when it stops receiving ACKs of packets sent
Transport Layer ndash TCP
49
BBTransport Layer 3-49
Macroscopic Description of TCP throughput
whatrsquos the average throughout of TCP as a function of window size and RTT ignore slow start (typically very short phases)
let W be the window size when loss occurs when window is W throughput is WRTT just after loss window drops to W2 throughput to
W2RTT Throughput increases linearly (by MSSRTT every
RTT) Average Throughput 75 WRTT
(Based on Idealised model for the steady-state dynamics of TCP)
Transport Layer 3-50
TCP FuturesTCP Futures TCP over ldquolong fat pipesrdquo Example GRID computing application 1500-byte segments 100ms RTT desired
throughput of 10 Gbps requires window size W = 83333 in-flight
segments Throughput in terms of loss rate
L = 210-10 ndash a very small loss rate (1 loss event every 5 billion segments)
new versions of TCP is needed for high-speed environments
LRTT
MSS221
Transport Layer ndash TCP
51
BB
TCP FairnessTCP Fairness
Fairness goal if NN TCP sessions share same bottleneck link each should get an average transmission rate of RNRN an equal share of the linkrsquos bandwidth
TCP connection 1
bottleneckrouter
capacity RR
TCP connection 2
Go to Summary of TCP Congestion Control
Transport Layer ndash TCP
52
BB
Analysis of 2 connections sharing a link
Assumptions:
- Link with transmission rate of $R$
- Each connection has the same MSS and RTT
- No other TCP connections or UDP datagrams traverse the shared link
- Ignore the slow start phase of TCP
- Both connections operate in congestion-avoidance mode (linear increase phase)
Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.
Why is TCP fair?
Two competing sessions:
- additive increase gives a slope of 1 as throughput increases
- multiplicative decrease decreases throughput proportionally
(Figure: Connection 1 throughput vs. Connection 2 throughput, each axis running to $R$. The trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by a factor of 2), moving toward the equal bandwidth share line; the full bandwidth utilisation line is also shown. A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.)
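A minimal simulation of the two-connection argument, assuming (as the slide does) equal RTTs and congestion-avoidance mode, and additionally assuming that both connections detect a loss in the same RTT whenever their combined rate exceeds R. Additive increase keeps the gap between the two rates constant, while every halving halves it, so the rates converge toward the equal-share line. Rates are in arbitrary units; the function name is illustrative.

```python
def aimd_two_flows(x1: float, x2: float, R: float, rtts: int):
    """Idealised AIMD for two connections sharing a link of capacity R.

    Each RTT both rates grow by 1 unit (additive increase, slope 1);
    when their combined rate exceeds R, both see a loss and halve
    (multiplicative decrease). Returns the trajectory as (x1, x2) points.
    """
    trajectory = [(x1, x2)]
    for _ in range(rtts):
        if x1 + x2 > R:            # joint demand exceeds the link: loss for both
            x1, x2 = x1 / 2, x2 / 2
        else:                      # congestion avoidance: +1 each per RTT
            x1, x2 = x1 + 1, x2 + 1
        trajectory.append((x1, x2))
    return trajectory


if __name__ == "__main__":
    path = aimd_two_flows(x1=10.0, x2=80.0, R=100.0, rtts=500)
    x1, x2 = path[-1]
    print(f"final rates: {x1:.2f} vs {x2:.2f}")   # nearly equal shares
```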
The End
The following slides are for additional reading only.
TCP Latency Modeling
Loopholes in TCP:
- In practice, client/server applications with a smaller RTT grab the available bandwidth more quickly as it becomes free; therefore they get higher throughputs.
- Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth.
(Figure: multiple end systems sharing a link of $R$ bps, the link's transmission rate. Three end systems each use 1 TCP connection, while a multithreading implementation opens 3 TCP connections.)
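The effect of parallel connections can be seen with simple arithmetic, assuming the bottleneck is shared equally per TCP connection (not per application). The link rate in this sketch is hypothetical; the connection counts follow the figure (three hosts with 1 connection each plus one multithreaded application with 3).

```python
def application_share(R: float, app_connections: int, other_connections: int) -> float:
    """Share of a bottleneck of rate R that one application receives when
    every TCP connection (not every application) gets an equal R/N share."""
    n = app_connections + other_connections
    return R * app_connections / n


if __name__ == "__main__":
    R = 60e6  # hypothetical 60 Mbps link
    # Figure scenario: N = 1 + 1 + 1 + 3 = 6 connections in total.
    print(application_share(R, 1, 5))  # one single-connection host: R/6
    print(application_share(R, 3, 3))  # multithreaded application: R/2
```

Per-connection fairness therefore rewards the application that opens more connections, which is exactly the loophole described above.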
TCP latency modeling
Q: How long does it take to receive an object from a Web server?
Latency: the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. It is made up of the TCP connection establishment time, the data transfer delay, and the actual data transmission time.
Two cases to consider:
- $WS/R > RTT + S/R$: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent (no data transfer delay).
- $WS/R < RTT + S/R$: the sender has to wait for an ACK after a window's worth of data is sent (there's data transfer delay).
TCP Latency Modeling
Assumptions:
- The network is uncongested, with one link of rate $R$ between the end systems
- CongWin (fixed) determines the amount of data that can be sent
- No packet loss, no packet corruption, no retransmissions required
- Header overheads are negligible
- File to send = an integer number of segments of size MSS
- Connection-establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times
- The initial Threshold of the TCP congestion mechanism is very big
Notation:
- $R$ bps: the link's transmission rate
- $O$: size of the object, in bits
- $S$: number of bits of the MSS (maximum segment size)
(Figure: the SERVER sends a FILE to the CLIENT.)
TCP latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 1: $WS/R > RTT + S/R$
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.
$$\text{latency} = 2\,RTT + O/R$$
Let $K$ be the number of windows of data that cover the object: $K = \lceil O/(WS) \rceil$ (rounded up to the nearest integer), where $W$ is the window size in segments.
Example: assume $W = 4$ segments, with $O = 256$ bits and $S = 32$ bits. The object is $O/S = 8$ segments, so $K = 2$.
TCP latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 2: $WS/R < RTT + S/R$
The sender has to wait for an ACK after a window's worth of data is sent (STALLED PERIOD).
$$\text{latency} = 2\,RTT + O/R + (K-1)\left[S/R + RTT - WS/R\right]$$
where $K = \lceil O/(WS) \rceil$ is the number of windows of data that cover the object. If there are $K$ windows, the sender will be stalled $(K-1)$ times.
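Both static-window cases can be folded into one function. This is a minimal sketch of the two latency formulas above under the slide's assumptions (no loss, fixed window of W segments); the toy link rate and RTT values in the usage lines are made up so that one call falls into each case, while O, S and W are the slide's example.

```python
import math


def static_window_latency(O: float, S: float, W: int, R: float, RTT: float) -> float:
    """Latency with a fixed congestion window of W segments (no loss).

    O: object size (bits), S: MSS (bits), R: link rate (bits/s), RTT in seconds.
    """
    K = math.ceil(O / (W * S))          # windows needed to cover the object
    latency = 2 * RTT + O / R           # 2 RTT (setup + request) plus transmission time
    if W * S / R < RTT + S / R:         # Case 2: sender stalls after each of the first K-1 windows
        latency += (K - 1) * (S / R + RTT - W * S / R)
    return latency


if __name__ == "__main__":
    # Slide example: O = 256 bits, S = 32 bits, W = 4; hypothetical R = 32 bps (S/R = 1 s).
    print(static_window_latency(256, 32, 4, 32, RTT=2))   # Case 1: 12.0
    print(static_window_latency(256, 32, 4, 32, RTT=10))  # Case 2: 35.0
```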
Case Analysis: DYNAMIC CONGESTION WINDOW
(Figure: slow-start transfer of an object of $O/S = 15$ segments in 4 windows, with a STALLED PERIOD between windows.)
Case Analysis: DYNAMIC CONGESTION WINDOW
- Let $K$ be the number of windows that cover the object.
- We can express $K$ in terms of the number of segments in the object as follows:
$$
\begin{aligned}
K &= \min\{k : 2^0 S + 2^1 S + \cdots + 2^{k-1} S \ge O\} \\
  &= \min\{k : 2^k - 1 \ge O/S\} \\
  &= \min\{k : k \ge \log_2(O/S + 1)\} \\
  &= \left\lceil \log_2(O/S + 1) \right\rceil
\end{aligned}
$$
Note: $2^0 + 2^1 + \cdots + 2^{k-1} = 2^k - 1$.
Case Analysis: DYNAMIC CONGESTION WINDOW
- Time from when the server begins to transmit the $k$th window until the time the server receives an ACK for the first segment in the window: $S/R + RTT$
- Transmission time of the $k$th window: $2^{k-1}\,S/R$
- Stall time: $\left[S/R + RTT - 2^{k-1}\,S/R\right]^+$, where $[x]^+ = \max(x, 0)$
- Latency:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1} \left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$$
- Let $Q$ be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$
- The actual number of times that the server stalls is $P = \min\{Q, K-1\}$.
- Closed-form expression for the latency:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - \left(2^P - 1\right)\frac{S}{R}$$
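$$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{\dfrac{O/R}{RTT} + 2}$$
where the minimum latency is $2\,RTT + O/R$ (the latency when the server never stalls).
- Slow start will not significantly increase latency if $RTT \ll O/R$.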
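The closed form can be cross-checked against the direct sum of per-window stall times. This sketch computes K, Q and P as defined above and evaluates both latency expressions plus the ratio bound; the function name and the example parameters (S/R = 1 s, RTT = 2 s, O/S = 15 segments) are hypothetical.

```python
import math


def slow_start_latency(O: float, S: float, R: float, RTT: float):
    """Return (closed_form, direct_sum, P) for the dynamic-window model.

    O: object size (bits), S: MSS (bits), R: link rate (bits/s), RTT in seconds.
    """
    segments = math.ceil(O / S)
    K = 1
    while 2 ** K - 1 < segments:              # K = min{k : 2^k - 1 >= O/S}
        K += 1
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1
    P = min(Q, K - 1)
    closed = 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R
    # Direct sum of the stall times [S/R + RTT - 2^(k-1) S/R]^+ for k = 1..K-1
    direct = 2 * RTT + O / R + sum(
        max(0.0, S / R + RTT - 2 ** (k - 1) * S / R) for k in range(1, K)
    )
    return closed, direct, P


if __name__ == "__main__":
    O, S, R, RTT = 15_000, 1_000, 1_000, 2.0
    closed, direct, P = slow_start_latency(O, S, R, RTT)
    print(closed, direct, P)                   # 22.0 22.0 2
    ratio = closed / (2 * RTT + O / R)         # relative to the minimum latency
    print(ratio <= 1 + P / ((O / R) / RTT + 2))  # True
```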
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
An Actual RTT Estimation: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr
(Figure: SampleRTT and EstimatedRTT, in milliseconds (axis from 100 to 350), plotted against time in seconds.)
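The EstimatedRTT curve in such plots is typically an exponentially weighted moving average of the SampleRTT values. The sketch below assumes the widely used RFC 6298 weights (alpha = 1/8, beta = 1/4), its initialisation, and the rule TimeoutInterval = EstimatedRTT + 4·DevRTT; neither the constants nor the class name come from this slide.

```python
class RttEstimator:
    """EWMA RTT estimator (sketch, RFC 6298-style constants assumed)."""

    def __init__(self, alpha: float = 0.125, beta: float = 0.25):
        self.alpha, self.beta = alpha, beta
        self.estimated_rtt = None
        self.dev_rtt = None

    def update(self, sample_rtt: float) -> None:
        if self.estimated_rtt is None:          # first measurement initialises both
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
            return
        # Update the deviation first, using the previous estimate.
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * abs(sample_rtt - self.estimated_rtt)
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt

    @property
    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt


if __name__ == "__main__":
    est = RttEstimator()
    for sample in [100.0, 250.0, 180.0, 120.0]:   # hypothetical SampleRTTs in ms
        est.update(sample)
        print(f"EstimatedRTT={est.estimated_rtt:.1f} ms  Timeout={est.timeout_interval:.1f} ms")
```

The small alpha is what makes EstimatedRTT much smoother than the individual SampleRTT points.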
FSM of TCP for Reliable Data Transfer
Simplified TCP sender, assuming:
- one-way data transfer
- no flow or congestion control
(Figure: a single "wait for event" state with three transitions:
- event: data received from application above → create, send segment
- event: timer timeout for segment with seq number y → retransmit segment
- event: ACK received, with ACK number y → process ACK)
SIMPLIFIED TCP SENDER
Assumptions:
- the sender is not constrained by TCP flow or congestion control
- data from above is less than MSS in size
- data transfer is in one direction only

    sendbase = initial_sequence_number
    nextseqnum = initial_sequence_number

    loop (forever) {
      switch(event)
        event: data received from application above
          create TCP segment with sequence number nextseqnum
          if (timer is currently not running)
            start timer for segment nextseqnum
          pass segment to IP
          nextseqnum = nextseqnum + length(data)
        event: timer timeout
          retransmit not-yet-ACKed segment with smallest sequence number
          start timer
        event: ACK received, with ACK field value of y
          if (y > sendbase) {   /* cumulative ACK of all data up to y */
            sendbase = y
            if (there are currently any not-yet-ACKed segments)
              start timer
          }
    } /* end of loop forever */

Note: the single timer is associated with the oldest unACKed segment.
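The pseudocode above can be turned into a small event-driven Python class. This is a minimal sketch under the same assumptions (one direction, no flow or congestion control); the timer is modeled as a boolean flag rather than a real clock, and all names are illustrative.

```python
class SimplifiedTCPSender:
    """Sketch of the simplified sender: one timer, cumulative ACKs."""

    def __init__(self, initial_seq: int):
        self.send_base = initial_seq      # oldest unACKed byte
        self.next_seq_num = initial_seq   # next byte to send
        self.unacked = {}                 # seq -> payload of unACKed segments
        self.timer_running = False        # stands in for a real timer
        self.sent_to_ip = []              # log of (seq, payload) passed to IP

    def on_app_data(self, data: bytes) -> None:
        seq = self.next_seq_num
        self.unacked[seq] = data
        if not self.timer_running:        # timer tracks the oldest unACKed segment
            self.timer_running = True
        self.sent_to_ip.append((seq, data))
        self.next_seq_num += len(data)

    def on_timeout(self) -> None:
        if not self.unacked:
            return
        oldest = min(self.unacked)        # not-yet-ACKed segment with smallest seq
        self.sent_to_ip.append((oldest, self.unacked[oldest]))
        self.timer_running = True         # start timer again

    def on_ack(self, y: int) -> None:
        if y > self.send_base:            # cumulative ACK of all data up to y
            self.send_base = y
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


if __name__ == "__main__":
    sender = SimplifiedTCPSender(92)
    sender.on_app_data(b"x" * 8)     # Seq=92, 8 bytes
    sender.on_app_data(b"y" * 20)    # Seq=100, 20 bytes
    sender.on_timeout()              # retransmits Seq=92 only
    sender.on_ack(120)               # cumulative ACK covers both segments
    print(sender.send_base, sender.unacked, sender.timer_running)
```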
TCP with MODIFICATIONS: SENDER (with Fast Retransmit)
Why wait for the timeout to expire when consecutive duplicate ACKs can be used to indicate a lost segment?

    sendbase = initial_sequence_number
    nextseqnum = initial_sequence_number

    loop (forever) {
      switch(event)
        event: data received from application above
          create TCP segment with sequence number nextseqnum
          start timer for segment nextseqnum
          pass segment to IP
          nextseqnum = nextseqnum + length(data)
        event: timer timeout for segment with sequence number y
          retransmit segment with sequence number y
          compute new timeout interval for segment y
          restart timer for sequence number y
        event: ACK received, with ACK field value of y
          if (y > sendbase) {   /* cumulative ACK of all data up to y */
            cancel all timers for segments with sequence numbers < y
            sendbase = y
          }
          else {   /* a duplicate ACK for an already ACKed segment */
            increment number of duplicate ACKs received for y
            if (number of duplicate ACKs received for y == 3) {
              /* perform TCP fast retransmission */
              resend segment with sequence number y
              restart timer for segment y
            }
          }
    } /* end of loop forever */
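The duplicate-ACK branch of the modified sender can be sketched as follows. Only the ACK bookkeeping is modeled (no timers or segment storage); clearing the duplicate counts when a new cumulative ACK arrives is an assumption, and the class name is hypothetical.

```python
class FastRetransmitSender:
    """Sketch of the ACK-handling branch with fast retransmit."""

    def __init__(self, initial_seq: int):
        self.send_base = initial_seq
        self.dup_acks = {}            # ACK value y -> duplicate ACKs seen for y
        self.fast_retransmits = []    # sequence numbers resent via fast retransmit

    def on_ack(self, y: int) -> None:
        if y > self.send_base:        # new cumulative ACK of all data up to y
            self.send_base = y
            self.dup_acks.clear()
        else:                         # duplicate ACK for already ACKed data
            self.dup_acks[y] = self.dup_acks.get(y, 0) + 1
            if self.dup_acks[y] == 3:
                # Fast retransmission: resend segment y without waiting for its timeout.
                self.fast_retransmits.append(y)


if __name__ == "__main__":
    sender = FastRetransmitSender(92)
    for ack in [100, 100, 100, 100, 100]:   # one new ACK, then four duplicates
        sender.on_ack(ack)
    print(sender.fast_retransmits)          # [100]
```

The segment is resent once, on the third duplicate; further duplicates do not trigger another resend.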
TCP ACK generation [RFC 1122, RFC 2581]
1. Event: in-order segment arrival, no gaps, everything else already ACKed.
   TCP receiver action: delay sending the ACK. Wait up to 500 ms for the next segment; if the next segment does not arrive in this interval, send the ACK.
2. Event: in-order segment arrival, no gaps, one delayed ACK pending (due to action 1).
   TCP receiver action: immediately send a single cumulative ACK.
3. Event: out-of-order segment arrival with a higher-than-expected sequence number (a gap is detected).
   TCP receiver action: send a duplicate ACK, indicating the sequence number of the next expected byte.
4. Event: arrival of a segment that partially or completely fills the gap.
   TCP receiver action: immediately send an ACK if the segment starts at the lower end of the gap.
Note: the receiver does not discard out-of-order segments.
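The four receiver rules can be expressed as a small decision function. This sketch assumes the receiver tracks only the next expected byte, whether a delayed ACK is pending, and whether out-of-order data is buffered; segments below the expected byte are not covered by the table and return None. Names are illustrative.

```python
from enum import Enum
from typing import Optional


class ReceiverAction(Enum):
    DELAYED_ACK = 1       # rule 1: wait up to 500 ms for the next segment
    CUMULATIVE_ACK = 2    # rule 2: immediately send one cumulative ACK
    DUPLICATE_ACK = 3     # rule 3: re-ACK the next expected byte
    IMMEDIATE_ACK = 4     # rule 4: segment fills the lower end of a gap


def ack_action(expected_seq: int, seg_seq: int,
               delayed_ack_pending: bool, gap_buffered: bool) -> Optional[ReceiverAction]:
    """Pick the receiver action for an arriving segment starting at seg_seq."""
    if seg_seq > expected_seq:                 # out of order: a gap is detected
        return ReceiverAction.DUPLICATE_ACK
    if seg_seq < expected_seq:                 # not covered by the four rules
        return None
    # seg_seq == expected_seq: the segment is in order
    if gap_buffered:                           # it starts at the lower end of the gap
        return ReceiverAction.IMMEDIATE_ACK
    if delayed_ack_pending:
        return ReceiverAction.CUMULATIVE_ACK
    return ReceiverAction.DELAYED_ACK
```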
TCP Interesting Scenarios (simplified TCP version)
(Figure, lost ACK scenario: Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X). When the timeout expires, Host A retransmits Seq=92, 8 bytes data, and Host B again replies ACK=100. This is a retransmission due to a lost ACK.)
(Figure, premature timeout / cumulative ACKs: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B replies ACK=100 and ACK=120. The Seq=92 timeout expires before these ACKs arrive, so Host A retransmits Seq=92, 8 bytes data, and Host B replies ACK=120. The segment with Seq=100 is not retransmitted; the timer is restarted for Seq=92.)
TCP Retransmission Scenario
(Figure: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B's ACK=100 is lost (X), but ACK=120 arrives before the Seq=92 timeout expires. The cumulative ACK avoids retransmission of the first segment.)
TCP Modifications: Doubling the Timeout Interval
- Provides a limited form of congestion control.
- Timer expiration is more likely caused by congestion in the network, and congestion may get worse if sources continue to retransmit packets persistently.
- On each timeout: TimeoutInterval = 2 × TimeoutInterval(previous)
- TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals.
- After an ACK is received, TimeoutInterval is again derived from the most recent EstimatedRTT and DevRTT.
Others: check RFC 2018 (selective ACK).
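Timeout doubling and the reset after an ACK can be sketched as below. The reset formula EstimatedRTT + 4·DevRTT is the commonly used rule and an assumption here, since the slide only says the interval is derived from EstimatedRTT and DevRTT; the class name is hypothetical.

```python
class RetransmissionTimer:
    """Sketch of exponential timeout backoff with reset on ACK."""

    def __init__(self, estimated_rtt: float, dev_rtt: float):
        self.timeout_interval = estimated_rtt + 4 * dev_rtt   # assumed reset rule

    def on_timeout(self) -> None:
        self.timeout_interval *= 2        # TimeoutInterval = 2 x previous value

    def on_ack(self, estimated_rtt: float, dev_rtt: float) -> None:
        self.timeout_interval = estimated_rtt + 4 * dev_rtt


if __name__ == "__main__":
    timer = RetransmissionTimer(estimated_rtt=100.0, dev_rtt=25.0)   # ms, hypothetical
    for _ in range(3):
        timer.on_timeout()
        print(timer.timeout_interval)     # 400.0, 800.0, 1600.0
```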
TCP Flow Control
Flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.
- The receiver explicitly informs the sender of the (dynamically changing) amount of free buffer space via the RcvWindow field in the TCP segment.
- The sender keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow.
Receiver buffering:
- RcvBuffer = size of the TCP receive buffer
- RcvWindow = amount of spare room in the buffer
FLOW CONTROL: Receiver
EXAMPLE: HOST A sends a large file to HOST B.
RECEIVER HOST B uses RcvWindow, LastByteRcvd and LastByteRead:
RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
Initially RcvWindow = RcvBuffer, and the application process reads from the buffer. HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.
(Figure: HOST B's RcvBuffer, with data from IP filled in up to LastByteRcvd and the application process reading up to LastByteRead.)
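The receiver's computation is a single subtraction; this sketch adds a sanity check that the buffered bytes actually fit in RcvBuffer. The function name and the example byte counts are hypothetical.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]."""
    buffered = last_byte_rcvd - last_byte_read      # bytes received but not yet read
    if not 0 <= buffered <= rcv_buffer:
        raise ValueError("buffered data does not fit in RcvBuffer")
    return rcv_buffer - buffered


if __name__ == "__main__":
    print(rcv_window(100, 0, 0))    # initially RcvWindow = RcvBuffer: 100
    print(rcv_window(100, 60, 40))  # 20 bytes waiting to be read: 80
```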
FLOW CONTROL: Sender
EXAMPLE: HOST A sends a large file to HOST B.
SENDER HOST A uses the RcvWindow of HOST B, LastByteSent and LastByteACKed.
To ensure that HOST B's buffer does not overflow, HOST A maintains throughout the connection's life that
LastByteSent - LastByteACKed ≤ RcvWindow
(Figure: HOST A's data, sent up to LastByteSent, with ACKs from HOST B advancing LastByteACKed.)
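On the sender side, the constraint becomes a check performed before each transmission. A minimal sketch; the function name and example values are hypothetical.

```python
def may_send(last_byte_sent: int, last_byte_acked: int, rcv_window: int, nbytes: int) -> bool:
    """True if sending nbytes more keeps LastByteSent - LastByteACKed <= RcvWindow."""
    return (last_byte_sent + nbytes) - last_byte_acked <= rcv_window


if __name__ == "__main__":
    # 20 bytes already in flight, receiver advertised RcvWindow = 80.
    print(may_send(60, 40, 80, 60))   # True: 80 unACKed bytes is still allowed
    print(may_send(60, 40, 80, 61))   # False: would exceed RcvWindow
```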
FLOW CONTROL: Some issues to consider
RcvWindow is used by the connection to provide the flow control service.
What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?
TCP sends a segment only when there is data or an ACK to send, so the sender must keep the connection "alive". TCP therefore requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space, and the ACKs will contain RcvWindow > 0.
TCP Connection Management
Recall: the TCP sender and receiver establish a "connection" before exchanging data segments, to initialize TCP variables: sequence numbers, buffers, flow control info (e.g. RcvWindow).
- Client: the connection initiator
  In Java: Socket clientSocket = new Socket(hostname, portNumber);
  In C (Winsock), connect:
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) { printf("connect failed\n"); WSACleanup(); exit(1); }
- Server: contacted by the client
  In Java: the server calls accept() on its ServerSocket, which returns a connected Socket.
  In C, accept:
    ns = accept(s, (struct sockaddr *)&remoteaddr, &addrlen);
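For comparison, the same client/server roles can be exercised with Python's standard socket module over the loopback interface. In this sketch the operating system's TCP performs the three-way handshake during the client's connect, and closing the sockets at the end of the with blocks starts the FIN exchange; the function name and the use of port 0 (let the OS pick a free port) are illustrative choices.

```python
import socket


def loopback_connect_and_send(message: bytes) -> bytes:
    """Open a TCP connection over loopback and return what the server receives.

    The OS completes the handshake for a listening socket even before
    accept() is called, so a single thread is enough for this demo.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        with socket.create_connection(("127.0.0.1", port)) as client:   # client: connect
            conn, _addr = server.accept()                                # server: accept
            with conn:
                client.sendall(message)
                received = b""
                while len(received) < len(message):   # TCP is a byte stream: read until done
                    chunk = conn.recv(4096)
                    if not chunk:
                        break
                    received += chunk
                return received


if __name__ == "__main__":
    print(loopback_connect_and_send(b"hello"))
```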
TCP Connection Management
Establishing a connection: the three-way handshake (this is what happens when we create a socket for connection to a server)
Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself), specifying the client's initial sequence number (isn).
Step 2: the server end system receives the SYN and replies with a SYNACK control segment: it ACKs the received SYN, allocates buffers, and specifies the server's initial sequence number.
Step 3: the client ACKs the connection with ACK = server_isn + 1, allocates buffers, and sends SYN = 0. The connection is established.
(Figure, over time: Client Connect sends (SYN=1, seq=client_isn); Server Accept replies (SYN=1, seq=server_isn, ack=client_isn+1); Client sends ACK (SYN=0, seq=client_isn+1, ack=server_isn+1).)
After establishing the connection, the client can receive segments with app-generated data (SYN=0).
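The three segments exchanged in the handshake can be written out explicitly. This sketch only builds the header values shown in the figure (SYN flag, seq, ack); the Segment type is a hypothetical simplification of a real TCP header, and the ISN values in the usage line are arbitrary.

```python
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    sender: str
    syn: int
    seq: int
    ack: Optional[int]   # None when the ACK field is not in use


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    return [
        Segment("client", syn=1, seq=client_isn, ack=None),                 # Step 1: SYN
        Segment("server", syn=1, seq=server_isn, ack=client_isn + 1),       # Step 2: SYNACK
        Segment("client", syn=0, seq=client_isn + 1, ack=server_isn + 1),   # Step 3: ACK
    ]


if __name__ == "__main__":
    for segment in three_way_handshake(client_isn=100, server_isn=300):
        print(segment)
```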
TCP Connection Management (cont.)
Closing a connection: the client closes the socket, e.g. closesocket(s); in C (Winsock) or clientSocket.close(); in Java.
Step 1: the client end system sends a TCP FIN control segment to the server.
Step 2: the server receives the FIN and replies with an ACK; it then closes the connection and sends its own FIN.
(Figure: how a TCP connection is torn down. The client sends FIN (close); the server replies ACK, then sends FIN (close); the client replies ACK, waits in timed wait, and the connection is closed.)
TCP Connection Management (cont.)
Step 3: the client receives the FIN and replies with an ACK. It enters timed wait, during which it will respond with an ACK to any received FINs.
Step 4: the server receives the ACK. The connection is closed.
Note: with a small modification, this can handle simultaneous FINs.
(Figure: client and server both go through closing; the client waits in timed wait, and both sides end up closed.)
TCP Connection Management (cont.)
(Figures: TCP client lifecycle and TCP server lifecycle, with numbered transitions 1 to 12.)
- Timed wait: used in case the final ACK gets lost. Its length is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).
- When the timed wait ends, the connection formally closes: all resources (e.g. port numbers) are released.
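A client lifecycle like the one in the figure can be encoded as a transition table. The state names here (SYN_SENT, FIN_WAIT_1, TIME_WAIT and so on) are the standard TCP state names and an assumption, since the figure's own labels are not recoverable; the event strings and function name are illustrative.

```python
CLIENT_TRANSITIONS = {
    ("CLOSED", "send SYN"): "SYN_SENT",
    ("SYN_SENT", "receive SYNACK, send ACK"): "ESTABLISHED",
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "receive ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "receive FIN, send ACK"): "TIME_WAIT",   # the timed wait
    ("TIME_WAIT", "timed wait expires"): "CLOSED",          # resources released
}


def walk(events, state="CLOSED"):
    """Apply events in order and return the list of visited states."""
    states = [state]
    for event in events:
        try:
            state = CLIENT_TRANSITIONS[(state, event)]
        except KeyError:
            raise ValueError(f"event {event!r} not allowed in state {state}") from None
        states.append(state)
    return states


if __name__ == "__main__":
    print(walk(["send SYN", "receive SYNACK, send ACK", "send FIN",
                "receive ACK", "receive FIN, send ACK", "timed wait expires"]))
```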
End of Flow Control and Error Control
Flow Control vs Congestion Control: similar actions are taken, but for very different reasons.
Flow Control
- point-to-point traffic between sender and receiver
- speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
- prevents the receiver buffer from overflowing
Congestion: happens when there are too many sources attempting to send data at too high a rate for the routers along the path.
Congestion Control
- service that makes sure the routers between end systems are able to carry the offered traffic
- prevents routers from overflowing
Same course of action: throttling of the sender.
Principles of Congestion Control
Congestion:
- informally: too many sources sending too much data too fast for the network to handle
- different from flow control!
- manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queuing in router buffers)
- a top-10 problem!
Approaches towards congestion control
Two broad approaches towards congestion control:
1. End-to-end congestion control
- no explicit feedback from the network
- congestion inferred by end systems from observed packet loss and delay
- approach taken by TCP
2. Network-assisted congestion control
- routers provide feedback to end systems, either as:
  - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR), or
  - an explicit transmission rate the sender should send at
TCP Congestion Control: how the TCP sender limits the rate at which it sends traffic into its connection
New variable: the Congestion Window (CongWin)
SENDER: (amount of unACKed data) = LastByteSent - LastByteACKed ≤ min(CongWin, RcvWindow)
This indirectly limits the sender's send rate.
Assumptions:
- The TCP receive buffer is very large, so there is no RcvWindow constraint: the amount of unACKed data at the sender is limited solely by CongWin.
- Packet loss delay and packet transmission delay are negligible.
Sending rate ≈ CongWin/RTT
By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
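Under the assumptions above, the sender's limit and its approximate rate are both one-line computations. This sketch uses hypothetical byte counts and function names.

```python
def max_unacked_bytes(cong_win: int, rcv_window: int) -> int:
    """LastByteSent - LastByteACKed may not exceed min(CongWin, RcvWindow)."""
    return min(cong_win, rcv_window)


def approx_send_rate(cong_win_bytes: float, rtt_s: float) -> float:
    """Sending rate is approximately CongWin / RTT (bytes per second)."""
    return cong_win_bytes / rtt_s


if __name__ == "__main__":
    print(max_unacked_bytes(8_000, 64_000))   # CongWin is the binding limit
    print(approx_send_rate(16_000, 0.1))      # roughly 160 KB/s for a 100 ms RTT
```

Doubling CongWin at a fixed RTT doubles the approximate rate, which is why adjusting CongWin is enough to control the sender.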
TCP Congestion Control: TCP uses ACKs to trigger ("clock") its increase in congestion window size, i.e. TCP is "self-clocking".
The arrival of ACKs is an indication to the sender that all is well:
1. Slow rate: if ACKs arrive at a relatively slow rate, the congestion window will be increased at a relatively slow rate.
2. High rate: if ACKs arrive at a high rate, the congestion window will be increased more quickly.
TCP Congestion Control: how the TCP sender perceives that there is congestion on the path
"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender. A loss event is either:
1. Timeout: no ACK is received after a segment is lost.
2. Receipt of three duplicate ACKs: the segment loss is followed by three duplicate ACKs received at the sender.
TCP Congestion Control: details
- The sender limits transmission: LastByteSent - LastByteAcked ≤ cwnd
- Roughly, rate ≈ cwnd/RTT bytes/sec
- cwnd is a dynamic function of perceived network congestion
How does the sender perceive congestion?
- loss event = timeout or 3 duplicate ACKs
- the TCP sender reduces its rate (cwnd) after a loss event
Three mechanisms:
1. AIMD
2. slow start
3. conservative behavior after timeout events
TCP congestion avoidance: additive increase, multiplicative decrease (AIMD)
Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
- additive increase: increase cwnd by 1 MSS every RTT until loss is detected
- multiplicative decrease: cut cwnd in half after a loss
(Figure: congestion window size (8, 16, 24 Kbytes) vs. time, showing the saw-tooth behavior of probing for bandwidth.)
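The saw-tooth can be reproduced with a few lines. This sketch counts cwnd in MSS units and assumes, purely for illustration, that a loss is detected whenever cwnd reaches a fixed number of segments; the function name is hypothetical.

```python
def aimd_trace(start: int, loss_at: int, rtts: int) -> list:
    """cwnd (in MSS) at the start of each RTT under AIMD, assuming a loss
    is detected whenever cwnd reaches loss_at segments."""
    cwnd, trace = start, []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd >= loss_at:
            cwnd = max(1, cwnd // 2)   # multiplicative decrease: cut in half
        else:
            cwnd += 1                  # additive increase: +1 MSS per RTT
    return trace


if __name__ == "__main__":
    print(aimd_trace(start=8, loss_at=12, rtts=8))   # [8, 9, 10, 11, 12, 6, 7, 8]
```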
TCP Slow Start
When the connection begins, increase the rate exponentially until the first loss event:
- initially cwnd = 1 MSS
- double cwnd every RTT
- done by incrementing cwnd by 1 MSS for every ACK received
Summary: the initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT).
(Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart.)
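The per-ACK rule and the per-RTT doubling describe the same growth. This sketch assumes every segment in a window is ACKed within one RTT and that no loss occurs; the function name is hypothetical.

```python
def slow_start_rounds(rtts: int, mss: int = 1) -> list:
    """cwnd at the start of each RTT when cwnd grows by 1 MSS per ACK."""
    cwnd, history = mss, []
    for _ in range(rtts):
        history.append(cwnd)
        segments_in_window = cwnd // mss
        for _ack in range(segments_in_window):   # one ACK per segment sent
            cwnd += mss                           # +1 MSS for every ACK received
    return history


if __name__ == "__main__":
    print(slow_start_rounds(4))             # [1, 2, 4, 8] segments
    print(slow_start_rounds(4, mss=1460))   # the same growth measured in bytes
```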
Refinement: inferring loss
- After 3 dup ACKs: cwnd is cut in half; the window then grows linearly.
- But after a timeout event: cwnd is set to 1 MSS; the window then grows exponentially up to a threshold, then grows linearly.
Philosophy:
- 3 dup ACKs indicate that the network is capable of delivering some segments.
- A timeout indicates a "more alarming" congestion scenario.
Refinement
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.
Implementation:
- variable ssthresh (slow-start threshold)
- on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event
TCP Sender Congestion Control
State | Event | TCP sender congestion-control action | Commentary
Slow Start (SS) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Results in a doubling of CongWin every RTT
Congestion Avoidance (CA) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS × (MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
SS or CA | Duplicate ACK | Increment duplicate ACK count for the segment being ACKed | CongWin and Threshold not changed
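The table can be turned into a small event-driven class. This sketch counts windows in MSS units, uses "SS"/"CA" as state labels, and does not model fast recovery's extra window inflation; the class name and the default threshold are illustrative assumptions.

```python
class CongestionControlSender:
    """Sketch of the CongWin/Threshold rules in the table (windows in MSS)."""

    def __init__(self, threshold: float = 64.0):
        self.cong_win = 1.0
        self.threshold = threshold
        self.state = "SS"                  # "SS" = slow start, "CA" = congestion avoidance
        self.dup_ack_count = 0

    def on_new_ack(self) -> None:
        """ACK receipt for previously unACKed data."""
        self.dup_ack_count = 0
        if self.state == "SS":
            self.cong_win += 1.0                      # +1 MSS per ACK: doubles every RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += 1.0 / self.cong_win      # MSS*(MSS/CongWin): +1 MSS per RTT

    def on_duplicate_ack(self) -> None:
        """CongWin and Threshold stay unchanged until the third duplicate."""
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:                   # loss detected by triple duplicate ACK
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, 1.0)  # never below 1 MSS
            self.state = "CA"

    def on_timeout(self) -> None:
        self.threshold = self.cong_win / 2
        self.cong_win = 1.0
        self.state = "SS"
        self.dup_ack_count = 0


if __name__ == "__main__":
    sender = CongestionControlSender(threshold=8.0)
    for _ in range(8):
        sender.on_new_ack()
    print(sender.state, sender.cong_win)      # CA 9.0
    for _ in range(3):
        sender.on_duplicate_ack()
    print(sender.state, sender.cong_win)      # CA 4.5
```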
Summary: TCP Congestion Control
(FSM with three states: slow start, congestion avoidance, fast recovery.)
Initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start.
Slow start:
- new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- cwnd > ssthresh: go to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Congestion avoidance:
- new ACK: cwnd = cwnd + MSS × (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Fast recovery:
- duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
- new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
Transport Layer ndash TCP
48
BB
Congestion control
TCPrsquos Congestion Control Service
CLIENTSERVER
Problem Gridlock sets-in when there is packet loss due to router congestion
forces the End Systems to decrease the rate at which packets are sentdecrease the rate at which packets are sent during periods of congestion
The sending systemrsquos packet is lost due to congestion and is alerted when it stops receiving ACKs of packets sent
Transport Layer ndash TCP
49
BBTransport Layer 3-49
Macroscopic Description of TCP throughput
whatrsquos the average throughout of TCP as a function of window size and RTT ignore slow start (typically very short phases)
let W be the window size when loss occurs when window is W throughput is WRTT just after loss window drops to W2 throughput to
W2RTT Throughput increases linearly (by MSSRTT every
RTT) Average Throughput 75 WRTT
(Based on Idealised model for the steady-state dynamics of TCP)
Transport Layer 3-50
TCP FuturesTCP Futures TCP over ldquolong fat pipesrdquo Example GRID computing application 1500-byte segments 100ms RTT desired
throughput of 10 Gbps requires window size W = 83333 in-flight
segments Throughput in terms of loss rate
L = 210-10 ndash a very small loss rate (1 loss event every 5 billion segments)
new versions of TCP is needed for high-speed environments
LRTT
MSS221
Transport Layer ndash TCP
51
BB
TCP FairnessTCP Fairness
Fairness goal if NN TCP sessions share same bottleneck link each should get an average transmission rate of RNRN an equal share of the linkrsquos bandwidth
TCP connection 1
bottleneckrouter
capacity RR
TCP connection 2
Go to Summary of TCP Congestion Control
Transport Layer ndash TCP
52
BB
Analysis of 2 connections sharing a linkAnalysis of 2 connections sharing a link
Link with transmission rate of RR
Each connection have the same MSSMSS RTTRTT
No other TCP connections or UDP datagrams traverse the shared link
Ignore slow start phase of TCP
Operating in congestion-avoidance modecongestion-avoidance mode (linear increase phase)
Goal adjust sending rate of the two connections to allow for equal equal bandwidth sharingbandwidth sharing
AssumptionsAssumptions
Transport Layer ndash TCP
53
BB
Why is TCP fairWhy is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput
proportionally
RR
equal bandwidth shareequal bandwidth share
Connection 1 throughput
Connect
ion 2
thro
ughput
congestion avoidance additive increaseloss decrease window by factor of 2
RR
A point on the graph depicts the amount of link bandwidth
jointly jointly consumedconsumed by
the connections
Full bandwidth utilisation line
We can viewWe can view a simulation a simulation
on thison this
View SimulationView Simulation
Transport Layer ndash TCP
54
BB
The End
The next succeeding slides are just for additional reading
Transport Layer ndash TCP
55
BB
TCP Latency ModelingTCP Latency Modeling
In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs
Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth
Multiple End Systems sharing a link
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
Loop holes in TCPLoop holes in TCP
1 TCP connection
1 TCP connection
1 TCP connection
3 TCP connections
Multithreading implementationMultithreading implementation
Transport Layer ndash TCP
56
BB
TCP latency modelingTCP latency modeling
TCP connection establishment time data transfer delay Actual data transmission time
Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent
QQ How long How long does it take to does it take to
receive an object receive an object from a Web from a Web
serverserver
No data transfer delayNo data transfer delay
Therersquos data transfer delayTherersquos data transfer delay
the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety
Transport Layer ndash TCP
57
BB
TCP Latency ModelingTCP Latency Modeling
Network is uncongested with one link between end systems of rate RR
CongWinCongWin (fixed) determines the amount of data that can be sent
No packet loss no packet corruption no retransmissions required
Header overheads are negligible
File to sendFile to send = integer number of segments of size MSS
Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times
CLIENT SERVER
FILEFILE
Initial ThresholdThreshold of TCP congestion mechanism is very big
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits
SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)
Transport Layer ndash TCP
58
BB
TCP latency ModelingTCP latency Modeling
Case 1 latency = 2RTT + OR
K = Number of Windows of data that cover the object
K = OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 1 Case 1 WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
Number of segmentsRounded up to the nearest integer
Assume W=4 segmentsAssume W=4 segments
eg O=256bits S=32bits W=4
Transport Layer ndash TCP
59
BB
TCP latency ModelingTCP latency Modeling
Case 2 latency = 2RTT + OR + (K-1)[SR + RTT - WSR]
Number of Windows of data that cover the objectK= OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 2 Case 2 WSR lt RTT + SRWSR lt RTT + SR
Sender has to wait for an ACK after a windowrsquos worth of data sent
STALLED STALLED PERIODPERIOD
If there are k windows sender willbe stalled (k-1) times
Transport Layer ndash TCP
60
BB
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
STALLED STALLED PERIODPERIOD
4 windows4 windows
OS=15
Transport Layer ndash TCP
61
BB
bull LetLet K K be the be the number of windows that cover the objectnumber of windows that cover the objectbull We can express We can express KK in terms of the number of segments in the in terms of the number of segments in the
object as followsobject as follows
S
OkK k 110 222min
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1log
1logmin
12min
2
2
S
OK
S
OkkK
S
OkK k
Note
Transport Layer ndash TCP
62
BB
bull From the time the server begins to transmit the From the time the server begins to transmit the kkth window window until the time the server receives an until the time the server receives an ACKACK for the for the first segment in the windowin the window
bull Transmission of Transmission of kkth window window ==
bull Stall Time Stall Time ==
bull Latency Latency ==
12
k
R
S
12
k
R
SRTT
R
S
RTTR
S
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1K
1
122k
k
R
SRTT
R
S
R
ORTT
Transport Layer ndash TCP
63
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
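• Closed-form expression for the latency. The stall terms can be non-zero only for k = 1, ..., P. Using $\sum_{k=1}^{P} 2^{k-1} = 2^P - 1$:
$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - \left(2^{P} - 1\right)\frac{S}{R}$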
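This is a hypothetical C sketch of the dynamic-window model. `windows_to_cover`, `stalls_if_infinite` and `slow_start_latency` are illustrative names. The sketch computes K, Q and P as defined above and then evaluates the closed-form latency. It makes the same assumptions as the static case: no loss, negligible header and handshake transmission times, and a very large initial threshold.

```c
#include <assert.h>

/* K = min{k : 2^k - 1 >= O/S}: slow-start windows (1, 2, 4, ...) covering the object */
unsigned windows_to_cover(unsigned long segments) {
    unsigned k = 0;
    unsigned long covered = 0;            /* 2^k - 1 segments fit in the first k windows */
    while (covered < segments) {
        covered = 2 * covered + 1;
        k++;
    }
    return k;
}

/* Q = floor(log2(1 + RTT/(S/R))) + 1, by repeated doubling instead of log2() */
unsigned stalls_if_infinite(double rtt_s, double s_over_r) {
    double x = 1.0 + rtt_s / s_over_r;
    unsigned q = 0;                       /* ends as the largest q with 2^q <= x */
    for (double next = 2.0; next <= x; next *= 2.0)
        q++;
    return q + 1;
}

/* Latency = 2 RTT + O/R + P (RTT + S/R) - (2^P - 1) S/R, with P = min(Q, K - 1) */
double slow_start_latency(unsigned long o_bits, unsigned long s_bits,
                          double r_bps, double rtt_s) {
    assert(o_bits > 0 && s_bits > 0 && r_bps > 0.0 && rtt_s >= 0.0);
    double s_over_r = (double)s_bits / r_bps;
    unsigned long segments = (o_bits + s_bits - 1) / s_bits;   /* O/S, rounded up */
    unsigned k = windows_to_cover(segments);
    unsigned q = stalls_if_infinite(rtt_s, s_over_r);
    unsigned p = q < k - 1 ? q : k - 1;
    double two_to_p = 1.0;
    for (unsigned i = 0; i < p; i++)
        two_to_p *= 2.0;
    return 2.0 * rtt_s + (double)o_bits / r_bps
           + p * (rtt_s + s_over_r) - (two_to_p - 1.0) * s_over_r;
}
```

Take O/S = 15 segments, S/R = 1 s and RTT = 1 s. Then K = 4, Q = 2 and P = 2, giving 2 + 15 + 2 × 2 - 3 = 18 s. Summing the stall terms directly (1 + 0 + 0) gives the same result.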
• Compare with the minimum latency $2\,RTT + O/R$, which is the latency when the server never stalls:
$\frac{\text{Latency}}{\text{Minimum latency}} \le 1 + \frac{P}{\frac{O/R}{RTT} + 2}$
• Slow start will not significantly increase latency if RTT ≪ O/R.
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
FSM of TCP for reliable data transfer
Simplified TCP sender, assuming:
- one-way data transfer
- no flow or congestion control
[Figure: FSM with a single "wait for event" state and three transitions: data received from application above → create, send segment; timer timeout for segment with seq number y → retransmit segment; ACK received, with ACK number y → process ACK.]
SIMPLIFIED TCP SENDER
Assumptions:
• the sender is not constrained by TCP flow or congestion control
• data from above is less than MSS in size
• data transfer is in one direction only

sendbase = initial_sequence_number
nextseqnum = initial_sequence_number

loop (forever) {
    switch (event)
    event: data received from application above
        create TCP segment with sequence number nextseqnum
        if (timer is currently not running)
            start timer for segment nextseqnum
        pass segment to IP
        nextseqnum = nextseqnum + length(data)
    event: timer timeout
        retransmit not-yet-ACKed segment with smallest sequence number
        start timer
    event: ACK received, with ACK field value of y
        if (y > sendbase) {    /* cumulative ACK of all data up to y */
            sendbase = y
            if (there are currently any not-yet-ACKed segments)
                start timer
        }
} /* end of loop forever */

The single timer is associated with the oldest unACKed segment.
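The simplified sender above can be sketched in C as three event handlers over a small state struct. `SimpleSender` and the handler names are hypothetical. `printf` stands in for "pass segment to IP". Sequence-number wraparound is ignored. One behaviour is implicit in the pseudocode and made explicit here: when a cumulative ACK covers everything sent, the timer is stopped.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified sender: one timer, cumulative ACKs, one-way data. */
typedef struct {
    uint32_t sendbase;     /* oldest unACKed byte */
    uint32_t nextseqnum;   /* sequence number of the next new byte */
    bool timer_running;    /* single timer, covers the oldest unACKed segment */
} SimpleSender;

void sender_init(SimpleSender *s, uint32_t initial_sequence_number) {
    s->sendbase = initial_sequence_number;
    s->nextseqnum = initial_sequence_number;
    s->timer_running = false;
}

/* event: data received from application above */
void on_app_data(SimpleSender *s, uint32_t len) {
    printf("send       seq=%" PRIu32 " len=%" PRIu32 "\n", s->nextseqnum, len);
    if (!s->timer_running)
        s->timer_running = true;          /* start timer */
    s->nextseqnum += len;
}

/* event: timer timeout */
void on_timeout(SimpleSender *s) {
    /* the not-yet-ACKed segment with the smallest sequence number starts at sendbase */
    printf("retransmit seq=%" PRIu32 "\n", s->sendbase);
    s->timer_running = true;              /* start timer again */
}

/* event: ACK received, with ACK field value y */
void on_ack(SimpleSender *s, uint32_t y) {
    assert(y <= s->nextseqnum);           /* an ACK cannot cover bytes never sent */
    if (y > s->sendbase) {                /* cumulative ACK of all data up to y */
        s->sendbase = y;
        /* keep the timer running only while unACKed data remains */
        s->timer_running = s->sendbase < s->nextseqnum;
    }
}
```

Feed it the numbers used in the scenarios that follow: Seq=92 with 8 bytes, then Seq=100 with 20 bytes. That leaves nextseqnum at 120. ACK=100 keeps the timer running for the remaining data, and ACK=120 stops it.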
TCP SENDER WITH MODIFICATIONS: fast retransmit
Why wait for the timeout to expire when consecutive duplicate ACKs can be used to indicate a lost segment?

sendbase = initial_sequence_number
nextseqnum = initial_sequence_number

loop (forever) {
    switch (event)
    event: data received from application above
        create TCP segment with sequence number nextseqnum
        start timer for segment nextseqnum
        pass segment to IP
        nextseqnum = nextseqnum + length(data)
    event: timer timeout for segment with sequence number y
        retransmit segment with sequence number y
        compute new timeout interval for segment y
        restart timer for sequence number y
    event: ACK received, with ACK field value of y
        if (y > sendbase) {    /* cumulative ACK of all data up to y */
            cancel all timers for segments with sequence numbers < y
            sendbase = y
        }
        else {    /* a duplicate ACK for an already ACKed segment */
            increment number of duplicate ACKs received for y
            if (number of duplicate ACKs received for y == 3) {
                /* perform TCP fast retransmission */
                resend segment with sequence number y
                restart timer for segment y
            }
        }
} /* end of loop forever */
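The duplicate-ACK branch of the pseudocode can be isolated in a short C sketch. `FrSender` and `fr_on_ack` are hypothetical names. Timers are left out, and the retransmission is represented by a `printf`. The sketch counts duplicates only for ACKs equal to sendbase and simply ignores older (stale) ACKs, which is a simplification.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    uint32_t sendbase;    /* oldest unACKed byte */
    uint32_t nextseqnum;  /* next new byte, used only for a sanity check */
    unsigned dup_acks;    /* duplicate ACKs received for sendbase */
} FrSender;

/* Handles an ACK with value y; returns true if it triggers a fast retransmission. */
bool fr_on_ack(FrSender *s, uint32_t y) {
    assert(y <= s->nextseqnum);
    if (y > s->sendbase) {            /* new cumulative ACK */
        s->sendbase = y;              /* timers for data below y would be cancelled here */
        s->dup_acks = 0;
        return false;
    }
    if (y < s->sendbase)              /* stale ACK: not counted in this sketch */
        return false;
    if (++s->dup_acks == 3) {         /* third duplicate ACK for y */
        printf("fast retransmit seq=%" PRIu32 "\n", y);  /* resend y, restart its timer */
        return true;
    }
    return false;
}
```

Only the third duplicate triggers the retransmission. A fourth duplicate does not resend the segment again, and the next new ACK resets the count.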
TCP ACK generation [RFC 1122, RFC 2581]
Event → TCP receiver action
1. In-order segment arrival, no gaps, everything else already ACKed → Delay sending the ACK: wait up to 500 ms for the next segment; if the next segment does not arrive in this interval, send the ACK.
2. In-order segment arrival, no gaps, one delayed ACK pending (due to action 1) → Immediately send a single cumulative ACK.
3. Out-of-order segment arrival with a higher-than-expected sequence number (a gap is detected) → Send a duplicate ACK, indicating the sequence number of the next expected byte.
4. Arrival of a segment that partially or completely fills the gap → Immediately send an ACK if the segment starts at the lower end of the gap.
The receiver does not discard out-of-order segments.
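The four rules above can be written as a classification function in C. This is a sketch under simplifying assumptions. The type and function names are hypothetical. Only one contiguous out-of-order block is tracked, and a second, non-adjacent block is not remembered. Old duplicate segments (below the next expected byte) are not modeled. The 500 ms delayed-ACK timer itself is omitted; the function only reports which action applies.

```c
#include <assert.h>
#include <stdbool.h>

/* Receiver actions, numbered as in the table above. */
typedef enum {
    DELAYED_ACK = 1,      /* wait up to 500 ms for the next segment */
    CUMULATIVE_ACK = 2,   /* immediately send one cumulative ACK */
    DUPLICATE_ACK = 3,    /* re-ACK the next expected byte */
    GAP_FILL_ACK = 4      /* immediately send an ACK */
} AckAction;

typedef struct {
    unsigned rcv_next;            /* next expected in-order byte */
    bool ack_pending;             /* a delayed ACK is outstanding */
    bool has_gap;                 /* bytes [ooo_start, ooo_end) are buffered */
    unsigned ooo_start, ooo_end;
} Receiver;

AckAction on_segment(Receiver *r, unsigned seq, unsigned len) {
    assert(len > 0 && seq >= r->rcv_next);    /* old duplicates not modeled */
    if (seq > r->rcv_next) {                  /* gap detected: keep the data */
        if (!r->has_gap) {
            r->has_gap = true;
            r->ooo_start = seq;
            r->ooo_end = seq + len;
        } else if (seq == r->ooo_end) {
            r->ooo_end += len;                /* extends the buffered block */
        }                                     /* other blocks: not tracked here */
        r->ack_pending = false;               /* the immediate ACK covers it */
        return DUPLICATE_ACK;
    }
    r->rcv_next += len;                       /* in-order data */
    if (r->has_gap) {                         /* starts at the lower end of the gap */
        if (r->rcv_next >= r->ooo_start) {    /* gap completely filled */
            if (r->ooo_end > r->rcv_next)
                r->rcv_next = r->ooo_end;
            r->has_gap = false;
        }
        r->ack_pending = false;
        return GAP_FILL_ACK;
    }
    if (r->ack_pending) {                     /* second in-order segment */
        r->ack_pending = false;
        return CUMULATIVE_ACK;
    }
    r->ack_pending = true;                    /* first in-order segment */
    return DELAYED_ACK;
}
```

In the example below, segments of 100 bytes arrive at 0, 100, 300 and then 200. They produce actions 1, 2, 3 and 4 in turn. After the gap is filled, the next expected byte jumps to 400.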
TCP: interesting scenarios (simplified TCP version)
[Figure, lost ACK scenario: Host A sends Seq=92 (8 bytes of data); Host B's ACK=100 is lost (X). A's timer expires, A retransmits Seq=92 (8 bytes of data), and B again replies ACK=100. This is a retransmission due to a lost ACK.]
[Figure, premature timeout with cumulative ACKs: Host A sends Seq=92 (8 bytes of data) and Seq=100 (20 bytes of data); B replies ACK=100 and ACK=120. A's timer for Seq=92 expires before these ACKs arrive, so A retransmits Seq=92 (8 bytes of data) and restarts the timer for Seq=92; B answers with ACK=120. The segment with Seq=100 is not retransmitted.]
TCP retransmission scenario
[Figure: Host A sends Seq=92 (8 bytes of data) and Seq=100 (20 bytes of data). Host B's ACK=100 is lost (X), but ACK=120 reaches A before the timeout for Seq=92 expires.]
The cumulative ACK avoids retransmission of the first segment.
TCP modifications: doubling the timeout interval
• Provides a limited form of congestion control. Timer expiration is more likely to be caused by congestion in the network, and congestion may get worse if sources continue to retransmit packets persistently.
• On each timeout: TimeoutInterval = 2 × TimeoutInterval_previous
• TCP acts more politely by increasing the TimeoutInterval, which makes the sender retransmit after longer and longer intervals.
• After an ACK is received, TimeoutInterval is again derived from the most recent EstimatedRTT and DevRTT.
Others: see RFC 2018 (selective ACK).
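The slides name EstimatedRTT and DevRTT but do not give their update rules here. The C sketch below assumes the rules of RFC 6298: gains of 1/8 and 1/4, DevRTT updated before EstimatedRTT, and TimeoutInterval = EstimatedRTT + 4 × DevRTT. It omits that RFC's clock-granularity term and its 1-second minimum. `RtoState` and the function names are hypothetical. Each timeout doubles the interval, and the next RTT sample recomputes it from the estimates.

```c
#include <assert.h>
#include <stdbool.h>

/* Timeout state; all times in seconds. */
typedef struct {
    double estimated_rtt;
    double dev_rtt;
    double timeout_interval;
    bool have_sample;
} RtoState;

void rto_init(RtoState *r, double initial_timeout) {
    r->estimated_rtt = 0.0;
    r->dev_rtt = 0.0;
    r->timeout_interval = initial_timeout;
    r->have_sample = false;
}

/* New SampleRTT, taken from an ACK for a segment that was not retransmitted:
   TimeoutInterval is derived again from EstimatedRTT and DevRTT. */
void rto_on_sample(RtoState *r, double sample_rtt) {
    const double alpha = 0.125, beta = 0.25;
    assert(sample_rtt > 0.0);
    if (!r->have_sample) {                   /* first measurement */
        r->estimated_rtt = sample_rtt;
        r->dev_rtt = sample_rtt / 2.0;
        r->have_sample = true;
    } else {
        double diff = sample_rtt - r->estimated_rtt;
        if (diff < 0.0)
            diff = -diff;
        r->dev_rtt = (1.0 - beta) * r->dev_rtt + beta * diff;
        r->estimated_rtt = (1.0 - alpha) * r->estimated_rtt + alpha * sample_rtt;
    }
    r->timeout_interval = r->estimated_rtt + 4.0 * r->dev_rtt;
}

/* Timer expiration: TimeoutInterval = 2 * TimeoutInterval_previous */
void rto_on_timeout(RtoState *r) {
    r->timeout_interval *= 2.0;
}
```

Suppose a first sample of 100 ms arrives. It sets TimeoutInterval to 0.1 + 4 × 0.05 = 0.3 s. Two timeouts in a row then raise it to 1.2 s. A second 100 ms sample brings it back to 0.25 s.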
TCP flow control
Flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.
• The receiver explicitly informs the sender of the (dynamically changing) amount of free buffer space via the RcvWindow field in the TCP segment.
• The sender keeps the amount of transmitted, unACKed data no larger than the most recently received RcvWindow.
Receiver buffering:
• RcvBuffer = size of the TCP receive buffer
• RcvWindow = amount of spare room in the buffer
FLOW CONTROL: receiver
Example: HOST A sends a large file to HOST B.
RECEIVER (HOST B) uses RcvWindow, LastByteRcvd and LastByteRead:
• LastByteRcvd: the number of the last byte in the data stream that has arrived from IP and been placed in the receive buffer
• LastByteRead: the number of the last byte in the data stream that the application process has read from the buffer
RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
Initially RcvWindow = RcvBuffer. As the application reads from the buffer, spare room is freed. HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.
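Here is a minimal C sketch of HOST B's side. The struct and function names are hypothetical, and byte numbers start at 0 for simplicity. `rcv_window` evaluates the formula above. The asserts document that a well-behaved sender never pushes LastByteRcvd - LastByteRead past RcvBuffer.

```c
#include <assert.h>
#include <stdint.h>

/* Receiver-side bookkeeping for HOST B, in bytes. */
typedef struct {
    uint64_t rcv_buffer;      /* RcvBuffer */
    uint64_t last_byte_rcvd;  /* LastByteRcvd */
    uint64_t last_byte_read;  /* LastByteRead */
} FlowReceiver;

/* RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead] */
uint64_t rcv_window(const FlowReceiver *r) {
    uint64_t buffered = r->last_byte_rcvd - r->last_byte_read;
    assert(buffered <= r->rcv_buffer);        /* flow control keeps this true */
    return r->rcv_buffer - buffered;
}

/* n bytes arrive from IP and are placed in the buffer */
void data_from_ip(FlowReceiver *r, uint64_t n) {
    assert(n <= rcv_window(r));               /* the sender respected RcvWindow */
    r->last_byte_rcvd += n;
}

/* the application process reads n bytes from the buffer */
void app_reads(FlowReceiver *r, uint64_t n) {
    assert(n <= r->last_byte_rcvd - r->last_byte_read);
    r->last_byte_read += n;
}
```

With a 4096-byte buffer, 3000 bytes from IP shrink RcvWindow to 1096. An application read of 1000 bytes reopens it to 2096.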
FLOW CONTROL: sender
Example: HOST A sends a large file to HOST B.
SENDER (HOST A) uses HOST B's RcvWindow, LastByteSent and LastByteACKed (advanced by ACKs from HOST B).
To ensure that HOST B's buffer does not overflow, HOST A maintains throughout the connection's life that
[LastByteSent - LastByteACKed] <= RcvWindow
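The sender-side rule can be sketched in C as follows. `FlowSender`, `sendable`, `send_data` and `ack_from_b` are hypothetical names. An ACK carries both the cumulative acknowledgment and the latest RcvWindow. If the advertised window shrinks below the data already in flight, the sketch allows no new data until more is ACKed.

```c
#include <assert.h>
#include <stdint.h>

/* Sender-side bookkeeping for HOST A, in bytes. */
typedef struct {
    uint64_t last_byte_sent;   /* LastByteSent */
    uint64_t last_byte_acked;  /* LastByteACKed */
    uint64_t rcv_window;       /* most recent RcvWindow advertised by HOST B */
} FlowSender;

/* Bytes HOST A may still send while keeping
   LastByteSent - LastByteACKed <= RcvWindow. */
uint64_t sendable(const FlowSender *s) {
    assert(s->last_byte_acked <= s->last_byte_sent);
    uint64_t unacked = s->last_byte_sent - s->last_byte_acked;
    return unacked >= s->rcv_window ? 0 : s->rcv_window - unacked;
}

void send_data(FlowSender *s, uint64_t n) {
    assert(n <= sendable(s));
    s->last_byte_sent += n;
}

/* ACK from HOST B: the last byte it acknowledges and the RcvWindow it carries. */
void ack_from_b(FlowSender *s, uint64_t acked_through, uint64_t rcv_window) {
    assert(acked_through <= s->last_byte_sent);
    if (acked_through > s->last_byte_acked)
        s->last_byte_acked = acked_through;
    s->rcv_window = rcv_window;
}
```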
FLOW CONTROL: some issues to consider
RcvWindow is used by the connection to provide the flow control service.
What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?
TCP sends a segment only when there is data or an ACK to send. Suppose HOST A simply stopped sending. HOST B, which has no data of its own to send, would then have no segment in which to advertise the space its application frees later, and HOST A would stay blocked. Therefore the sender must keep the connection "alive":
TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space, and the ACKs will contain RcvWindow > 0.
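Here is a toy C model of the one-byte-segment rule. Every name in it is hypothetical. HOST B's application starts reading only after a few probes have arrived. A probe byte that does not fit is dropped but still ACKed. In a real stack the probes are paced by a timer; here each loop iteration stands for one probe.

```c
#include <assert.h>
#include <stdint.h>

/* Toy HOST B: a receive buffer whose application starts reading
   only after `stall_probes` probes have arrived. */
typedef struct {
    uint64_t rcv_buffer;     /* RcvBuffer */
    uint64_t buffered;       /* bytes waiting to be read */
    unsigned probes_seen;
    unsigned stall_probes;   /* probes before the application reads */
    uint64_t read_bytes;     /* bytes the application reads per probe after that */
} ToyHostB;

/* HOST B gets a 1-byte segment and returns the RcvWindow carried by its ACK. */
uint64_t hostb_on_probe(ToyHostB *b) {
    if (b->buffered < b->rcv_buffer)
        b->buffered++;                            /* byte kept if there is room */
    if (++b->probes_seen > b->stall_probes) {     /* application is reading */
        uint64_t n = b->read_bytes < b->buffered ? b->read_bytes : b->buffered;
        b->buffered -= n;
    }
    return b->rcv_buffer - b->buffered;
}

/* HOST A: while RcvWindow == 0, keep sending one-byte segments so that
   HOST B always has something to ACK. Returns the number of probes sent. */
unsigned probe_until_window_opens(ToyHostB *b) {
    assert(b->read_bytes > 0);                    /* otherwise the window never opens */
    unsigned probes = 0;
    uint64_t rcv_window = 0;
    while (rcv_window == 0) {
        rcv_window = hostb_on_probe(b);
        probes++;
    }
    return probes;
}
```

Start with a full 4096-byte buffer and an application that begins reading 1000 bytes after the third probe. The fourth probe's ACK is then the first to advertise a non-zero window.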
TCP connection management
Recall: the TCP sender and receiver establish a "connection" before exchanging data segments, in order to initialize TCP variables: sequence numbers, buffers and flow control info (e.g. RcvWindow).
• Client: the connection initiator.
  In Java: Socket clientSocket = new Socket(hostname, portNumber);
  In C (Winsock): connect()
• Server: contacted by the client.
  In Java: Socket accept() (a method of ServerSocket)
  In C (Winsock): accept()

    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
        printf("connect failed\n");
        WSACleanup();
        exit(1);
    }

    ns = accept(s, (struct sockaddr *)&remoteaddr, &addrlen);
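The slides' fragments use Winsock (`closesocket`, `WSACleanup`). As a swapped-in illustration, the sketch below uses POSIX sockets instead, assuming a Unix-like system with a loopback interface. The function name `loopback_connect_accept` is hypothetical. A single process plays both roles. The client's `connect()` returns once the kernel has completed the handshake. The server's `accept()` then returns a new socket for that established connection, like `ns` above.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connects a client socket to a listening socket over 127.0.0.1, sends "hi"
   from client to server and copies what the server received into out
   (at least 3 bytes). Returns 0 on success, -1 on any failure. */
int loopback_connect_accept(char *out, size_t out_len) {
    int ls = -1, cs = -1, ns = -1, rc = -1;
    ssize_t n;
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);

    assert(out_len >= 3);
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = htons(0);                        /* let the OS choose a free port */

    if ((ls = socket(AF_INET, SOCK_STREAM, 0)) < 0) goto done;
    if (bind(ls, (struct sockaddr *)&sin, sizeof(sin)) != 0) goto done;
    if (listen(ls, 1) != 0) goto done;
    if (getsockname(ls, (struct sockaddr *)&sin, &len) != 0) goto done;  /* learn the port */

    if ((cs = socket(AF_INET, SOCK_STREAM, 0)) < 0) goto done;
    /* connect(): the kernel performs the SYN, SYNACK, ACK exchange */
    if (connect(cs, (struct sockaddr *)&sin, sizeof(sin)) != 0) goto done;
    /* accept(): returns a new socket for the already-established connection */
    if ((ns = accept(ls, NULL, NULL)) < 0) goto done;

    if (send(cs, "hi", 2, 0) != 2) goto done;
    n = recv(ns, out, 2, MSG_WAITALL);              /* wait for both bytes */
    if (n != 2) goto done;
    out[2] = '\0';
    rc = 0;
done:
    if (ns >= 0) close(ns);
    if (cs >= 0) close(cs);
    if (ls >= 0) close(ls);
    return rc;
}
```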
TCP connection management
Three-way handshake:
Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself).
• It specifies the client's initial sequence number (client_isn).
Step 2: the server end system receives the SYN and replies with a SYNACK control segment.
• It ACKs the received SYN, allocates buffers and specifies the server's initial sequence number (server_isn).
Step 3: the client ACKs the connection with ACK = server_isn + 1.
• It allocates buffers and sends SYN = 0.
Connection established.
[Figure: establishing a connection. Client → server: Connect (SYN=1, seq=client_isn). Server → client: Accept (SYN=1, seq=server_isn, ack=client_isn+1). Client → server: ACK (SYN=0, seq=client_isn+1, ack=server_isn+1).]
This is what happens when we create a socket for connection to a server.
After the connection is established, the client and server can exchange segments carrying application-generated data (SYN=0).
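The sequence and acknowledgment numbers in the figure can be reproduced with a small C sketch. The `Segment` struct models only the fields used here, and the function names are hypothetical. The "+ 1" terms reflect that a SYN consumes one sequence number. Because the fields are `uint32_t`, the arithmetic wraps modulo 2^32 as real TCP sequence numbers do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Only the header fields the handshake uses. */
typedef struct {
    bool syn, ack;
    uint32_t seq;        /* sequence number */
    uint32_t ack_num;    /* acknowledgment number, meaningful when ack is set */
} Segment;

/* Step 1: the client's SYN carries client_isn. */
Segment make_syn(uint32_t client_isn) {
    return (Segment){ .syn = true, .ack = false, .seq = client_isn, .ack_num = 0 };
}

/* Step 2: the server ACKs the SYN and announces server_isn (SYNACK). */
Segment make_synack(Segment syn, uint32_t server_isn) {
    assert(syn.syn && !syn.ack);
    return (Segment){ .syn = true, .ack = true, .seq = server_isn,
                      .ack_num = (uint32_t)(syn.seq + 1) };
}

/* Step 3: the client ACKs the SYNACK with SYN = 0. */
Segment make_final_ack(Segment synack, uint32_t client_isn) {
    assert(synack.syn && synack.ack && synack.ack_num == (uint32_t)(client_isn + 1));
    return (Segment){ .syn = false, .ack = true, .seq = (uint32_t)(client_isn + 1),
                      .ack_num = (uint32_t)(synack.seq + 1) };
}
```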
TCP connection management (cont.)
Closing a connection: the client closes its socket:
  closesocket(s);            (C, Winsock)
  clientSocket.close();      (Java)
Step 1: the client end system sends a TCP FIN control segment to the server.
Step 2: the server receives the FIN and replies with an ACK. It then closes the connection and sends its own FIN.
[Figure: how a TCP connection is torn down. The client (close) sends FIN; the server replies ACK; the server (close) sends FIN; the client replies ACK, stays in timed wait, and then the connection is closed.]
TCP connection management (cont.)
Step 3: the client receives the FIN and replies with an ACK.
• It enters a timed wait, during which it will respond with an ACK to any received FINs.
Step 4: the server receives the ACK, and the connection is closed.
Note: with a small modification, this can handle simultaneous FINs.
[Figure: client FIN → server ACK; server FIN → client ACK; both sides go from closing to closed, the client after its timed wait.]
TCP connection management (cont.)
[Figure: TCP client lifecycle and TCP server lifecycle state diagrams, with numbered transitions.]
• Timed wait: used in case the final ACK gets lost. Its length is implementation-dependent (e.g. 30 seconds, 1 minute or 2 minutes).
• After the timed wait, the connection formally closes: all resources (e.g. port numbers) are released.
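The client lifecycle can be sketched in C as a transition function. The figure itself is not reproduced here. The state names are the standard ones from RFC 793, and the event and output names are hypothetical. The simultaneous-FIN case (a FIN arriving in FIN_WAIT_1) is left out, and unexpected events simply leave the state unchanged.

```c
#include <assert.h>

/* Client-side states, named as in RFC 793. */
typedef enum { CLOSED, SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT } ClientState;
typedef enum { APP_CONNECT, RCV_SYNACK, APP_CLOSE, RCV_ACK, RCV_FIN, WAIT_EXPIRED } ClientEvent;
typedef enum { SEND_NOTHING, SEND_SYN, SEND_ACK, SEND_FIN } ClientOutput;

/* One transition of the client lifecycle; *out is the segment the client sends. */
ClientState client_step(ClientState st, ClientEvent ev, ClientOutput *out) {
    *out = SEND_NOTHING;
    switch (st) {
    case CLOSED:
        if (ev == APP_CONNECT) { *out = SEND_SYN; return SYN_SENT; }
        break;
    case SYN_SENT:
        if (ev == RCV_SYNACK) { *out = SEND_ACK; return ESTABLISHED; }
        break;
    case ESTABLISHED:
        if (ev == APP_CLOSE) { *out = SEND_FIN; return FIN_WAIT_1; }
        break;
    case FIN_WAIT_1:
        if (ev == RCV_ACK) return FIN_WAIT_2;
        break;
    case FIN_WAIT_2:
        if (ev == RCV_FIN) { *out = SEND_ACK; return TIME_WAIT; }
        break;
    case TIME_WAIT:
        if (ev == RCV_FIN) { *out = SEND_ACK; return TIME_WAIT; }  /* our ACK was lost */
        if (ev == WAIT_EXPIRED) return CLOSED;                       /* e.g. 30 s; resources released */
        break;
    }
    return st;
}
```

The repeated-FIN branch in TIME_WAIT is the reason for the timed wait. If the client's last ACK is lost, the server resends its FIN, and the client can still answer it.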
End of Flow Control and Error Control
Flow control vs. congestion control: similar actions are taken, but for very different reasons.
Flow control
• point-to-point traffic between sender and receiver
• a speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the receiver buffer from overflowing
Congestion: happens when too many sources attempt to send data at too high a rate for the routers along the path.
Congestion control
• a service that makes sure that the routers between end systems are able to carry the offered traffic
• prevents routers from overflowing
Same course of action: throttling of the sender.
Principles of congestion control
Congestion, informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queuing in router buffers)
• a top-10 problem
Two broad approaches towards congestion control
1. End-to-end congestion control:
• no explicit feedback from the network
• congestion inferred by end systems from observed packet loss and delay
• the approach taken by TCP
2. Network-assisted congestion control:
• routers provide feedback to end systems, either as
  - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR), or
  - an explicit transmission rate at which the sender should send
TCP congestion control: how the TCP sender limits the rate at which it sends traffic into its connection
New variable: the congestion window, CongWin.
SENDER: (amount of unACKed data) = LastByteSent - LastByteACKed <= min(CongWin, RcvWindow)
This indirectly limits the sender's send rate.
Assumptions:
• The TCP receive buffer is very large, so there is no RcvWindow constraint. The amount of unACKed data at the sender is limited solely by CongWin.
• Packet loss delay and packet transmission delay are negligible.
Sending rate ≈ CongWin/RTT
By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
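Both quantities above are simple to compute. In this hypothetical C sketch, the function names are illustrative and RTT is passed in microseconds so that integer arithmetic stays exact. The sender's allowance is min(CongWin, RcvWindow) minus the unACKed data, and the rate approximation is CongWin/RTT.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes the sender may still put in flight while keeping
   LastByteSent - LastByteACKed <= min(CongWin, RcvWindow). */
uint64_t usable_window(uint64_t cong_win, uint64_t rcv_window,
                       uint64_t last_byte_sent, uint64_t last_byte_acked) {
    assert(last_byte_acked <= last_byte_sent);
    uint64_t limit = cong_win < rcv_window ? cong_win : rcv_window;
    uint64_t unacked = last_byte_sent - last_byte_acked;
    return unacked >= limit ? 0 : limit - unacked;
}

/* Sending rate ~= CongWin / RTT, in bytes per second (RTT in microseconds). */
uint64_t approx_rate_bytes_per_s(uint64_t cong_win, uint64_t rtt_us) {
    assert(rtt_us > 0);
    return cong_win * 1000000u / rtt_us;
}
```

For example, a 16 KB congestion window with a 100 ms RTT gives roughly 163,840 bytes/s.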
TCP congestion control: TCP uses ACKs to trigger ("clock") its increase in congestion window size, so it is "self-clocking".
The arrival of ACKs is an indication to the sender that all is well:
1. ACKs arriving at a slow rate: the congestion window will be increased at a relatively slow rate.
2. ACKs arriving at a high rate: the congestion window will be increased more quickly.
TCP congestion control: how TCP perceives that there is congestion on the path
"Loss event": when there is excessive congestion, router buffers along the path overflow and datagrams are dropped. This in turn results in a "loss event" at the sender, either:
1. a timeout: no ACK is received after a segment loss, or
2. the receipt of three duplicate ACKs: the segment loss is followed by three duplicate ACKs received at the sender.
TCP congestion control details
• The sender limits transmission: LastByteSent - LastByteAcked <= cwnd
• Roughly, rate = cwnd/RTT bytes/sec
• cwnd is a dynamic function of perceived network congestion
How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• the TCP sender reduces its rate (cwnd) after a loss event
Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events
TCP congestion avoidance: additive increase, multiplicative decrease (AIMD)
Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
• additive increase: increase cwnd by 1 MSS every RTT until loss is detected
• multiplicative decrease: cut cwnd in half after loss
[Figure: congestion window size (8, 16, 24 Kbytes) vs. time, showing saw-tooth behavior: probing for bandwidth.]
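A few lines of C reproduce the saw tooth. This is a sketch with hypothetical names. As a stand-in for real loss detection, it assumes a loss occurs whenever cwnd exceeds a fixed `capacity`. It also keeps cwnd from dropping below 1 MSS.

```c
#include <assert.h>
#include <stdbool.h>

/* One RTT of congestion avoidance with AIMD; windows in bytes. */
unsigned aimd_next(unsigned cwnd, unsigned mss, bool loss) {
    assert(mss > 0 && cwnd >= mss);
    if (loss) {
        unsigned half = cwnd / 2;           /* multiplicative decrease */
        return half < mss ? mss : half;     /* never below 1 MSS */
    }
    return cwnd + mss;                      /* additive increase: +1 MSS per RTT */
}

/* Runs `rtts` rounds; a loss happens whenever cwnd exceeds `capacity`.
   Stores the window used in each RTT in trace[0..rtts-1] and returns
   the window for the following RTT. */
unsigned aimd_trace(unsigned cwnd, unsigned mss, unsigned capacity,
                    unsigned rtts, unsigned trace[]) {
    for (unsigned i = 0; i < rtts; i++) {
        trace[i] = cwnd;
        cwnd = aimd_next(cwnd, mss, cwnd > capacity);
    }
    return cwnd;
}
```

Starting from 8000 bytes with MSS = 1000 and capacity = 12000, cwnd climbs 8000, 9000, ..., 13000. It drops to 6500 after the loss and then climbs again.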
TCP slow start
When the connection begins, increase the rate exponentially until the first loss event:
• initially cwnd = 1 MSS
• double cwnd every RTT
• done by incrementing cwnd by 1 MSS for every ACK received
Summary: the initial rate is slow, but it ramps up exponentially fast (the sending rate doubles every RTT).
[Figure: Host A sends one segment to Host B, then two segments one RTT later, then four segments.]
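This C sketch shows why adding 1 MSS per ACK doubles the window every RTT. The names are hypothetical. The window is counted in segments, and every segment is assumed to be ACKed individually, with no delayed ACKs.

```c
#include <assert.h>

/* One RTT of slow start: each segment in the window is ACKed,
   and each ACK grows cwnd by 1 MSS. */
unsigned slow_start_one_rtt(unsigned cwnd_segments) {
    assert(cwnd_segments >= 1);
    unsigned acks = cwnd_segments;           /* one ACK per segment sent */
    for (unsigned i = 0; i < acks; i++)
        cwnd_segments += 1;                  /* +1 MSS per ACK received */
    return cwnd_segments;
}

/* RTTs needed for cwnd, starting at 1 MSS, to reach at least `target` segments. */
unsigned rtts_to_reach(unsigned target) {
    unsigned cwnd = 1, rtts = 0;
    while (cwnd < target) {
        cwnd = slow_start_one_rtt(cwnd);
        rtts++;
    }
    return rtts;
}
```

Going from 1 to 64 segments takes 6 RTTs (1, 2, 4, 8, 16, 32, 64). This is the exponential ramp the figure shows.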
Refinement: inferring loss
• After 3 duplicate ACKs: cwnd is cut in half, and the window then grows linearly.
• But after a timeout event: cwnd is set to 1 MSS. The window then grows exponentially up to a threshold, and after that it grows linearly.
Philosophy:
• 3 duplicate ACKs indicate that the network is capable of delivering some segments.
• A timeout indicates a "more alarming" congestion scenario.
Refinement
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.
Implementation:
• variable ssthresh (slow-start threshold)
• on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event
TCP sender congestion control
(State | Event | TCP sender congestion-control action | Commentary)
• Slow Start (SS) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Results in a doubling of CongWin every RTT
• Congestion Avoidance (CA) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS × (MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
• SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
• SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
• SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed
Summary: TCP congestion control (FSM)
Initially: cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start.
Slow start:
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Congestion avoidance:
• new ACK: cwnd = cwnd + MSS × (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
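The FSM summary translates fairly directly into C. This sketch uses hypothetical names. It tracks only the window variables, counts "retransmit missing segment" actions instead of sending anything, and leaves "transmit new segment(s) as allowed" implicit. Windows are in bytes and held as `double`, because the congestion-avoidance increment MSS × (MSS/cwnd) is fractional.

```c
#include <assert.h>

typedef enum { SLOW_START, CONGESTION_AVOIDANCE, FAST_RECOVERY } CcState;
typedef enum { NEW_ACK, DUP_ACK, TIMEOUT } CcEvent;

typedef struct {
    CcState state;
    double cwnd, ssthresh, mss;   /* bytes */
    unsigned dup_ack_count;
    unsigned retransmits;         /* "retransmit missing segment" actions */
} CcSender;

void cc_init(CcSender *s, double mss) {
    s->state = SLOW_START;
    s->cwnd = mss;                /* cwnd = 1 MSS */
    s->ssthresh = 64.0 * 1024.0;  /* ssthresh = 64 KB */
    s->mss = mss;
    s->dup_ack_count = 0;
    s->retransmits = 0;
}

void cc_event(CcSender *s, CcEvent ev) {
    if (ev == TIMEOUT) {                       /* same action from every state */
        s->ssthresh = s->cwnd / 2.0;
        s->cwnd = s->mss;
        s->dup_ack_count = 0;
        s->retransmits++;
        s->state = SLOW_START;
        return;
    }
    switch (s->state) {
    case SLOW_START:
    case CONGESTION_AVOIDANCE:
        if (ev == NEW_ACK) {
            s->cwnd += (s->state == SLOW_START) ? s->mss : s->mss * (s->mss / s->cwnd);
            s->dup_ack_count = 0;
            if (s->state == SLOW_START && s->cwnd > s->ssthresh)
                s->state = CONGESTION_AVOIDANCE;
        } else if (++s->dup_ack_count == 3) {  /* triple duplicate ACK */
            s->ssthresh = s->cwnd / 2.0;
            s->cwnd = s->ssthresh + 3.0 * s->mss;
            s->retransmits++;
            s->state = FAST_RECOVERY;
        }
        break;
    case FAST_RECOVERY:
        if (ev == NEW_ACK) {
            s->cwnd = s->ssthresh;             /* deflate the window */
            s->dup_ack_count = 0;
            s->state = CONGESTION_AVOIDANCE;
        } else {
            s->cwnd += s->mss;                 /* each further duplicate ACK */
        }
        break;
    }
}
```

Trace with MSS = 1000 bytes:
- Three new ACKs grow cwnd to 4000 in slow start.
- Three duplicate ACKs set ssthresh = 2000 and cwnd = 5000, entering fast recovery.
- The next new ACK deflates cwnd to 2000 in congestion avoidance.
- A timeout then resets cwnd to 1000 with ssthresh = 1250.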
Congestion control: TCP's congestion control service
[Figure: CLIENT and SERVER.]
Problem: gridlock sets in when there is packet loss due to router congestion.
TCP's congestion control forces the end systems to decrease the rate at which packets are sent during periods of congestion.
When the sending system's packet is lost due to congestion, the sender is alerted because it stops receiving ACKs for the packets it sent.
Macroscopic description of TCP throughput
What is the average throughput of TCP as a function of window size and RTT? (Ignore slow start, which typically consists of very short phases.)
• Let W be the window size when loss occurs. When the window is W, the throughput is W/RTT.
• Just after the loss, the window drops to W/2 and the throughput to W/(2 RTT).
• The throughput then increases linearly (by MSS/RTT every RTT) back to W/RTT.
• Average throughput is the mean of this linear ramp: $\frac{1}{2}\left(\frac{W}{2} + W\right)\frac{1}{RTT} = 0.75\,\frac{W}{RTT}$
(Based on an idealised model for the steady-state dynamics of TCP.)
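The 0.75 factor can be checked by averaging the window over one saw-tooth cycle of the idealised model. This is a hypothetical C helper. It assumes W is an even number of segments, so that W/2 is whole, and that the window grows by exactly one segment per RTT.

```c
#include <assert.h>

/* Average window, in segments, over one saw-tooth cycle: the window restarts
   at W/2 after a loss and grows by 1 segment per RTT until it reaches W. */
double average_window(unsigned w) {
    assert(w >= 2 && w % 2 == 0);
    double sum = 0.0;
    unsigned rtts = 0;
    for (unsigned cwnd = w / 2; cwnd <= w; cwnd++) {
        sum += cwnd;
        rtts++;
    }
    return sum / rtts;
}
```

For W = 100 segments the average window is 75 segments. Multiplying by MSS/RTT gives the 0.75 W/RTT throughput.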
TCP futures: TCP over "long, fat pipes"
• Example: a GRID computing application with 1500-byte segments and a 100 ms RTT that needs 10 Gbps throughput. It requires a window size of W = 83,333 in-flight segments (10 Gbps × 100 ms / (1500 × 8 bits)).
• Throughput in terms of the loss rate L:
$\text{Throughput} = \frac{1.22 \cdot MSS}{RTT\,\sqrt{L}}$
• Sustaining 10 Gbps therefore requires $L = 2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments).
• New versions of TCP are needed for high-speed environments.
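Both numbers in the example follow from short calculations. The hypothetical C helpers below take the throughput in bits/s, the RTT in seconds and the segment size in bytes. They invert the loss-rate formula without needing `sqrt`.

```c
#include <assert.h>

/* Window, in segments, needed to sustain `throughput_bps`:
   W = throughput * RTT / (8 * MSS). */
double required_window(double throughput_bps, double rtt_s, double mss_bytes) {
    assert(throughput_bps > 0.0 && rtt_s > 0.0 && mss_bytes > 0.0);
    return throughput_bps * rtt_s / (8.0 * mss_bytes);
}

/* Loss rate L for which 1.22 * MSS / (RTT * sqrt(L)) equals `throughput_bps`:
   L = (1.22 * MSS / (RTT * throughput))^2, with MSS converted to bits. */
double required_loss_rate(double throughput_bps, double rtt_s, double mss_bytes) {
    assert(throughput_bps > 0.0 && rtt_s > 0.0 && mss_bytes > 0.0);
    double sqrt_l = 1.22 * 8.0 * mss_bytes / (rtt_s * throughput_bps);
    return sqrt_l * sqrt_l;
}
```

For 10 Gbps, 100 ms and 1500-byte segments, these give W ≈ 83,333 segments and L ≈ 2.1 × 10^-10.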
TCP fairness
Fairness goal: if N TCP sessions share the same bottleneck link of capacity R, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R.]
Analysis of 2 connections sharing a link
Assumptions:
• a link with transmission rate R
• each connection has the same MSS and RTT
• no other TCP connections or UDP datagrams traverse the shared link
• ignore the slow start phase of TCP
• both connections operate in congestion-avoidance mode (linear increase phase)
Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.
Why is TCP fair?
Two competing sessions:
• additive increase gives a slope of 1 as throughput increases, so the difference between the two throughputs stays the same
• multiplicative decrease decreases throughput proportionally, so it halves that difference
Repeated cycles therefore move the operating point towards the equal bandwidth share line.
[Figure: connection 1 throughput (horizontal axis, up to R) vs. connection 2 throughput (vertical axis, up to R), showing the full bandwidth utilisation line and the equal bandwidth share line. The trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by a factor of 2). A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.]
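The convergence argument can be checked with a small simulation. This is a hypothetical C sketch under the assumptions listed above: same MSS and RTT, no other traffic, and congestion avoidance only. Rates are measured in MSS per RTT. Both connections are assumed to see a loss in the same RTT whenever their combined rate exceeds the capacity.

```c
#include <assert.h>

/* Two AIMD connections sharing a bottleneck of `capacity`.
   Each RTT: if the combined rate exceeds the capacity, both halve;
   otherwise both add 1 MSS per RTT. */
void two_flow_aimd(double *x, double *y, double capacity, unsigned rtts) {
    assert(capacity > 0.0);
    for (unsigned i = 0; i < rtts; i++) {
        if (*x + *y > capacity) {
            *x /= 2.0;                   /* multiplicative decrease */
            *y /= 2.0;
        } else {
            *x += 1.0;                   /* additive increase, slope 1 */
            *y += 1.0;
        }
    }
}
```

Start with rates of 10 and 70 on a link of capacity 100. The initial gap of 60 is halved at every loss and is below 1 MSS/RTT after 200 RTTs.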
The End
The following slides are for additional reading only.
TCP latency modeling
Loopholes in TCP:
• In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free, and therefore have higher throughputs.
• Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (multithreading implementation).
[Figure: multiple end systems sharing a link of R bps (the link's transmission rate); three use 1 TCP connection each and one uses 3 TCP connections.]
Q: How long does it take to receive an object from a Web server?
Latency is the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. It is made up of the TCP connection establishment time, the data transfer delay and the actual data transmission time.
Two cases to consider:
1. WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. There is no data transfer delay.
2. WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent. There is data transfer delay.
Assumptions:
• The network is uncongested, with one link of rate R between the end systems (R bps: the link's transmission rate).
• CongWin (fixed) determines the amount of data that can be sent.
• No packet loss, no packet corruption, no retransmissions required.
• Header overheads are negligible.
• The file to send is an integer number of segments of size MSS.
• Connection-establishment and request messages, ACKs and TCP connection-establishment segments have negligible transmission times.
• The initial Threshold of the TCP congestion mechanism is very big.
• O: size of the object, in bits.
• S: number of bits in the MSS (maximum segment size).
[Figure: CLIENT requesting a FILE from the SERVER.]
Case analysis: STATIC congestion window
Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent, so the sender never stalls:
$\text{latency} = 2\,RTT + \frac{O}{R}$
Number of windows of data that cover the object: $K = \lceil O/(WS) \rceil$ (rounded up to the nearest integer).
Example: assume W = 4 segments, e.g. O = 256 bits, S = 32 bits, W = 4, so K = 256/(4 × 32) = 2 windows.
Transport Layer ndash TCP
59
BB
TCP latency ModelingTCP latency Modeling
Case 2 latency = 2RTT + OR + (K-1)[SR + RTT - WSR]
Number of Windows of data that cover the objectK= OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 2 Case 2 WSR lt RTT + SRWSR lt RTT + SR
Sender has to wait for an ACK after a windowrsquos worth of data sent
STALLED STALLED PERIODPERIOD
If there are k windows sender willbe stalled (k-1) times
Transport Layer ndash TCP
60
BB
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
STALLED STALLED PERIODPERIOD
4 windows4 windows
OS=15
Transport Layer ndash TCP
61
BB
bull LetLet K K be the be the number of windows that cover the objectnumber of windows that cover the objectbull We can express We can express KK in terms of the number of segments in the in terms of the number of segments in the
object as followsobject as follows
S
OkK k 110 222min
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1log
1logmin
12min
2
2
S
OK
S
OkkK
S
OkK k
Note
Transport Layer ndash TCP
62
BB
bull From the time the server begins to transmit the From the time the server begins to transmit the kkth window window until the time the server receives an until the time the server receives an ACKACK for the for the first segment in the windowin the window
bull Transmission of Transmission of kkth window window ==
bull Stall Time Stall Time ==
bull Latency Latency ==
12
k
R
S
12
k
R
SRTT
R
S
RTTR
S
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1K
1
122k
k
R
SRTT
R
S
R
ORTT
Transport Layer ndash TCP
63
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
Transport Layer ndash TCP
64
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
R
S
R
SRTTP
R
ORTT P )12(2Latency
bull Closed-form expression for the latency
Transport Layer ndash TCP
65
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
2
1
RTTRO
P
encyMinimumLat
Latency
Slow start will not significantly increase latency if RTT ltlt OR
Transport Layer ndash TCP
66
BB
httpwww1csewustledu~jaincis788-97ftptcp_over_atmindexhtmatm-features
Transport Layer ndash TCP
18
BB
00 sendbase = initial_sequence number 01 nextseqnum = initial_sequence number 0203 loop (forever) 04 switch(event) 05 event data received from application above 06 create TCP segment with sequence number nextseqnum 07 If (timer is currently not running) start timer for segment
nextseqnum 08 pass segment to IP 09 nextseqnum = nextseqnum + length(data) 10 event timer timeout 11 retransmit not-yet-ACKed segment with smallest Seq 12 Start timer13 event ACK received with ACK field value of y 15 if (y gt sendbase) cumulative ACK of all data up to y 16 sendbase = y 17 If (there are currently any not-yet-ACKed segments)18 start timer19 20 end of loop forever
Associated with the oldest unACKed
segment
SIMPLIFIED TCPSIMPLIFIED TCP SENDER
AssumptionsAssumptions bull sender is not constrained by TCP flow or congestion controlbull that data from above is less than MSS in sizebull that data transfer is in one direction only
Transport Layer ndash TCP
20
BB
00 sendbase = initial_sequence number 01 nextseqnum = initial_sequence number 0203 loop (forever) 04 switch(event) 05 event data received from application above 06 create TCP segment with sequence number nextseqnum 07 start timer for segment nextseqnum 08 pass segment to IP 09 nextseqnum = nextseqnum + length(data) 10 event timer timeout for segment with sequence number y 11 retransmit segment with sequence number y 12 compute new timeout interval for segment y 13 restart timer for sequence number y 14 event ACK received with ACK field value of y 15 if (y gt sendbase) cumulative ACK of all data up to y 16 cancel all timers for segments with sequence numbers lt y 17 sendbase = y 18 19 else a duplicate ACK for already ACKed segment 20 increment number of duplicate ACKs received for y 21 if (number of duplicate ACKS received for y is 3) 22 perform TCP fast retransmission 23 resend segment with sequence number y 24 restart timer for segment y 25 26 end of loop forever
TCPTCP with MODIFICATIONSwith MODIFICATIONS SENDER
Why wait for the timeout to expire when consecutive
ACKs can be used to indicate a lost segment
With Fast Retransmit
Transport Layer ndash TCP
21
BB
TCP ACK generation [RFC 1122 RFC 2581]
Event
in-order segment arrival no gapseverything else already ACKed
in-order segment arrival no gaps one delayed ACK pending (due to action 1)
out-of-order segment arrivalwith higher than expect seq - a gap is detected
arrival of segment that partially or completely fills gap
TCP Receiver action
Delay sending the ACK Wait up to 500msfor next segment If next segment does not arrive in this interval send ACK
immediately send a singlecumulative ACK
send duplicate ACK indicating seq of next expected byte
Immediately send an ACK if segment startsat lower end of gap
1
2
3
4
Receiver does not discard out-of-order segmentsReceiver does not discard out-of-order segments
Transport Layer ndash TCP
22
BB
TCP Interesting Scenarios
Host A
Seq=92 8 bytes data
ACK=100
loss
tim
eout
time lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
Host A
Seq=100 20 bytes data
ACK=100
Seq=
92
tim
eout
time premature timeoutcumulative ACKs
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
ACK=120
Retransmission due to lost ACKSegment with Seq=100 not retransmitted
Timer is restarted here for Seq=92
Simplified TCP versionSimplified TCP version
Transport Layer ndash TCP
23
BB
Host A
Seq=100 20 bytes data
ACK=100Seq=
92
tim
eout
time
Host B
ACK=120
Seq=92 8 bytes data
Xloss
Cumulative ACK avoids retransmission of the first segment
TCP Retransmission ScenarioTCP Retransmission Scenario
Transport Layer ndash TCP
24
BB
TCP Modifications Doubling the Timeout IntervalDoubling the Timeout Interval
Provides a limited form of congestion control
Timer expiration is more likely caused by congestion in the network
TimeoutInterval = 2 TimeoutIntervalPrevious
TCP acts more politely by increasing the TimeoutInterval causing the sender to retransmit after longer and longer intervals
Congestion may get worse if sources
continue to retransmit packets
persistently
After ACK is received After ACK is received TimeoutInterval is derived TimeoutInterval is derived from most recent EstimatedRTT from most recent EstimatedRTT and DevRTTand DevRTT
Others check RFC 2018 ndash selective ACK
Transport Layer ndash TCP
25
BB
receiverreceiver explicitly informs sender of (dynamically changing) amount of free buffer space RcvWindowRcvWindow field
in TCP segment
sendersender keeps the amount of transmitted unACKed data less thanless than most recently received RcvWindowRcvWindow
sender wont overrun
receivers buffer bytransmitting too
much too fast
flow controlflow control
receiver buffering
RcvBufferRcvBuffer = size of TCP Receive Buffer
RcvWindowRcvWindow = amount of spare room in Buffer
TCP Flow ControlTCP Flow Control
Transport Layer ndash TCP
26
BB
FLOW CONTROL FLOW CONTROL ReceiverReceiver
04060 50100
LastByteRcvd
EXAMPLE HOST A sends a large file to HOST B
LastByteRead
Application Process
Data from IP
RcvBuffer
RcvWindow=RcvBuffer-[LastByteRcvd-LastByteRead]
RECEIVER HOST B ndash uses RcvWindow LastByteRcvd LastByteReadRcvWindow LastByteRcvd LastByteRead
Initially RcvWindow = RcvBuffer Application reads from the bufferHOST BHOST B tells HOST AHOST A how much spare roomhow much spare room it has in the connection buffer by placing its current value of RcvWindowRcvWindow in the receive window field of every segment it sends to HOST AHOST A
Transport Layer ndash TCP
27
BB
FLOW CONTROL: Sender
EXAMPLE: HOST A sends a large file to HOST B.
[Figure: HOST A's sent data. Bytes up to LastByteACKed have been acknowledged by ACKs from HOST B, and bytes between LastByteACKed and LastByteSent are in flight.]
SENDER (HOST A) uses the RcvWindow of HOST B, LastByteSent and LastByteACKed.
To ensure that it does not overflow HOST B's buffer, HOST A maintains the following throughout the connection's life:
[LastByteSent - LastByteACKed] <= RcvWindow
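The sender-side check mirrors the receiver's bookkeeping. The Java sketch below refuses any send that would push the unACKed data above the last advertised RcvWindow. It assumes byte counters that start at zero, and all names in it are hypothetical.

```java
// Sender-side flow control: keep LastByteSent - LastByteACKed <= RcvWindow.
final class SenderFlowControl {
    private long lastByteSent;
    private long lastByteAcked;
    private long rcvWindow; // most recently advertised by the receiver

    SenderFlowControl(long initialRcvWindow) {
        this.rcvWindow = initialRcvWindow;
    }

    long unacked() {
        return lastByteSent - lastByteAcked;
    }

    boolean canSend(long bytes) {
        return unacked() + bytes <= rcvWindow;
    }

    void send(long bytes) {
        if (!canSend(bytes)) {
            throw new IllegalStateException("would overflow the receiver's buffer");
        }
        lastByteSent += bytes;
    }

    // An ACK carries a cumulative ACK number and the receiver's current RcvWindow.
    void onAck(long ackedUpTo, long advertisedRcvWindow) {
        lastByteAcked = Math.max(lastByteAcked, ackedUpTo);
        rcvWindow = advertisedRcvWindow;
    }
}
```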
FLOW CONTROL: Some issues to consider
RcvWindow is used by the connection to provide the flow control service.
What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?
TCP sends a segment only when there is data or an ACK to send. If HOST B has nothing to send, it would never tell HOST A that space has been freed, so the sender must keep the connection "alive". TCP therefore requires HOST A to keep sending segments with one data byte while HOST B's receive window is 0. HOST B will ACK these segments. Eventually the buffer will have some space, and the ACKs will contain RcvWindow > 0.
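The probing loop can be modelled as a toy Java sketch. It makes three assumptions: HOST B's buffer starts full, each 1-byte probe is answered by an ACK carrying the current RcvWindow, and the trace of bytes the application reads between probes is made up for illustration. The probe byte's own effect on the buffer is ignored, and the names are hypothetical.

```java
// Toy model of zero-window probing. HOST B's buffer starts full (RcvWindow = 0).
final class ZeroWindowProbing {
    // bytesReadBetweenProbes[i] = bytes HOST B's application reads before probe i is ACKed
    // (a made-up trace). Returns the number of 1-byte probes HOST A sends until an ACK
    // advertises RcvWindow > 0, or -1 if the window never reopens in the trace.
    static int probesUntilWindowOpens(long[] bytesReadBetweenProbes) {
        long rcvWindow = 0;
        int probes = 0;
        for (long read : bytesReadBetweenProbes) {
            probes++;            // HOST A sends a segment with one data byte
            rcvWindow += read;   // space freed by the application (probe byte ignored here)
            if (rcvWindow > 0) { // the ACK for this probe carries RcvWindow > 0
                return probes;
            }
        }
        return -1;
    }
}
```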
TCP Connection Management
Recall: the TCP sender and receiver establish a "connection" before exchanging data segments. This initializes the TCP variables: sequence numbers, buffers, and flow control info (e.g., RcvWindow).
The client is the connection initiator:
In Java: Socket clientSocket = new Socket(hostname, portNumber);   (connect)
In C (Winsock):
  if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
      printf("connect failed\n");
      WSACleanup();
      exit(1);
  }
The server is contacted by the client:
In Java: ServerSocket.accept() returns a Socket for the new connection
In C (Winsock):
  ns = accept(s, (struct sockaddr *)&remoteaddr, &addrlen);
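The handshake is performed by the TCP stack, not by the application. The runnable Java sketch below shows this with the standard `java.net` API over the loopback interface. The OS chooses the port (port 0), and `HandshakeDemo`/`roundTrip` are hypothetical names. The kernel completes the three-way handshake and queues the connection in the listen backlog, so `new Socket(...)` returns before `accept()` is even called, and a single thread is enough.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class HandshakeDemo {
    // Sends one line from a client socket to a server socket on the loopback interface
    // and returns what the server read.
    static String roundTrip(String line) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0: let the OS pick a free port; backlog 1 is enough for one client.
        try (ServerSocket welcome = new ServerSocket(0, 1, loopback);
             // connect: the client's TCP sends SYN, receives SYNACK, sends ACK
             Socket client = new Socket(loopback, welcome.getLocalPort());
             // accept: returns the already-established connection from the backlog
             Socket conn = welcome.accept()) {
            PrintWriter out = new PrintWriter(
                    new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8), true);
            out.println(line);
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8));
            return in.readLine();
        }
    }
}
```

Calling `HandshakeDemo.roundTrip("hello")` should return `"hello"`. No SYN or ACK ever appears at the application level.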
TCP Connection Management
Three-way handshake:
Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself). The segment specifies the client's initial sequence number (client_isn).
Step 2: the server end system receives the SYN and replies with a SYNACK control segment. The server ACKs the received SYN, allocates buffers, and specifies its own initial sequence number (server_isn).
Step 3: the client ACKs the connection with ACK = server_isn + 1, allocates buffers, and sends SYN = 0. The connection is established.
[Figure: Establishing a connection (time runs downward).
Client to Server, Connect: (SYN=1, seq=client_isn)
Server to Client, Accept: (SYN=1, seq=server_isn, ack=client_isn+1)
Client to Server, ACK: (SYN=0, seq=client_isn+1, ack=server_isn+1)]
This is what happens when we create a socket for connection to a server.
After the connection is established, the client and server can exchange segments carrying application-generated data (SYN=0).
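The sequence and acknowledgement numbers in the figure follow mechanically from the two initial sequence numbers. The Java sketch below builds the three segments. `Segment` is a hypothetical stand-in that holds only the header fields the slide uses, and the ISN values in the usage line are arbitrary.

```java
import java.util.Arrays;
import java.util.List;

final class ThreeWayHandshake {
    // Hypothetical, stripped-down segment: only the header fields used on the slide.
    static final class Segment {
        final boolean syn;
        final boolean ackFlag;
        final long seq;
        final long ack;

        Segment(boolean syn, boolean ackFlag, long seq, long ack) {
            this.syn = syn;
            this.ackFlag = ackFlag;
            this.seq = seq;
            this.ack = ack;
        }

        @Override
        public String toString() {
            return "SYN=" + (syn ? 1 : 0) + " seq=" + seq + (ackFlag ? " ack=" + ack : "");
        }
    }

    static List<Segment> handshake(long clientIsn, long serverIsn) {
        // Step 1: client -> server, SYN=1, seq=client_isn
        Segment syn = new Segment(true, false, clientIsn, 0);
        // Step 2: server -> client, SYNACK: SYN=1, seq=server_isn, ack=client_isn+1
        Segment synAck = new Segment(true, true, serverIsn, syn.seq + 1);
        // Step 3: client -> server, SYN=0, seq=client_isn+1, ack=server_isn+1
        Segment ack = new Segment(false, true, synAck.ack, synAck.seq + 1);
        return Arrays.asList(syn, synAck, ack);
    }
}
```

For example, `handshake(4000, 9000)` yields `SYN=1 seq=4000`, `SYN=1 seq=9000 ack=4001` and `SYN=0 seq=4001 ack=9001`.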
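TCP Connection Management (cont.)
Closing a connection: the client closes its socket:
closesocket(s);   (Winsock)
clientSocket.close();   (Java)
Step 1: the client end system sends a TCP FIN control segment to the server.
Step 2: the server receives the FIN and replies with an ACK. It then closes the connection and sends its own FIN.
[Figure: How a TCP connection is established and torn down. The client sends FIN when it closes; the server replies with ACK, then sends FIN when it closes; the client replies with ACK, waits in timed wait, and the connection is closed.]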
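TCP Connection Management (cont.)
Step 3: the client receives the FIN and replies with an ACK. It enters "timed wait", during which it will respond with an ACK to any received FINs.
Step 4: the server receives the ACK. The connection is closed.
Note: with a small modification, TCP can handle simultaneous FINs.
[Figure: the client and server exchange FIN, ACK, FIN, ACK while both sides are closing. The client waits in timed wait before the connection is closed on both sides.]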
TCP Connection Management (cont.)
[Figures: TCP client lifecycle and TCP server lifecycle (state diagrams).]
Timed wait is used in case the final ACK gets lost. Its length is implementation-dependent (e.g., 30 seconds, 1 minute, 2 minutes).
After timed wait, the connection formally closes and all resources (e.g., port numbers) are released.
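The client lifecycle can be sketched as a small state machine. The state names below are the standard TCP state names. The event names and the `TcpClientLifecycle` class are hypothetical, and the sketch covers only the normal client path described above. It does not model simultaneous FINs or the server side.

```java
// Simplified TCP client lifecycle: open, then close initiated by the client.
final class TcpClientLifecycle {
    enum State { CLOSED, SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT }

    enum Event { APP_CONNECT, RECV_SYNACK, APP_CLOSE, RECV_ACK, RECV_FIN, TIMED_WAIT_EXPIRES }

    static State next(State s, Event e) {
        switch (s) {
            case CLOSED:      // application opens the socket: send SYN
                if (e == Event.APP_CONNECT) return State.SYN_SENT;
                break;
            case SYN_SENT:    // SYNACK arrives: send ACK
                if (e == Event.RECV_SYNACK) return State.ESTABLISHED;
                break;
            case ESTABLISHED: // application closes the socket: send FIN
                if (e == Event.APP_CLOSE) return State.FIN_WAIT_1;
                break;
            case FIN_WAIT_1:  // server ACKs our FIN
                if (e == Event.RECV_ACK) return State.FIN_WAIT_2;
                break;
            case FIN_WAIT_2:  // server's FIN arrives: send ACK, start timed wait
                if (e == Event.RECV_FIN) return State.TIME_WAIT;
                break;
            case TIME_WAIT:   // a repeated FIN is re-ACKed; the state stays TIME_WAIT
                if (e == Event.TIMED_WAIT_EXPIRES) return State.CLOSED; // resources released
                break;
            default:
                break;
        }
        return s; // any other event leaves the state unchanged in this sketch
    }
}
```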
End of Flow Control and Error Control
Flow Control vs Congestion Control: similar actions are taken, but for very different reasons.
Flow Control
• point-to-point traffic between sender and receiver
• speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the receiver buffer from overflowing
Congestion happens when too many sources attempt to send data at too high a rate for the routers along the path.
Congestion Control
• a service that makes sure the routers between end systems are able to carry the offered traffic
• prevents routers from overflowing
Same course of action: throttling of the sender.
Principles of Congestion Control
Congestion, informally: "too many sources sending too much data too fast for the network to handle".
• different from flow control!
• manifestations:
  • lost packets (buffer overflow at routers)
  • long delays (queuing in router buffers)
• a top-10 problem!
Two broad approaches towards congestion control:
1. End-to-end congestion control
• no explicit feedback from the network
• end systems infer congestion from observed packet loss and delay
• the approach taken by TCP
2. Network-assisted congestion control
• routers provide feedback to end systems in one of two forms:
  • a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR)
  • an explicit transmission rate the sender should send at
TCP Congestion Control: how the TCP sender limits the rate at which it sends traffic into its connection
New variable: Congestion Window (CongWin).
SENDER: amount of unACKed data = LastByteSent - LastByteACKed <= min(CongWin, RcvWindow)
This indirectly limits the sender's send rate.
Assumptions:
• The TCP receive buffer is very large, so there is no RcvWindow constraint. The amount of unACKed data at the sender is then limited solely by CongWin.
• Packet loss delay and packet transmission delay are negligible.
Sending rate ≈ CongWin / RTT
By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
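Under these assumptions, the window check and the rate estimate reduce to simple arithmetic. Below is a Java sketch with hypothetical names. Windows are in bytes and RTT is in seconds.

```java
// How much more the sender may put in flight, and the approximate send rate.
final class CongestionWindowLimit {
    // min(CongWin, RcvWindow) - (LastByteSent - LastByteACKed), never negative.
    static long usableWindow(long congWin, long rcvWindow, long lastByteSent, long lastByteAcked) {
        long inFlight = lastByteSent - lastByteAcked;
        return Math.max(0, Math.min(congWin, rcvWindow) - inFlight);
    }

    // Sending rate ~ CongWin / RTT, in bytes per second.
    static double approxRate(long congWinBytes, double rttSeconds) {
        return congWinBytes / rttSeconds;
    }
}
```

With CongWin = 8000 bytes, RcvWindow = 12000 bytes and 5000 bytes unACKed, 3000 more bytes may be sent, and the rate is roughly 8000 / 0.1 s = 80,000 bytes/s.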
TCP Congestion Control: TCP uses ACKs to trigger ("clock") increases in its congestion window size, hence "self-clocking".
The arrival of ACKs indicates to the sender that all is well:
1. If ACKs arrive at a slow rate, the congestion window will be increased at a relatively slow rate.
2. If ACKs arrive at a high rate, the congestion window will be increased more quickly.
TCP Congestion Control: how TCP perceives that there is congestion on the path
"Loss event": when there is excessive congestion, router buffers along the path overflow and datagrams are dropped. This in turn results in a "loss event" at the sender, which is either:
1. A timeout
• no ACK is received after segment loss
2. The receipt of three duplicate ACKs
• segment loss is followed by three duplicate ACKs received at the sender
TCP Congestion Control: details
• The sender limits transmission: LastByteSent - LastByteAcked <= cwnd
• Roughly, rate ≈ cwnd / RTT bytes/sec
• cwnd is a dynamic function of perceived network congestion.
How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• the TCP sender reduces its rate (cwnd) after a loss event
Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events
TCP congestion avoidance: additive increase, multiplicative decrease (AIMD)
Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
• additive increase: increase cwnd by 1 MSS every RTT until loss is detected
• multiplicative decrease: cut cwnd in half after loss
[Figure: congestion window size (8, 16, 24 Kbytes) versus time, showing the saw-tooth behavior of probing for bandwidth.]
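The saw-tooth can be reproduced with a few lines of Java. This sketch tracks cwnd in units of MSS, one step per RTT. The caller supplies the RTTs in which a loss is detected as a hypothetical input, so the sketch does not model why a loss happens. The class name is illustrative.

```java
// AIMD: +1 MSS per RTT, halve on loss (cwnd in MSS units).
final class Aimd {
    // Returns cwnd at the start of each RTT; lossAt[i] == true means a loss is detected in RTT i.
    static int[] trace(int initialCwnd, boolean[] lossAt) {
        int[] out = new int[lossAt.length];
        int cwnd = initialCwnd;
        for (int i = 0; i < lossAt.length; i++) {
            out[i] = cwnd;
            if (lossAt[i]) {
                cwnd = Math.max(1, cwnd / 2); // multiplicative decrease, never below 1 MSS
            } else {
                cwnd += 1;                    // additive increase: 1 MSS per RTT
            }
        }
        return out;
    }
}
```

Starting at 8 MSS with a loss in the third RTT, the window goes 8, 9, 10 and then drops to 5.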
TCP Slow Start
When the connection begins, increase the rate exponentially until the first loss event:
• initially cwnd = 1 MSS
• double cwnd every RTT
• done by incrementing cwnd by 1 MSS for every ACK received
Summary: the initial rate is slow but ramps up exponentially fast (the sending rate doubles every RTT).
[Figure: Host A sends one segment to Host B, then two segments one RTT later, then four segments one RTT after that.]
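Adding 1 MSS per ACK doubles the window every RTT because each of the cwnd segments sent in one RTT returns one ACK. The window therefore grows by cwnd MSS per RTT. Here is a minimal Java sketch, assuming no loss, no delayed ACKs and one ACK per segment. The class name is hypothetical.

```java
// Slow start: +1 MSS per ACK, observed once per RTT (cwnd in MSS units).
final class SlowStart {
    // Returns cwnd at the start of each of the first `rtts` round trips.
    static int[] cwndPerRtt(int rtts) {
        int[] out = new int[rtts];
        int cwnd = 1; // initially cwnd = 1 MSS
        for (int r = 0; r < rtts; r++) {
            out[r] = cwnd;
            int acksThisRtt = cwnd; // one ACK per segment sent in this RTT
            for (int a = 0; a < acksThisRtt; a++) {
                cwnd += 1;          // +1 MSS for every ACK received
            }
        }
        return out;
    }
}
```

For the first four RTTs this gives 1, 2, 4 and 8 segments, which matches the figure's one, two, four progression.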
Refinement: inferring loss
• After 3 dup ACKs: cwnd is cut in half, and the window then grows linearly.
• But after a timeout event: cwnd is set to 1 MSS, and the window then grows exponentially up to a threshold, then linearly.
Philosophy:
• 3 dup ACKs indicate that the network is capable of delivering some segments.
• A timeout indicates a "more alarming" congestion scenario.
Refinement
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.
Implementation:
• variable ssthresh (slow-start threshold)
• on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event
TCP Sender Congestion Control
| State | Event | TCP sender congestion-control action | Commentary |
|---|---|---|---|
| Slow Start (SS) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Results in a doubling of CongWin every RTT |
| Congestion Avoidance (CA) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS × (MSS / CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT |
| SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin / 2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS. |
| SS or CA | Timeout | Threshold = CongWin / 2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start |
| SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed |
Summary: TCP Congestion Control (sender FSM)
Initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; start in slow start.
Slow start
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: move to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment (remain in slow start)
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
Congestion avoidance
• new ACK: cwnd = cwnd + MSS × (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
Fast recovery
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; move to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start
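The FSM translates almost line by line into code. Below is a minimal Java sketch of it. cwnd and ssthresh are in bytes, and the "retransmit missing segment" and "transmit new segment(s)" actions are left to the caller, marked as comments. The class and method names are hypothetical. The sketch follows the slide's transitions, including cwnd > ssthresh as the switch to congestion avoidance, rather than any particular operating system's TCP.

```java
// Sender congestion-control FSM: slow start, congestion avoidance, fast recovery.
final class RenoSender {
    enum State { SLOW_START, CONGESTION_AVOIDANCE, FAST_RECOVERY }

    private final double mss;
    private double cwnd;
    private double ssthresh;
    private int dupAckCount = 0;
    private State state = State.SLOW_START;

    // Initially cwnd = 1 MSS; the slide uses ssthresh = 64 KB.
    RenoSender(double mss, double initialSsthresh) {
        this.mss = mss;
        this.cwnd = mss;
        this.ssthresh = initialSsthresh;
    }

    void onNewAck() {
        dupAckCount = 0;
        switch (state) {
            case SLOW_START:
                cwnd += mss;
                if (cwnd > ssthresh) state = State.CONGESTION_AVOIDANCE;
                break;
            case CONGESTION_AVOIDANCE:
                cwnd += mss * (mss / cwnd); // about +1 MSS per RTT
                break;
            case FAST_RECOVERY:
                cwnd = ssthresh;            // deflate the window
                state = State.CONGESTION_AVOIDANCE;
                break;
        }
        // caller: transmit new segment(s) as allowed
    }

    void onDupAck() {
        if (state == State.FAST_RECOVERY) {
            cwnd += mss;                    // caller: transmit new segment(s) as allowed
            return;
        }
        dupAckCount++;
        if (dupAckCount == 3) {
            ssthresh = cwnd / 2;
            cwnd = ssthresh + 3 * mss;
            state = State.FAST_RECOVERY;    // caller: retransmit missing segment
        }
    }

    void onTimeout() {
        ssthresh = cwnd / 2;
        cwnd = mss;
        dupAckCount = 0;
        state = State.SLOW_START;           // caller: retransmit missing segment
    }

    double cwnd() { return cwnd; }
    double ssthresh() { return ssthresh; }
    State state() { return state; }
}
```

With MSS = 1000 bytes and ssthresh = 4000 bytes, four new ACKs take cwnd to 5000 bytes and into congestion avoidance. Three duplicate ACKs then set ssthresh = 2500 and cwnd = 5500 (fast recovery). The next new ACK deflates cwnd to 2500, and a timeout resets it to 1000 with ssthresh = 1250.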
Congestion control: TCP's Congestion Control Service
[Figure: a CLIENT and a SERVER communicating across the network.]
Problem: gridlock sets in when there is packet loss due to router congestion.
TCP's congestion control service forces the end systems to decrease the rate at which packets are sent during periods of congestion.
When the sending system's packet is lost due to congestion, the sender is alerted because it stops receiving ACKs for the packets it sent.
Macroscopic Description of TCP throughput
• What's the average throughput of TCP as a function of window size and RTT? (Ignore slow start, whose phases are typically very short.)
• Let W be the window size when loss occurs.
  • When the window is W, throughput is W/RTT.
  • Just after a loss, the window drops to W/2 and throughput drops to W/(2·RTT).
  • Throughput then increases linearly (by MSS/RTT every RTT).
• Average throughput: 0.75 W/RTT
(Based on an idealised model for the steady-state dynamics of TCP.)
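The 0.75 factor is just the mean of a window that climbs linearly from W/2 to W. The Java sketch below averages the per-RTT windows of one saw-tooth cycle. W is in segments, and an even W keeps the arithmetic exact. The names and example numbers are illustrative.

```java
// Average throughput over one AIMD cycle: window grows 1 MSS per RTT from W/2 to W.
final class SawtoothThroughput {
    static double averageThroughput(int wSegments, int mssBytes, double rttSeconds) {
        long sum = 0;
        int rtts = 0;
        for (int w = wSegments / 2; w <= wSegments; w++) { // one window value per RTT
            sum += w;
            rtts++;
        }
        double avgWindowSegments = (double) sum / rtts;     // equals 0.75 * W for even W
        return avgWindowSegments * mssBytes / rttSeconds;   // bytes per second
    }
}
```

For W = 100 segments, 1500-byte segments and a 100 ms RTT, the average window is 75 segments, or about 1.125 MB/s. That is 0.75 W/RTT.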
TCP Futures: TCP over "long, fat pipes"
• Example: a GRID computing application with 1500-byte segments, a 100 ms RTT and a desired throughput of 10 Gbps
• This requires a window size of W = 83,333 in-flight segments.
• Throughput in terms of the loss rate L:
$$\text{Throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$$
• Reaching 10 Gbps therefore requires $L = 2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments).
• New versions of TCP are needed for high-speed environments.
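Both numbers on this slide can be checked with a short calculation. The Java sketch below uses hypothetical names and converts the MSS to bits. It then applies W = throughput × RTT / MSS and the throughput formula solved for L.

```java
// Long-fat-pipe arithmetic. Throughput in bits/s, RTT in seconds, MSS in bytes.
final class LongFatPipe {
    // W = throughput * RTT / MSS (in segments)
    static double windowSegments(double throughputBps, double rttSeconds, int mssBytes) {
        return throughputBps * rttSeconds / (mssBytes * 8.0);
    }

    // From Throughput = 1.22 * MSS / (RTT * sqrt(L)):  L = (1.22 * MSS / (RTT * Throughput))^2
    static double requiredLossRate(double throughputBps, double rttSeconds, int mssBytes) {
        double sqrtL = 1.22 * mssBytes * 8.0 / (rttSeconds * throughputBps);
        return sqrtL * sqrtL;
    }
}
```

For 10 Gbps, 100 ms and 1500 bytes, W comes out at about 83,333 segments and L at about 2.1 × 10^-10.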
TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link of capacity R, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R.]
Analysis of 2 connections sharing a link
Assumptions:
• the link has a transmission rate of R
• each connection has the same MSS and RTT
• no other TCP connections or UDP datagrams traverse the shared link
• the slow-start phase of TCP is ignored
• both connections operate in congestion-avoidance mode (linear increase phase)
Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.
Why is TCP fair?
Two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput versus Connection 1 throughput, with both axes running to R. The "equal bandwidth share" line and the "full bandwidth utilisation" line are marked. The trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by a factor of 2). A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.]
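The convergence argument can be simulated. In this Java sketch, with hypothetical names and starting rates, both connections add 1 unit per RTT and both halve whenever their combined rate exceeds R. This assumes the two connections see losses at the same time. Additive increase keeps the gap between them constant while each halving halves it, so the two rates approach the equal-share line.

```java
// Two AIMD connections sharing a link of capacity R (synchronized losses assumed).
final class AimdFairness {
    // Returns {x1, x2} after `steps` RTTs.
    static double[] simulate(double x1, double x2, double capacity, int steps) {
        for (int i = 0; i < steps; i++) {
            if (x1 + x2 > capacity) { // loss: both decrease window by a factor of 2
                x1 /= 2;
                x2 /= 2;
            } else {                  // congestion avoidance: both add 1 unit per RTT
                x1 += 1;
                x2 += 1;
            }
        }
        return new double[] {x1, x2};
    }
}
```

Starting from rates of 10 and 80 on a link with R = 100, the 70-unit gap shrinks with every loss and is negligible after a few hundred RTTs.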
The End
The following slides are for additional reading only.
TCP Latency Modeling
Loopholes in TCP:
• In practice, client/server applications with a smaller RTT grab the available bandwidth more quickly as it becomes free. Therefore they have higher throughputs.
• Multiple parallel TCP connections (a multithreading implementation) allow one application to get a bigger share of the bandwidth.
[Figure: multiple end systems sharing a link of R bps (the link's transmission rate). Three hosts have 1 TCP connection each, and one host has 3 TCP connections.]
TCP latency modeling
Q: How long does it take to receive an object from a Web server? In other words, what is the time from when the client initiates a TCP connection until the client receives the requested object in its entirety?
Components: TCP connection establishment time, data transfer delay, and actual data transmission time.
Two cases to consider:
1. WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. There is no data transfer delay.
2. WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent. There is data transfer delay.
TCP Latency Modeling
Assumptions:
• The network is uncongested, with one link of rate R between the end systems.
• CongWin (fixed) determines the amount of data that can be sent.
• No packet loss, no packet corruption, no retransmissions required.
• Header overheads are negligible.
• The file to send is an integer number of segments of size MSS.
• Connection-establishment segments, request messages and ACKs have negligible transmission times.
• The initial Threshold of the TCP congestion mechanism is very big.
Notation:
• R: the link's transmission rate, in bps
• O: size of the object, in bits
• S: number of bits in the MSS (maximum segment size)
[Figure: a CLIENT requesting a FILE from a SERVER.]
TCP latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.
Case 1: latency = 2RTT + O/R
K = number of windows of data that cover the object:
K = O/(WS), rounded up to the nearest integer
[Figure: assume W = 4 segments, e.g., O = 256 bits, S = 32 bits, W = 4, giving 8 segments and K = 2.]
TCP latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 2: WS/R < RTT + S/R
The sender has to wait for an ACK after a window's worth of data is sent (a STALLED PERIOD).
Number of windows of data that cover the object: K = O/(WS), rounded up.
If there are K windows, the sender will be stalled (K-1) times, each time for S/R + RTT - WS/R:
Case 2: latency = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]
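The two static-window cases can be folded into one function. A Java sketch follows, with O and S in bits, R in bits/s and RTT in seconds. The names are hypothetical. The example reuses the slide's O = 256, S = 32, W = 4 with a made-up link rate and RTT.

```java
// Latency of fetching an object with a fixed congestion window of w segments.
final class StaticWindowLatency {
    static double latency(double o, double s, int w, double r, double rtt) {
        double base = 2 * rtt + o / r;             // connection setup + request, plus O/R to transmit
        if (w * s / r >= rtt + s / r) {
            return base;                           // case 1: ACKs return before the window empties
        }
        int k = (int) Math.ceil(o / (w * s));      // K: windows that cover the object
        double stall = s / r + rtt - w * s / r;    // idle time after each of the first K-1 windows
        return base + (k - 1) * stall;             // case 2
    }
}
```

With R = 32 bits/s (so S/R = 1 s and WS/R = 4 s), an RTT of 10 s is case 2: K = 2, the stall is 7 s, and the latency is 28 + 7 = 35 s. An RTT of 1 s is case 1, with a latency of 2 + 8 = 10 s.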
Case Analysis: DYNAMIC CONGESTION WINDOW
[Figure: slow start with O/S = 15 segments, sent in 4 windows (1, 2, 4 and 8 segments), with STALLED PERIODs between windows.]
• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object as follows:
$$K = \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} = \min\{k : 2^k - 1 \ge O/S\} = \min\{k : k \ge \log_2(O/S + 1)\} = \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil$$
Note: $2^0 + 2^1 + \cdots + 2^{k-1} = 2^k - 1$.
• Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window: $\frac{S}{R} + \text{RTT}$
• Transmission of the kth window: $2^{k-1}\frac{S}{R}$
• Stall time: $\left[\frac{S}{R} + \text{RTT} - 2^{k-1}\frac{S}{R}\right]^{+}$, where $[x]^{+} = \max(x, 0)$
• Latency:
$$\text{Latency} = 2\,\text{RTT} + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + \text{RTT} - 2^{k-1}\frac{S}{R}\right]^{+}$$
• Let Q be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\left(1 + \frac{\text{RTT}}{S/R}\right) \right\rfloor + 1$$
• The actual number of times that the server stalls is $P = \min\{Q, K-1\}$.
• Closed-form expression for the latency:
$$\text{Latency} = 2\,\text{RTT} + \frac{O}{R} + P\left[\text{RTT} + \frac{S}{R}\right] - \left(2^{P} - 1\right)\frac{S}{R}$$
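Comparing with the minimum latency $2\,\text{RTT} + O/R$ (no stalls):
$$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{\frac{O/R}{\text{RTT}} + 2}$$
Slow start will not significantly increase latency if RTT << O/R.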
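The dynamic-window formulas can be cross-checked numerically. The Java sketch below uses hypothetical names, with O and S in bits, R in bits/s and RTT in seconds. It computes K and Q with integer loops. These loops are equivalent to the ceil/floor-of-log2 expressions but avoid floating-point rounding in the logarithm. The sketch then evaluates both the summed latency and the closed-form latency.

```java
// Slow-start latency model: K, Q, P, and the latency in summed and closed form.
final class SlowStartLatency {
    // K = smallest k with 2^k - 1 >= O/S, i.e. ceil(log2(O/S + 1)).
    static int windowsToCover(double o, double s) {
        double segments = Math.ceil(o / s);
        int k = 0;
        while (Math.pow(2, k) - 1 < segments) k++;
        return k;
    }

    // Q = floor(log2(1 + RTT/(S/R))) + 1, i.e. the number of j >= 0 with 2^j <= 1 + RTT/(S/R).
    static int stallsForInfiniteObject(double s, double r, double rtt) {
        double x = 1 + rtt / (s / r);
        int q = 0;
        while (Math.pow(2, q) <= x) q++;
        return q;
    }

    // Closed form with P = min(Q, K - 1).
    static double closedFormLatency(double o, double s, double r, double rtt) {
        int p = Math.min(stallsForInfiniteObject(s, r, rtt), windowsToCover(o, s) - 1);
        return 2 * rtt + o / r + p * (rtt + s / r) - (Math.pow(2, p) - 1) * s / r;
    }

    // Sum of the positive stall times over the first K - 1 windows.
    static double summedLatency(double o, double s, double r, double rtt) {
        int bigK = windowsToCover(o, s);
        double latency = 2 * rtt + o / r;
        for (int k = 1; k <= bigK - 1; k++) {
            latency += Math.max(0, s / r + rtt - Math.pow(2, k - 1) * s / r);
        }
        return latency;
    }
}
```

For the slide's O/S = 15 with S/R = 1 s, K = 4. With RTT = 1 s, Q = 2 and both forms give 18 s. With RTT = 5 s, Q = P = 3 and both give 36 s.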
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
TCP with MODIFICATIONS: SENDER (with Fast Retransmit)
Why wait for the timeout to expire when consecutive duplicate ACKs can be used to indicate a lost segment?

sendbase = initial_sequence_number
nextseqnum = initial_sequence_number

loop (forever) {
  switch (event)
    event: data received from application above
      create TCP segment with sequence number nextseqnum
      start timer for segment nextseqnum
      pass segment to IP
      nextseqnum = nextseqnum + length(data)
    event: timer timeout for segment with sequence number y
      retransmit segment with sequence number y
      compute new timeout interval for segment y
      restart timer for sequence number y
    event: ACK received, with ACK field value of y
      if (y > sendbase) {   /* cumulative ACK of all data up to y */
        cancel all timers for segments with sequence numbers < y
        sendbase = y
      }
      else {   /* a duplicate ACK for an already ACKed segment */
        increment number of duplicate ACKs received for y
        if (number of duplicate ACKs received for y == 3) {
          /* perform TCP fast retransmission */
          resend segment with sequence number y
          restart timer for segment y
        }
      }
}   /* end of loop forever */
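Here is a runnable Java rendering of that pseudocode, as a sketch. Real timers are replaced by a map of outstanding segments, and retransmissions are recorded in a list instead of being passed to IP. All names are hypothetical. Only the event handling and the fast-retransmit rule follow the slide.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FastRetransmitSender {
    private long sendBase;
    private long nextSeqNum;
    // Stand-in for per-segment timers: outstanding segments keyed by sequence number.
    private final TreeMap<Long, Integer> unacked = new TreeMap<>();
    private final Map<Long, Integer> dupAcks = new HashMap<>();
    private final List<Long> retransmissions = new ArrayList<>(); // log instead of "pass to IP"

    FastRetransmitSender(long initialSeq) {
        sendBase = initialSeq;
        nextSeqNum = initialSeq;
    }

    // event: data received from application above
    long onAppData(int length) {
        long seq = nextSeqNum;
        unacked.put(seq, length); // create segment, start its timer, pass it to IP
        nextSeqNum += length;
        return seq;
    }

    // event: timer timeout for segment with sequence number y
    void onTimeout(long y) {
        retransmissions.add(y);   // retransmit y (new interval and timer restart not modelled)
    }

    // event: ACK received, with ACK field value of y
    void onAck(long y) {
        if (y > sendBase) {                  // cumulative ACK of all data up to y
            unacked.headMap(y).clear();      // cancel timers for segments with seq < y
            sendBase = y;
        } else {                             // duplicate ACK for already ACKed data
            int count = dupAcks.merge(y, 1, Integer::sum);
            if (count == 3) {
                retransmissions.add(y);      // fast retransmit segment y
            }
        }
    }

    long sendBase() { return sendBase; }
    int outstanding() { return unacked.size(); }
    List<Long> retransmissions() { return retransmissions; }
}
```

Sending segments at 92, 100 and 120 and then receiving ACK 100 four times triggers exactly one fast retransmission of the segment at 100, on the third duplicate.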
TCP ACK generation [RFC 1122, RFC 2581]
| # | Event | TCP receiver action |
|---|---|---|
| 1 | In-order segment arrival, no gaps, everything else already ACKed | Delay sending the ACK: wait up to 500 ms for the next segment. If the next segment does not arrive in this interval, send the ACK. |
| 2 | In-order segment arrival, no gaps, one delayed ACK pending (due to action 1) | Immediately send a single cumulative ACK |
| 3 | Out-of-order segment arrival with a higher than expected seq #: a gap is detected | Send a duplicate ACK indicating the seq # of the next expected byte |
| 4 | Arrival of a segment that partially or completely fills the gap | Immediately send an ACK if the segment starts at the lower end of the gap |
The receiver does not discard out-of-order segments.
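The four receiver rules can be sketched as follows. This Java model is simplified in three ways. The 500 ms delayed-ACK timer is not modelled, so rule 1 just returns -1 to mean "ACK deferred". Out-of-order segments are buffered in a map rather than discarded. Segments are assumed not to overlap. The names are hypothetical.

```java
import java.util.TreeMap;

final class AckGenerator {
    private long expected;                  // sequence number of the next in-order byte
    private boolean delayedAckPending = false;
    private final TreeMap<Long, Integer> outOfOrder = new TreeMap<>(); // seq -> length

    AckGenerator(long initialSeq) {
        this.expected = initialSeq;
    }

    // Returns the ACK number sent immediately, or -1 if the ACK is delayed (rule 1).
    long onSegment(long seq, int length) {
        if (seq > expected) {                      // rule 3: a gap is detected
            outOfOrder.put(seq, length);           // buffer it, do not discard
            delayedAckPending = false;             // this ACK also covers any pending one
            return expected;                       // duplicate ACK: next expected byte
        }
        if (seq < expected) {                      // old duplicate segment (not in the table)
            return expected;
        }
        boolean fillsGap = !outOfOrder.isEmpty();  // rule 4 if a gap was open
        expected += length;
        while (outOfOrder.containsKey(expected)) { // absorb buffered segments now in order
            expected += outOfOrder.remove(expected);
        }
        if (fillsGap || delayedAckPending) {       // rules 4 and 2: ACK immediately
            delayedAckPending = false;
            return expected;
        }
        delayedAckPending = true;                  // rule 1: wait up to 500 ms (timer not modelled)
        return -1;
    }
}
```

Starting at 92, the first segment (92, 8 bytes) is held for a delayed ACK. The next one (100, 20 bytes) triggers a cumulative ACK=120. A segment at 140 opens a gap and produces a duplicate ACK=120. The segment at 120 then fills the gap, and the receiver jumps straight to ACK=150.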
TCP: Interesting Scenarios (simplified TCP version)
[Figure, left: lost ACK scenario. Host A sends Seq=92, 8 bytes data. Host B's ACK=100 is lost (X). After the timeout, Host A retransmits Seq=92, 8 bytes data, and Host B again replies ACK=100: a retransmission due to a lost ACK.]
[Figure, right: premature timeout, cumulative ACKs. Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data. Host B replies ACK=100 and ACK=120, but the Seq=92 timer expires first, so Host A retransmits Seq=92, 8 bytes data and restarts the timer for Seq=92. Host B replies ACK=120. The segment with Seq=100 is not retransmitted.]
TCP Retransmission Scenario
[Figure: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data. Host B's ACK=100 is lost (X), but ACK=120 arrives before the Seq=92 timeout. The cumulative ACK avoids retransmission of the first segment.]
Transport Layer ndash TCP
24
BB
TCP Modifications Doubling the Timeout IntervalDoubling the Timeout Interval
Provides a limited form of congestion control
Timer expiration is more likely caused by congestion in the network
TimeoutInterval = 2 TimeoutIntervalPrevious
TCP acts more politely by increasing the TimeoutInterval causing the sender to retransmit after longer and longer intervals
Congestion may get worse if sources
continue to retransmit packets
persistently
After ACK is received After ACK is received TimeoutInterval is derived TimeoutInterval is derived from most recent EstimatedRTT from most recent EstimatedRTT and DevRTTand DevRTT
Others check RFC 2018 ndash selective ACK
Transport Layer ndash TCP
25
BB
receiverreceiver explicitly informs sender of (dynamically changing) amount of free buffer space RcvWindowRcvWindow field
in TCP segment
sendersender keeps the amount of transmitted unACKed data less thanless than most recently received RcvWindowRcvWindow
sender wont overrun
receivers buffer bytransmitting too
much too fast
flow controlflow control
receiver buffering
RcvBufferRcvBuffer = size of TCP Receive Buffer
RcvWindowRcvWindow = amount of spare room in Buffer
TCP Flow ControlTCP Flow Control
Transport Layer ndash TCP
26
BB
FLOW CONTROL FLOW CONTROL ReceiverReceiver
04060 50100
LastByteRcvd
EXAMPLE HOST A sends a large file to HOST B
LastByteRead
Application Process
Data from IP
RcvBuffer
RcvWindow=RcvBuffer-[LastByteRcvd-LastByteRead]
RECEIVER HOST B ndash uses RcvWindow LastByteRcvd LastByteReadRcvWindow LastByteRcvd LastByteRead
Initially RcvWindow = RcvBuffer Application reads from the bufferHOST BHOST B tells HOST AHOST A how much spare roomhow much spare room it has in the connection buffer by placing its current value of RcvWindowRcvWindow in the receive window field of every segment it sends to HOST AHOST A
Transport Layer ndash TCP
27
BB
FLOW CONTROL SenderSender
04060 50100
LastByteSent
EXAMPLE HOST A sends a large file to HOST B
LastByteACKed
Data
To ensure that HOST B does not overflow HOST A maintains throughout the connectionrsquos life that [LastByteSent-LastByteACKed] lt= RcvWindow
SENDER HOST A
ACKs ACKs from from Host Host BB
SENDER HOST A ndash uses RcvWindow RcvWindow of HostBof HostB LastByteSent LastByteACKed LastByteSent LastByteACKed
Transport Layer ndash TCP
28
BB
FLOW CONTROLFLOW CONTROLSome issue to consider
RcvWindowRcvWindow ndash used by the connection to provide the flow control serviceflow control service
TCPTCP requires that requires that HOST AHOST A continue to sendcontinue to send segments with one segments with one data byte when HOST Brsquos receive data byte when HOST Brsquos receive window is window is 00 Such segments will be Such segments will be ACKedACKed by HOST B by HOST B Eventually the buffer will have Eventually the buffer will have some space and the some space and the ACKsACKs will will contain contain RcvWindow gt 0RcvWindow gt 0
What happens when the receive
buffer of HOST B is full full (that is when (that is when
RcvWindow=0)RcvWindow=0)
TCP sends a segment only when there is datadata or ACKACK to send Therefore the sender must maintain the connection lsquoalivealiversquo
Transport Layer ndash TCP
29
BB
TCP Connection ManagementTCP Connection Management
Recall TCP sender receiver establish ldquoconnectionconnectionrdquo before exchanging data segments
Initialize TCP variables sequence numbers buffers flow control info (eg RcvWindow)
ClientClient is the connection initiator
In Java Socket clientSocket = new Socket(hostnameport number) connect ServerServer is contacted by client
In JavaSocket accept()
if (connectconnect(s (struct sockaddr )ampsin sizeof(sin)) = 0) printf(connect failedn) WSACleanup() exit(1)
ns = acceptaccept(s(struct sockaddr )(ampremoteaddr)ampaddrlen)
Transport Layer ndash TCP
30
BB
TCP Connection Management
Three way handshakeStep 1 clientclient end system sends TCP
SYNSYN control segment to server (executed by TCP itself)
specifies initial seq number (isn)
Step 2 serverserver end system receives SYNSYN replies with SYNACKSYNACK control segment
ACKs received SYN allocates buffers specifies serverrsquos initial seq
number
Step 3 clientclient ACKsACKs the connection with
ACK=server_isn +1ACK=server_isn +1
allocates buffers sends SYN=0SYN=0
Connection established
Client
AcceptAccept (SYN=1
seq=server_isnack=client_isn+1)
time
Server
ConnectConnect (SYN=1 seq=client_isn)
ACK (SYN=0
seq=client_isn+1ack=server_isn+1)
Establishing a connection
This is what happens when we create a socket for
connection to a server
After establishing the connection the client can receive segments with app-generated data (SYN=0)(SYN=0)
Transport Layer ndash TCP
31
BB
TCP Connection Management (cont)
Closing a connection
client closes socket
closesocket(s)closesocket(s)
Java clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
How TCP connection is established and torn down
Transport Layer ndash TCP
32
BB
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters timed wait - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer ndash TCP
33
BB
TCP Connection Management (cont)
TCP client lifecycle
TCP server lifecycle
Used in case ACK gets lost It is implementation-dependent (eg 30
seconds 1 minute 2 minutes
Connection formally closes ndash all resources (eg port numbers) are
released
1
2
3
4
5
6
7
8
9
10
11
12
Transport Layer ndash TCP
34
BB
End of Flow Control and Error Control
Transport Layer ndash TCP
35
BB
Flow Control vs Congestion ControlFlow Control vs Congestion ControlSimilar actions are taken but for very different reasons
Flow ControlFlow Controlbull point-to-point traffic between sender and receiverbull speed matching service matching the rate at which the sender is sending against the rate at which the receiving application is readingbull prevents Receiver Buffer from overflowing
CongestionCongestion ndash happens when there are too many sources attempting to send data at too high a rate for the routers along the path
Congestion ControlCongestion Controlbull service that makes sure that the routers between End Systems are able to carry the offered trafficbull prevents routers from overflowing
Same course of action Throttling of the sender
Transport Layer ndash TCP
36
BB
CongestionCongestion Informally too many sources sending too
much data too fast for network to handle different from flow control Manifestations
lost packets (buffer overflow at routers) long delays (queuing in router buffers)
a top-10 problem
Principles of Congestion ControlPrinciples of Congestion Control
Transport Layer ndash TCP
37
BB
Approaches towards congestion control
End-to-end congestion control
no explicit feedback from network
congestion inferred by end-systems from observed packet loss amp delay
approach taken by TCP
Network-assisted congestion control
routers provide feedback to End Systems in the form of single bit indicating
link congestion (SNA DECbit TCPIP ECN ATM ABR)
explicit transmission rate the sender should send at
1 2
Two broad approaches towards congestion control
Transport Layer ndash TCP
38
BB
TCP Congestion ControlTCP Congestion ControlHow TCP sender limits the rate at which it sends traffic into its connection
SENDER
(Amount of unACKed data)SENDER lt min(CongWin RcvWindow)
LastByteSent - LastByteACKed
Indirectly limits the senderrsquos send rateAssumptions
bull TCP receive buffer is very large ndash no RcvWindow constraint Amt of unACKed data at sender is solely limited by CongWin
bull Packet loss delay amp packet transmission delay are negligible
Sending rate (approx) CongWinRTT
By adjusting CongWin sender can therefore adjust the
rate at which it sends data into its connection
New variable ndash Congestion
Window
Transport Layer ndash TCP
39
BB
TCP Congestion ControlTCP Congestion ControlTCP uses ACKs to trigger (ldquoclockrdquo) its increase in congestion window size ndash ldquoself-clockingrdquo
Arrival of ACKs ndash indication to the sender that all is well
1 Slow Rate
bull Congestion window will be increased at a relatively slow rate
2 High rate
bull Congestion window will be increased more quickly
Transport Layer ndash TCP
40
BB
TCP Congestion ControlTCP Congestion ControlHow TCP perceives that there is congestion on the path
ldquoLoss Eventrdquo ndash when there is excessive congestion router buffers along the path overflows causing datagrams to be dropped which in turn results in a ldquoloss eventrdquo at the sender
1 Timeout
bull no ACK is received after segment loss
2 Receipt of three duplicate ACKs
bull segment loss is followed by three ACKs received at the sender
Transport Layer ndash TCP
41
BB
TCP Congestion Control details
sender limits transmission LastByteSent-LastByteAcked
cwnd
roughly
cwnd is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or 3 duplicate acks
TCP sender reduces rate (cwnd) after loss event
Three mechanisms1 AIMD
2 slow start
3 conservative after timeout events
rate = cwnd
RTT Bytessec
TCP Congestion Avoidance: Additive Increase, Multiplicative Decrease (AIMD)
• approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs
  • additive increase: increase cwnd by 1 MSS every RTT until loss is detected
  • multiplicative decrease: cut cwnd in half after loss
[Figure: congestion window size (8, 16, 24 Kbytes) vs. time, showing the saw-tooth behavior of AIMD: probing for bandwidth]
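A toy model can reproduce the saw-tooth. Below is a minimal Java sketch in units of MSS, one step per RTT. It makes one hypothetical assumption: a loss occurs whenever cwnd exceeds a fixed path capacity. Real losses are not this regular.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy AIMD model, cwnd measured in MSS, one step per RTT.
 * Assumption: a loss is detected in any RTT where cwnd exceeds `capacity`.
 */
final class AimdSawtooth {
    private AimdSawtooth() {}

    static List<Double> trace(double startCwnd, double capacity, int rtts) {
        List<Double> out = new ArrayList<>();
        double cwnd = startCwnd;
        for (int i = 0; i < rtts; i++) {
            out.add(cwnd);                 // window used during this RTT
            if (cwnd > capacity) {
                cwnd = cwnd / 2;           // multiplicative decrease after loss
            } else {
                cwnd = cwnd + 1;           // additive increase: +1 MSS per RTT
            }
        }
        return out;
    }
}
```

trace(8, 10, 8) returns the windows 8.0, 9.0, 10.0, 11.0, 5.5, 6.5, 7.5, 8.5: a linear climb, a halving, and another linear climb.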
TCP Slow Start
• when a connection begins, increase the rate exponentially until the first loss event:
  • initially cwnd = 1 MSS
  • double cwnd every RTT
  • done by incrementing cwnd by 1 MSS for every ACK received
• summary: the initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart]
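The per-ACK rule doubles cwnd every RTT under these assumptions: every segment sent in an RTT is ACKed within that RTT, there is no loss, and there are no delayed ACKs (one ACK per segment). Below is a minimal Java sketch in units of MSS. The class name is hypothetical.

```java
/**
 * Slow start in units of MSS (hypothetical helper).
 * Assumes one ACK per segment, all ACKs of a round arriving within one RTT, no loss.
 */
final class SlowStart {
    private SlowStart() {}

    static int cwndAfter(int rtts) {
        int cwnd = 1;                        // initially cwnd = 1 MSS
        for (int r = 0; r < rtts; r++) {
            int acks = cwnd;                 // one ACK per segment sent this RTT
            for (int a = 0; a < acks; a++) {
                cwnd += 1;                   // +1 MSS for every ACK received
            }
        }
        return cwnd;
    }
}
```

cwndAfter(1), cwndAfter(2) and cwndAfter(3) are 2, 4 and 8 MSS, matching the one, two and four segments in the figure.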
Refinement: Inferring Loss
• after 3 dup ACKs:
  • cwnd is cut in half
  • window then grows linearly
• but after a timeout event:
  • cwnd is set to 1 MSS
  • window then grows exponentially up to a threshold, then grows linearly
Philosophy:
• 3 dup ACKs indicate the network is capable of delivering some segments
• a timeout indicates a "more alarming" congestion scenario
Refinement
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before timeout.
Implementation:
• variable ssthresh (slow-start threshold)
• on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event
TCP Sender Congestion Control (State / Event / TCP sender congestion-control action / Commentary)
Slow Start (SS), ACK receipt for previously unACKed data:
  Action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance"
  Commentary: results in a doubling of CongWin every RTT
Congestion Avoidance (CA), ACK receipt for previously unACKed data:
  Action: CongWin = CongWin + MSS × (MSS/CongWin)
  Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT
SS or CA, loss event detected by triple duplicate ACK:
  Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
  Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
SS or CA, timeout:
  Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
  Commentary: enter slow start
SS or CA, duplicate ACK:
  Action: increment the duplicate ACK count for the segment being ACKed
  Commentary: CongWin and Threshold not changed
Summary: TCP Congestion Control (sender FSM)
Initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start.
Slow start:
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment (remain in slow start)
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Congestion avoidance:
• new ACK: cwnd = cwnd + MSS × (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
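The FSM above can be sketched as a small Java class (TCP Reno style). This is a minimal sketch, not a real TCP stack. The assumptions are:
• cwnd and ssthresh are in bytes, with the initial 64 KB ssthresh taken as 65,536 bytes
• "retransmit missing segment" is only counted (retransmissions), not performed
• "transmit new segment(s)" is not modelled at all
RenoSender is a hypothetical name.

```java
/**
 * Sketch of the sender congestion-control FSM (TCP Reno style).
 * Sending and retransmitting are not performed; retransmissions only counts them.
 */
final class RenoSender {
    enum State { SLOW_START, CONGESTION_AVOIDANCE, FAST_RECOVERY }

    final double mss;
    double cwnd;
    double ssthresh;
    int dupAckCount = 0;
    int retransmissions = 0;          // "retransmit missing segment" actions
    State state = State.SLOW_START;

    RenoSender(double mss) {
        this.mss = mss;
        this.cwnd = mss;              // cwnd = 1 MSS
        this.ssthresh = 64 * 1024;    // ssthresh = 64 KB (assumed 65,536 bytes)
    }

    void onNewAck() {
        switch (state) {
            case SLOW_START:
                cwnd += mss;
                dupAckCount = 0;
                if (cwnd > ssthresh) {
                    state = State.CONGESTION_AVOIDANCE;
                }
                break;
            case CONGESTION_AVOIDANCE:
                cwnd += mss * (mss / cwnd);   // about +1 MSS per RTT
                dupAckCount = 0;
                break;
            case FAST_RECOVERY:
                cwnd = ssthresh;              // deflate the window
                dupAckCount = 0;
                state = State.CONGESTION_AVOIDANCE;
                break;
        }
    }

    void onDuplicateAck() {
        if (state == State.FAST_RECOVERY) {
            cwnd += mss;                      // each further dup ACK inflates cwnd
            return;
        }
        dupAckCount++;
        if (dupAckCount == 3) {
            ssthresh = cwnd / 2;
            cwnd = ssthresh + 3 * mss;
            retransmissions++;
            state = State.FAST_RECOVERY;
        }
    }

    void onTimeout() {                        // same action from every state
        ssthresh = cwnd / 2;
        cwnd = mss;
        dupAckCount = 0;
        retransmissions++;
        state = State.SLOW_START;
    }
}
```

Consider MSS = 1,000 bytes:
• three new ACKs take cwnd from 1,000 to 4,000 bytes
• three duplicate ACKs then set ssthresh = 2,000 and cwnd = 5,000, and enter fast recovery
• one more duplicate ACK inflates cwnd to 6,000
• the next new ACK deflates cwnd to 2,000 and moves to congestion avoidance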
Congestion Control: TCP's Congestion Control Service
Problem: gridlock sets in when there is packet loss due to router congestion.
• TCP's congestion control forces the end systems to decrease the rate at which packets are sent during periods of congestion.
• When a sending system's packets are lost due to congestion, it is alerted by no longer receiving ACKs for the packets it sent.
Macroscopic Description of TCP Throughput
• What's the average throughput of TCP as a function of window size and RTT? (Ignore slow start, which is typically a very short phase.)
• Let W be the window size when loss occurs:
  • when the window is W, throughput is W/RTT
  • just after loss, the window drops to W/2, and throughput to W/(2·RTT)
  • throughput then increases linearly (by MSS/RTT every RTT)
• Average throughput: 0.75 W/RTT
(Based on an idealised model for the steady-state dynamics of TCP.)
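The 0.75 factor is the average of a window that climbs linearly from W/2 to W. Below is a minimal Java sketch that checks this numerically for one idealised saw-tooth cycle. The names are hypothetical, W is measured in MSS, and W is assumed even so that the cycle starts exactly at W/2.

```java
/**
 * Idealised steady-state TCP throughput (hypothetical helper).
 * Loss occurs at window W; the window then restarts at W/2 and
 * grows by 1 MSS per RTT back to W.
 */
final class AverageThroughput {
    private AverageThroughput() {}

    /** Closed form from the slide: 0.75 * W / RTT (W in bytes, RTT in seconds). */
    static double model(double windowBytes, double rttSeconds) {
        return 0.75 * windowBytes / rttSeconds;
    }

    /**
     * Averages the per-RTT throughput over one saw-tooth cycle,
     * with the window taking the values W/2, W/2 + 1, ..., W (in MSS).
     */
    static double simulated(int wMss, double mssBytes, double rttSeconds) {
        double sum = 0;
        int rounds = 0;
        for (int win = wMss / 2; win <= wMss; win++) {
            sum += win * mssBytes / rttSeconds;  // throughput during this RTT
            rounds++;
        }
        return sum / rounds;
    }
}
```

With W = 100 MSS, MSS = 1,000 bytes and RTT = 0.5 s, both simulated(100, 1000, 0.5) and model(100_000, 0.5) give 150,000 bytes/sec.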
TCP Futures: TCP over "long, fat pipes"
• example: a GRID computing application with 1500-byte segments and a 100 ms RTT wants 10 Gbps throughput
• this requires a window size of W = 83,333 in-flight segments
• throughput in terms of loss rate L:
$$\text{throughput} = \frac{1.22 \cdot MSS}{RTT\,\sqrt{L}}$$
• achieving 10 Gbps therefore requires $L = 2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments)
• new versions of TCP are needed for high-speed environments
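The numbers on this slide can be reproduced directly. Below is a minimal Java sketch (hypothetical names) using the slide's values: 1500-byte segments, a 100 ms RTT and a 10 Gbps target. MSS is converted to bits so that the formula yields bits/sec.

```java
/** Arithmetic behind the "long, fat pipe" example (hypothetical helper). */
final class LongFatPipe {
    private LongFatPipe() {}

    /** In-flight segments needed: throughput * RTT / segment size (in bits). */
    static double windowSegments(double throughputBps, double rttSec, int segmentBytes) {
        return throughputBps * rttSec / (segmentBytes * 8.0);
    }

    /** Throughput (bits/sec) = 1.22 * MSS / (RTT * sqrt(L)), MSS in bits. */
    static double throughput(double mssBits, double rttSec, double lossRate) {
        return 1.22 * mssBits / (rttSec * Math.sqrt(lossRate));
    }

    /** Loss rate needed for a target throughput, by inverting the formula. */
    static double requiredLoss(double targetBps, double mssBits, double rttSec) {
        double x = 1.22 * mssBits / (rttSec * targetBps);
        return x * x;
    }
}
```

windowSegments(10e9, 0.1, 1500) is about 83,333.3 segments. requiredLoss(10e9, 12_000, 0.1) is about 2.1·10^-10, matching the slide.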
TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link of capacity R, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.
[Figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R]
Analysis of 2 Connections Sharing a Link
Assumptions:
• link with transmission rate R
• each connection has the same MSS and RTT
• no other TCP connections or UDP datagrams traverse the shared link
• ignore the slow-start phase of TCP
• both connections operate in congestion-avoidance mode (linear increase phase)
Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.
Why is TCP Fair?
Two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: connection 2 throughput vs. connection 1 throughput, both axes from 0 to R, with the "equal bandwidth share" line and the "full bandwidth utilisation" line; the operating point alternates between congestion avoidance (additive increase) and loss (decrease window by factor of 2)]
A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.
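The phase-plot argument can be checked numerically. Below is a minimal Java sketch under the slide's assumptions (equal MSS and RTT, congestion avoidance only). It adds one hypothetical assumption: both connections detect loss in the same RTT whenever their combined rate exceeds R. Rates are in MSS per RTT.

```java
/**
 * Two AIMD flows sharing a link of capacity R (hypothetical helper).
 * Assumption: synchronized loss whenever x1 + x2 > capacity.
 */
final class AimdFairness {
    private AimdFairness() {}

    /** Returns {x1, x2} after the given number of RTTs. */
    static double[] run(double x1, double x2, double capacity, int rtts) {
        for (int i = 0; i < rtts; i++) {
            if (x1 + x2 > capacity) {   // both see loss: decrease by factor of 2
                x1 /= 2;
                x2 /= 2;
            } else {                    // additive increase: move along slope 1
                x1 += 1;
                x2 += 1;
            }
        }
        return new double[] {x1, x2};
    }
}
```

Each loss halves the gap between the two rates, while additive increase leaves the gap unchanged. As a result, run(10, 60, 100, 1000) ends with the two rates essentially equal.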
The End
The following slides are for additional reading only.
TCP Latency Modeling
Loopholes in TCP:
• In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free; therefore they have higher throughputs.
• Multiple parallel TCP connections (e.g., a multithreaded implementation) allow one application to get a bigger share of the bandwidth.
[Figure: multiple end systems sharing a link of R bps (the link's transmission rate); three applications use 1 TCP connection each, while a multithreaded one uses 3 TCP connections]
TCP Latency Modeling
Q: How long does it take to receive an object from a Web server? That is, what is the time from when the client initiates a TCP connection until when the client receives the requested object in its entirety?
Latency components: TCP connection establishment time, data transfer delay, and actual data transmission time.
Two cases to consider:
1. $WS/R > RTT + S/R$: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent (no data transfer delay).
2. $WS/R < RTT + S/R$: the sender has to wait for an ACK after a window's worth of data is sent (there is data transfer delay).
TCP Latency Modeling
Assumptions:
• the network is uncongested, with one link of rate R between the end systems
• CongWin (fixed) determines the amount of data that can be sent
• no packet loss, no packet corruption, no retransmissions required
• header overheads are negligible
• the file to send is an integer number of segments of size MSS
• connection-establishment request messages, ACKs, and TCP connection-establishment segments have negligible transmission times
• the initial Threshold of the TCP congestion mechanism is very big
Notation:
• O: size of the object, in bits
• S: number of bits in the MSS (max segment size)
• R: the link's transmission rate, in bps
[Figure: client requesting a file from a server]
TCP Latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 1: $WS/R > RTT + S/R$
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.
$$\text{latency} = 2\,RTT + \frac{O}{R}$$
K = number of windows of data that cover the object (rounded up to the nearest integer):
$$K = \left\lceil \frac{O}{WS} \right\rceil$$
Example (assume W = 4 segments): O = 256 bits, S = 32 bits, W = 4, so $K = \lceil 256/128 \rceil = 2$.
TCP Latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 2: $WS/R < RTT + S/R$
The sender has to wait for an ACK after a window's worth of data is sent (stalled period).
$$\text{latency} = 2\,RTT + \frac{O}{R} + (K-1)\left[\frac{S}{R} + RTT - \frac{WS}{R}\right]$$
where $K = \lceil O/(WS) \rceil$ is the number of windows of data that cover the object.
If there are K windows, the sender will be stalled (K - 1) times.
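Both static-window cases can be folded into one expression, because the stall term S/R + RTT - WS/R is negative exactly in case 1. Below is a minimal Java sketch with O and S in bits, R in bits/sec and RTT in seconds. The class name is hypothetical.

```java
/** Static-congestion-window latency model (hypothetical helper). */
final class StaticWindowLatency {
    private StaticWindowLatency() {}

    /** K = ceil(O / (W*S)): number of windows that cover the object. */
    static long windows(long objectBits, long segmentBits, long w) {
        long windowBits = w * segmentBits;
        return (objectBits + windowBits - 1) / windowBits;
    }

    /** Latency for a fixed window of W segments; covers case 1 and case 2. */
    static double latency(long o, long s, long w, double r, double rtt) {
        long k = windows(o, s, w);
        double stall = s / r + rtt - w * s / r;   // idle time per window; > 0 only in case 2
        return 2 * rtt + o / r + (k - 1) * Math.max(0.0, stall);
    }
}
```

The slide gives O = 256 bits, S = 32 bits and W = 4; assume R = 32 bits/sec. Then RTT = 1 s falls in case 1 and gives 10 s, while RTT = 10 s falls in case 2 and gives 20 + 8 + 7 = 35 s.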
Case Analysis: DYNAMIC CONGESTION WINDOW
[Figure: slow start delivering an object of O/S = 15 segments in 4 windows, with stalled periods between windows]
Case Analysis: DYNAMIC CONGESTION WINDOW
• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object as follows:
$$
\begin{aligned}
K &= \min\{k : 2^0 S + 2^1 S + \cdots + 2^{k-1} S \ge O\} \\
  &= \min\{k : 2^k - 1 \ge O/S\} \\
  &= \min\{k : k \ge \log_2(O/S + 1)\} \\
  &= \left\lceil \log_2\!\left(\frac{O}{S} + 1\right) \right\rceil
\end{aligned}
$$
Note: $2^0 + 2^1 + \cdots + 2^{k-1} = 2^k - 1$.
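As a quick check against the figure's numbers, assume the first window holds one segment, as in slow start. Then an object of O/S = 15 segments needs

$$K = \left\lceil \log_2(15 + 1) \right\rceil = 4, \qquad 2^0 + 2^1 + 2^2 + 2^3 = 15 \ge \frac{O}{S},$$

that is, windows of 1, 2, 4 and 8 segments. Three windows would cover only $2^0 + 2^1 + 2^2 = 7$ segments.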
Case Analysis: DYNAMIC CONGESTION WINDOW
• From the time the server begins to transmit the kth window until the time the server receives an ACK for the first segment in the window: $\frac{S}{R} + RTT$
• Transmission of the kth window $= 2^{k-1}\frac{S}{R}$
• Stall time $= \left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$
• Latency:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$$
where $[x]^+ = \max(x, 0)$.
Case Analysis: DYNAMIC CONGESTION WINDOW
• Let Q be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\!\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$
• The actual number of times that the server stalls is $P = \min\{Q, K-1\}$.
Case Analysis: DYNAMIC CONGESTION WINDOW
• Closed-form expression for the latency:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$
Case Analysis: DYNAMIC CONGESTION WINDOW
Compared with the minimum latency $2\,RTT + O/R$:
$$\frac{\text{Latency}}{\text{Minimum latency}} \le 1 + \frac{P}{(O/R)/RTT + 2}$$
Slow start will not significantly increase latency if RTT ≪ O/R.
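The closed form can be cross-checked against the direct sum of stall times. Below is a minimal Java sketch. It assumes O is an integer number of segments, as on the assumptions slide. It uses loops over powers of two instead of floating-point logarithms, to avoid rounding trouble at exact powers. The names are hypothetical.

```java
/** Slow-start latency model: closed form vs. direct sum (hypothetical helper). */
final class SlowStartLatency {
    private SlowStartLatency() {}

    /** K = ceil(log2(O/S + 1)): windows of 1, 2, 4, ... segments until O is covered. */
    static int windows(long o, long s) {
        long segments = (o + s - 1) / s;
        int k = 0;
        long covered = 0;
        while (covered < segments) {
            covered += 1L << k;        // window k+1 carries 2^k segments
            k++;
        }
        return k;
    }

    /** Q = floor(log2(1 + RTT/(S/R))) + 1, counted as powers of two <= 1 + RTT/(S/R). */
    static int stallsIfInfinite(double s, double r, double rtt) {
        double limit = 1 + rtt / (s / r);
        int q = 0;
        while (Math.pow(2, q) <= limit) {
            q++;
        }
        return q;
    }

    /** Closed form: 2RTT + O/R + P(RTT + S/R) - (2^P - 1) S/R, with P = min(Q, K-1). */
    static double closedForm(long o, long s, double r, double rtt) {
        int p = Math.min(stallsIfInfinite(s, r, rtt), windows(o, s) - 1);
        double sr = s / r;
        return 2 * rtt + o / r + p * (rtt + sr) - (Math.pow(2, p) - 1) * sr;
    }

    /** Direct sum of positive stall times over windows 1..K-1. */
    static double bySum(long o, long s, double r, double rtt) {
        double sr = s / r;
        double total = 2 * rtt + o / r;
        for (int k = 1; k <= windows(o, s) - 1; k++) {
            total += Math.max(0.0, sr + rtt - Math.pow(2, k - 1) * sr);
        }
        return total;
    }
}
```

Take O = 15,000 bits, S = 1,000 bits and R = 1,000 bits/sec, so that S/R = 1 s, O/S = 15 and K = 4. With RTT = 2 s, Q = 2 and P = 2, and both methods give 22 s.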
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
TCP ACK Generation [RFC 1122, RFC 2581]
1. Event: in-order segment arrival, no gaps, everything else already ACKed
   TCP receiver action: delay sending the ACK. Wait up to 500 ms for the next segment; if the next segment does not arrive in this interval, send the ACK.
2. Event: in-order segment arrival, no gaps, one delayed ACK pending (due to action 1)
   TCP receiver action: immediately send a single cumulative ACK.
3. Event: out-of-order segment arrival with a higher-than-expected seq # (a gap is detected)
   TCP receiver action: send a duplicate ACK indicating the seq # of the next expected byte.
4. Event: arrival of a segment that partially or completely fills the gap
   TCP receiver action: immediately send an ACK if the segment starts at the lower end of the gap.
The receiver does not discard out-of-order segments.
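The four rows of the table translate into a simple decision function. Below is a minimal Java sketch with hypothetical names. It models only the four listed events, so an old, already-ACKed segment is rejected rather than guessed at. The 500 ms delayed-ACK timer itself is not simulated.

```java
/**
 * Receiver ACK policy from the RFC 1122 / RFC 2581 table
 * (hypothetical helper; only the four listed events are modelled).
 */
final class AckPolicy {
    private AckPolicy() {}

    enum Action { DELAYED_ACK, IMMEDIATE_CUMULATIVE_ACK, DUPLICATE_ACK, IMMEDIATE_ACK }

    /**
     * @param segSeq            first byte number of the arriving segment
     * @param expectedSeq       next in-order byte the receiver expects
     * @param delayedAckPending an in-order segment is already waiting to be ACKed
     * @param gapBuffered       out-of-order data beyond expectedSeq is buffered
     */
    static Action onSegment(long segSeq, long expectedSeq,
                            boolean delayedAckPending, boolean gapBuffered) {
        if (segSeq < expectedSeq) {
            throw new IllegalArgumentException("old segment: not covered by the table");
        }
        if (segSeq > expectedSeq) {
            return Action.DUPLICATE_ACK;             // event 3: gap detected
        }
        if (gapBuffered) {
            return Action.IMMEDIATE_ACK;             // event 4: starts at lower end of gap
        }
        if (delayedAckPending) {
            return Action.IMMEDIATE_CUMULATIVE_ACK;  // event 2
        }
        return Action.DELAYED_ACK;                   // event 1: wait up to 500 ms
    }
}
```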
TCP: Interesting Scenarios (simplified TCP version)
[Figure, lost ACK scenario: Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X); A times out and retransmits Seq=92, 8 bytes data; B replies ACK=100. This is a retransmission due to a lost ACK.]
[Figure, premature timeout with cumulative ACKs: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B replies ACK=100 and ACK=120; A's Seq=92 timer expires early, so A retransmits Seq=92, 8 bytes data, and B replies ACK=120. The segment with Seq=100 is not retransmitted. The timer is restarted for Seq=92.]
TCP Retransmission Scenario
[Figure: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B's ACK=100 is lost (X), but ACK=120 reaches A before the Seq=92 timeout.]
The cumulative ACK avoids retransmission of the first segment.
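Cumulative ACKs are why ACK=120 alone is enough here. Below is a minimal Java sketch of sender-side bookkeeping that replays this scenario. The names are hypothetical, and byte numbers are plain longs with no wraparound.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sender-side bookkeeping for cumulative ACKs:
 * ACK n acknowledges every byte numbered below n.
 */
final class CumulativeAckSender {
    private final TreeMap<Long, Integer> unacked = new TreeMap<>(); // seq -> length
    long sendBase;  // oldest unACKed byte

    CumulativeAckSender(long initialSeq) {
        this.sendBase = initialSeq;
    }

    void send(long seq, int length) {
        unacked.put(seq, length);
    }

    void onAck(long ackNum) {
        if (ackNum <= sendBase) {
            return;                                  // nothing new acknowledged
        }
        sendBase = ackNum;
        // Drop every segment whose bytes all lie below ackNum.
        while (!unacked.isEmpty()) {
            Map.Entry<Long, Integer> first = unacked.firstEntry();
            if (first.getKey() + first.getValue() <= ackNum) {
                unacked.pollFirstEntry();
            } else {
                break;
            }
        }
    }

    int outstanding() {
        return unacked.size();
    }
}
```

Replaying the figure: A sends Seq=92 (8 bytes) and Seq=100 (20 bytes). A lone ACK=120 then advances sendBase to 120 and clears both segments, so nothing is retransmitted.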
TCP Modifications: Doubling the Timeout Interval
• provides a limited form of congestion control
• timer expiration is more likely caused by congestion in the network
• on each timeout: TimeoutInterval = 2 × TimeoutInterval_previous
• TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals; congestion may get worse if sources continue to retransmit packets persistently
• after an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT
Others: see RFC 2018 (selective ACK)
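Below is a minimal Java sketch of the doubling rule (hypothetical names). On the ACK path it assumes the common estimator TimeoutInterval = EstimatedRTT + 4·DevRTT. The slide itself only says that the interval is derived from these two values.

```java
/**
 * Retransmission timer with timeout doubling (hypothetical helper).
 * Assumption: on a fresh ACK, TimeoutInterval = EstimatedRTT + 4 * DevRTT.
 */
final class RetransmitTimer {
    private double timeoutInterval;  // seconds

    RetransmitTimer(double initialSeconds) {
        this.timeoutInterval = initialSeconds;
    }

    /** Timer expired: retransmit (not modelled) and double the interval. */
    double onTimeout() {
        timeoutInterval = 2 * timeoutInterval;
        return timeoutInterval;
    }

    /** ACK received: recompute the interval from the RTT estimators. */
    void onAck(double estimatedRtt, double devRtt) {
        timeoutInterval = estimatedRtt + 4 * devRtt;
    }

    double current() {
        return timeoutInterval;
    }
}
```

Starting from 1 s, two consecutive timeouts give 2 s and then 4 s. An ACK with EstimatedRTT = 0.5 s and DevRTT = 0.125 s then resets the interval to 1 s.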
TCP Flow Control
Flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.
• receiver buffering:
  • RcvBuffer = size of the TCP receive buffer
  • RcvWindow = amount of spare room in the buffer
• the receiver explicitly informs the sender of the (dynamically changing) amount of free buffer space via the RcvWindow field in the TCP segment
• the sender keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow
FLOW CONTROL: Receiver
EXAMPLE: HOST A sends a large file to HOST B.
RECEIVER HOST B uses RcvWindow, LastByteRcvd and LastByteRead:
RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• initially, RcvWindow = RcvBuffer
• the application process reads from the buffer
• HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A
[Figure: HOST B's RcvBuffer, with data from IP arriving up to LastByteRcvd and the application process reading up to LastByteRead]
FLOW CONTROL: Sender
EXAMPLE: HOST A sends a large file to HOST B.
SENDER HOST A uses RcvWindow (of HOST B), LastByteSent and LastByteACKed.
To ensure that it does not overflow HOST B's buffer, HOST A maintains throughout the connection's life that
[LastByteSent - LastByteACKed] ≤ RcvWindow
[Figure: HOST A's data, sent up to LastByteSent, with ACKs from HOST B up to LastByteACKed]
FLOW CONTROL: Some Issues to Consider
• RcvWindow is used by the connection to provide the flow control service.
• What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?
  • TCP sends a segment only when there is data or an ACK to send, so the sender must keep the connection 'alive'.
  • TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space, and the ACKs will contain RcvWindow > 0.
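The receiver and sender rules above can be sketched together. Below is a minimal Java sketch with hypothetical names. In the zero-window case it returns a one-byte probe only when nothing is outstanding, which simplifies when probes are actually sent.

```java
/**
 * Flow-control bookkeeping from the receiver and sender slides
 * (hypothetical helper; byte numbers are plain longs, no wraparound).
 */
final class FlowControl {
    private FlowControl() {}

    /** Receiver (HOST B): RcvWindow = RcvBuffer - (LastByteRcvd - LastByteRead). */
    static long rcvWindow(long rcvBuffer, long lastByteRcvd, long lastByteRead) {
        return rcvBuffer - (lastByteRcvd - lastByteRead);
    }

    /**
     * Sender (HOST A): new bytes that may go out while keeping
     * LastByteSent - LastByteAcked <= RcvWindow. When the advertised window
     * is 0 and nothing is outstanding, returns 1 so the sender keeps
     * probing with one-byte segments.
     */
    static long sendable(long lastByteSent, long lastByteAcked, long rcvWindow) {
        long inFlight = lastByteSent - lastByteAcked;
        if (rcvWindow == 0 && inFlight == 0) {
            return 1;
        }
        return Math.max(0, rcvWindow - inFlight);
    }
}
```

Suppose RcvBuffer = 4,096 bytes, LastByteRcvd = 3,000 and LastByteRead = 1,000; HOST B then advertises RcvWindow = 2,096. If HOST A has 1,000 bytes unACKed, it may send 1,096 more.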
TCP Connection Management
Recall: the TCP sender and receiver establish a "connection" before exchanging data segments, initializing TCP variables: sequence numbers, buffers, flow control info (e.g., RcvWindow).
• Client: the connection initiator
  In Java: Socket clientSocket = new Socket(hostname, portNumber);  // connect
  In C (Winsock): if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) { printf("connect failed\n"); WSACleanup(); exit(1); }
• Server: contacted by the client
  In Java: ServerSocket.accept() waits for a client and returns the connected Socket
  In C (Winsock): ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);
TCP Connection Management: Three-Way Handshake
Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself)
  • specifies the initial seq number (client_isn)
Step 2: the server end system receives the SYN and replies with a SYNACK control segment
  • ACKs the received SYN
  • allocates buffers
  • specifies the server's initial seq number (server_isn)
Step 3: the client ACKs the connection with ACK = server_isn + 1
  • allocates buffers
  • sends SYN = 0
Connection established.
[Figure, establishing a connection (client and server timelines): Connect (SYN=1, seq=client_isn); Accept (SYN=1, seq=server_isn, ack=client_isn+1); ACK (SYN=0, seq=client_isn+1, ack=server_isn+1)]
This is what happens when we create a socket for connection to a server. After establishing the connection, the client can receive segments with app-generated data (SYN=0).
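The sequence and acknowledgement numbers of the three segments follow mechanically from the two initial sequence numbers. Below is a minimal Java sketch with hypothetical names. It only computes the header values and does no networking.

```java
import java.util.ArrayList;
import java.util.List;

/** Seq/ack numbers of the three-way handshake (hypothetical helper, no networking). */
final class ThreeWayHandshake {
    private ThreeWayHandshake() {}

    /** One control segment as drawn on the slide. */
    static final class Segment {
        final String name;
        final boolean syn;
        final long seq;
        final long ack;   // 0 where the ACK field is not yet meaningful

        Segment(String name, boolean syn, long seq, long ack) {
            this.name = name;
            this.syn = syn;
            this.seq = seq;
            this.ack = ack;
        }

        @Override
        public String toString() {
            return name + " (SYN=" + (syn ? 1 : 0) + ", seq=" + seq + ", ack=" + ack + ")";
        }
    }

    static List<Segment> handshake(long clientIsn, long serverIsn) {
        List<Segment> segments = new ArrayList<>();
        segments.add(new Segment("SYN", true, clientIsn, 0));                   // step 1
        segments.add(new Segment("SYNACK", true, serverIsn, clientIsn + 1));    // step 2
        segments.add(new Segment("ACK", false, clientIsn + 1, serverIsn + 1)); // step 3
        return segments;
    }
}
```

With arbitrary ISNs client_isn = 42 and server_isn = 9000, the segments are:
• SYN (seq=42)
• SYNACK (seq=9000, ack=43)
• ACK (SYN=0, seq=43, ack=9001)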
TCP Connection Management (cont.)
Closing a connection: the client closes its socket with closesocket(s); (Java: clientSocket.close();)
Step 1: the client end system sends a TCP FIN control segment to the server.
Step 2: the server receives the FIN and replies with an ACK. It then closes the connection and sends a FIN.
[Figure: how a TCP connection is torn down: the client sends FIN; the server replies with ACK and then FIN; the client replies with ACK and stays in timed wait before it is closed]
TCP Connection Management (cont.)
Step 3: the client receives the FIN and replies with an ACK.
  • it enters timed wait, during which it will respond with an ACK to received FINs
Step 4: the server receives the ACK, and the connection is closed.
Note: with a small modification, this can handle simultaneous FINs.
[Figure: client and server move from closing to closed after the FIN/ACK exchange; the client passes through timed wait]
TCP Connection Management (cont.)
[Figures: TCP client lifecycle and TCP server lifecycle]
• Timed wait is used in case the final ACK gets lost. Its length is implementation-dependent (e.g., 30 seconds, 1 minute, 2 minutes).
• The connection then formally closes, and all resources (e.g., port numbers) are released.
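The lifecycle diagrams did not survive extraction. Assume the usual RFC 793 client states: CLOSED, SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2 and TIME_WAIT, where TIME_WAIT is the timed wait above. Under that assumption, the client side of the handshake and teardown described in these slides can be sketched as a small Java state machine. The event names are hypothetical.

```java
/**
 * Client side of connection setup and teardown (hypothetical helper).
 * State names follow RFC 793; event names are illustrative only.
 */
final class TcpClientLifecycle {
    private TcpClientLifecycle() {}

    enum State { CLOSED, SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT }

    enum Event { APP_CONNECT, RCV_SYNACK, APP_CLOSE, RCV_ACK, RCV_FIN, TIMED_WAIT_EXPIRES }

    static State next(State s, Event e) {
        switch (s) {
            case CLOSED:
                if (e == Event.APP_CONNECT) return State.SYN_SENT;      // send SYN
                break;
            case SYN_SENT:
                if (e == Event.RCV_SYNACK) return State.ESTABLISHED;    // send ACK
                break;
            case ESTABLISHED:
                if (e == Event.APP_CLOSE) return State.FIN_WAIT_1;      // send FIN
                break;
            case FIN_WAIT_1:
                if (e == Event.RCV_ACK) return State.FIN_WAIT_2;        // our FIN was ACKed
                break;
            case FIN_WAIT_2:
                if (e == Event.RCV_FIN) return State.TIME_WAIT;         // send ACK, timed wait
                break;
            case TIME_WAIT:
                if (e == Event.TIMED_WAIT_EXPIRES) return State.CLOSED; // resources released
                break;
        }
        throw new IllegalStateException(e + " not handled in " + s);
    }
}
```

Starting from CLOSED, the events APP_CONNECT, RCV_SYNACK, APP_CLOSE, RCV_ACK, RCV_FIN and TIMED_WAIT_EXPIRES lead back to CLOSED. Any other event in a state throws.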
End of Flow Control and Error Control
Flow Control vs. Congestion Control
Similar actions are taken, but for very different reasons.
Flow control:
• point-to-point traffic between sender and receiver
• a speed-matching service: matching the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the receiver buffer from overflowing
Congestion happens when there are too many sources attempting to send data at too high a rate for the routers along the path.
Congestion control:
• a service that makes sure that the routers between end systems are able to carry the offered traffic
• prevents routers from overflowing
Same course of action: throttling of the sender.
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control
• manifestations:
  • lost packets (buffer overflow at routers)
  • long delays (queuing in router buffers)
• a top-10 problem
Approaches Towards Congestion Control
Two broad approaches towards congestion control:
1. End-to-end congestion control:
• no explicit feedback from the network
• congestion inferred by end systems from observed packet loss & delay
• approach taken by TCP
2. Network-assisted congestion control:
• routers provide feedback to end systems, either as
  • a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR), or
  • an explicit transmission rate the sender should send at
TCP Congestion Control
How the TCP sender limits the rate at which it sends traffic into its connection:
(Amount of unACKed data)_SENDER = LastByteSent - LastByteACKed ≤ min(CongWin, RcvWindow)
This indirectly limits the sender's send rate.
Assumptions:
bull TCP receive buffer is very large ndash no RcvWindow constraint Amt of unACKed data at sender is solely limited by CongWin
bull Packet loss delay amp packet transmission delay are negligible
Sending rate (approx) CongWinRTT
By adjusting CongWin sender can therefore adjust the
rate at which it sends data into its connection
New variable ndash Congestion
Window
Transport Layer ndash TCP
39
BB
TCP Congestion ControlTCP Congestion ControlTCP uses ACKs to trigger (ldquoclockrdquo) its increase in congestion window size ndash ldquoself-clockingrdquo
Arrival of ACKs ndash indication to the sender that all is well
1 Slow Rate
bull Congestion window will be increased at a relatively slow rate
2 High rate
bull Congestion window will be increased more quickly
Transport Layer ndash TCP
40
BB
TCP Congestion ControlTCP Congestion ControlHow TCP perceives that there is congestion on the path
ldquoLoss Eventrdquo ndash when there is excessive congestion router buffers along the path overflows causing datagrams to be dropped which in turn results in a ldquoloss eventrdquo at the sender
1 Timeout
bull no ACK is received after segment loss
2 Receipt of three duplicate ACKs
bull segment loss is followed by three ACKs received at the sender
Transport Layer ndash TCP
41
BB
TCP Congestion Control details
sender limits transmission LastByteSent-LastByteAcked
cwnd
roughly
cwnd is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or 3 duplicate acks
TCP sender reduces rate (cwnd) after loss event
Three mechanisms1 AIMD
2 slow start
3 conservative after timeout events
rate = cwnd
RTT Bytessec
Transport Layer ndash TCP
42
BB
TCP congestion avoidance congestion avoidance additive increase multiplicative decreaseadditive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1 MSS every RTT until
loss is detected multiplicative decrease cut cwnd in half after loss
timecwnd
co
nge
stio
n w
indo
w s
ize
saw toothbehavior probingfor bandwidth
Transport Layer ndash TCP
43
BB
TCP Slow Start Slow Start when connection begins
increase rate exponentially until first loss event initially cwndcwnd = 1 MSS double cwndcwnd every RTT done by incrementing cwndcwnd by 1 MSS by 1 MSS for every ACK received
summary initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transport Layer ndash TCP
44
BB
Refinement inferring lossRefinement inferring loss
after 3 dup ACKsafter 3 dup ACKs cwndcwnd is cut in halfhalf window then grows
linearlylinearly butbut after timeout after timeout
eventevent cwndcwnd is is set to 1 MSS1 MSS window then grows
exponentiallyexponentially Up to a thresholdUp to a threshold
then grows linearlylinearly
3 dup ACKs 3 dup ACKs indicates network capable of delivering some segments
timeouttimeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer ndash TCP
45
BB
Refinement
Q when should the exponential increase switch to linear
A when cwndcwnd gets to 1212 of its value before timeout
Implementation variable ssthresh (slow-start threshold)ssthresh (slow-start threshold) on loss eventon loss event ssthreshssthresh is set to 1212 of cwndcwnd just
before loss event
Transport Layer ndash TCP
46
BB
TCP Sender Congestion ControlSTATE EVENT TCP SENDER Congestion-
Control ActionCommentary
SLOW START (SS)
ACK receipt for previously unACKed data
CongWin = CongWin + MSSIf(CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo
Resulting in a doubling of CongWin every RTT
Congestion Avoidance (CA)
ACK receipt for previously unACKed data
CongWin = CongWin + MSS (MSSCongWin)
Additive increase resulting in increasing of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate ACK
Threshold = CongWin 2CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
SS or CA Timeout Threshold = CongWin 2CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter Slow Start
SS or CA Duplicate ACK Increment duplicate ACK count for segment being ACKed
CongWin and Threshold not changed
Transport Layer ndash TCP
47
BB
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++
duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3 MSSretransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++
duplicate ACK
cwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
Transport Layer ndash TCP
48
BB
Congestion control
TCPrsquos Congestion Control Service
CLIENTSERVER
Problem Gridlock sets-in when there is packet loss due to router congestion
forces the End Systems to decrease the rate at which packets are sentdecrease the rate at which packets are sent during periods of congestion
The sending systemrsquos packet is lost due to congestion and is alerted when it stops receiving ACKs of packets sent
Transport Layer ndash TCP
49
BBTransport Layer 3-49
Macroscopic Description of TCP throughput
whatrsquos the average throughout of TCP as a function of window size and RTT ignore slow start (typically very short phases)
let W be the window size when loss occurs when window is W throughput is WRTT just after loss window drops to W2 throughput to
W2RTT Throughput increases linearly (by MSSRTT every
RTT) Average Throughput 75 WRTT
(Based on Idealised model for the steady-state dynamics of TCP)
Transport Layer 3-50
TCP FuturesTCP Futures TCP over ldquolong fat pipesrdquo Example GRID computing application 1500-byte segments 100ms RTT desired
throughput of 10 Gbps requires window size W = 83333 in-flight
segments Throughput in terms of loss rate
L = 210-10 ndash a very small loss rate (1 loss event every 5 billion segments)
new versions of TCP is needed for high-speed environments
LRTT
MSS221
Transport Layer ndash TCP
51
BB
TCP FairnessTCP Fairness
Fairness goal if NN TCP sessions share same bottleneck link each should get an average transmission rate of RNRN an equal share of the linkrsquos bandwidth
TCP connection 1
bottleneckrouter
capacity RR
TCP connection 2
Go to Summary of TCP Congestion Control
Transport Layer ndash TCP
52
BB
Analysis of 2 connections sharing a linkAnalysis of 2 connections sharing a link
Link with transmission rate of RR
Each connection have the same MSSMSS RTTRTT
No other TCP connections or UDP datagrams traverse the shared link
Ignore slow start phase of TCP
Operating in congestion-avoidance modecongestion-avoidance mode (linear increase phase)
Goal adjust sending rate of the two connections to allow for equal equal bandwidth sharingbandwidth sharing
AssumptionsAssumptions
Transport Layer ndash TCP
53
BB
Why is TCP fairWhy is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput
proportionally
RR
equal bandwidth shareequal bandwidth share
Connection 1 throughput
Connect
ion 2
thro
ughput
congestion avoidance additive increaseloss decrease window by factor of 2
RR
A point on the graph depicts the amount of link bandwidth
jointly jointly consumedconsumed by
the connections
Full bandwidth utilisation line
We can viewWe can view a simulation a simulation
on thison this
View SimulationView Simulation
Transport Layer ndash TCP
54
BB
The End
The next succeeding slides are just for additional reading
Transport Layer ndash TCP
55
BB
TCP Latency ModelingTCP Latency Modeling
In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs
Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth
Multiple End Systems sharing a link
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
Loop holes in TCPLoop holes in TCP
1 TCP connection
1 TCP connection
1 TCP connection
3 TCP connections
Multithreading implementationMultithreading implementation
Transport Layer ndash TCP
56
BB
TCP latency modelingTCP latency modeling
TCP connection establishment time data transfer delay Actual data transmission time
Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent
QQ How long How long does it take to does it take to
receive an object receive an object from a Web from a Web
serverserver
No data transfer delayNo data transfer delay
Therersquos data transfer delayTherersquos data transfer delay
the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety
Transport Layer ndash TCP
57
BB
TCP Latency ModelingTCP Latency Modeling
Network is uncongested with one link between end systems of rate RR
CongWinCongWin (fixed) determines the amount of data that can be sent
No packet loss no packet corruption no retransmissions required
Header overheads are negligible
File to sendFile to send = integer number of segments of size MSS
Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times
CLIENT SERVER
FILEFILE
Initial ThresholdThreshold of TCP congestion mechanism is very big
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits
SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)
Case Analysis: STATIC Congestion Window
Case 1: $WS/R > RTT + S/R$
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.
$$\text{latency} = 2\,RTT + \frac{O}{R}$$
$K$ = number of windows of data that cover the object, where the window size $W$ is a number of segments:
$$K = \left\lceil \frac{O}{WS} \right\rceil \quad \text{(rounded up to the nearest integer)}$$
Example: assume $W = 4$ segments, $O = 256$ bits and $S = 32$ bits. The object is then $O/S = 8$ segments, covered by $K = 256/(4 \cdot 32) = 2$ windows.
Case Analysis: STATIC Congestion Window
Case 2: $WS/R < RTT + S/R$
The sender has to wait for an ACK after a window's worth of data is sent (stalled period).
$$\text{latency} = 2\,RTT + \frac{O}{R} + (K-1)\left[\frac{S}{R} + RTT - \frac{WS}{R}\right], \qquad K = \left\lceil \frac{O}{WS} \right\rceil$$
If there are $K$ windows, the sender will be stalled $K-1$ times (after every window but the last), each time for $S/R + RTT - WS/R$.
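The two static-window cases can be checked numerically. The C sketch below assumes the model's units (seconds, bits, bits per second), and the function names are hypothetical. With the example values O = 256 bits, S = 32 bits, W = 4 and an assumed link rate R = 32 bps (so S/R = 1 s and WS/R = 4 s), RTT = 1 s falls into Case 1 and RTT = 5 s into Case 2.

```c
/* Static-window latency model. Units: seconds, bits, bits per second. */

/* K = ceil(O / (W*S)): number of windows that cover the object. */
unsigned long windows_needed(unsigned long O, unsigned long S, unsigned long W)
{
    unsigned long per_window = W * S;
    return (O + per_window - 1) / per_window;
}

/* Latency for a fixed window of W segments. */
double static_window_latency(double rtt, unsigned long O, unsigned long S,
                             unsigned long W, double R)
{
    double base = 2.0 * rtt + (double)O / R;      /* 2RTT + O/R */
    double window_time = (double)(W * S) / R;     /* WS/R */
    double ack_time = rtt + (double)S / R;        /* RTT + S/R */

    if (window_time >= ack_time)                  /* Case 1: no stalls */
        return base;

    /* Case 2: the sender stalls K-1 times, each for RTT + S/R - WS/R. */
    unsigned long K = windows_needed(O, S, W);
    unsigned long stalls = K > 0 ? K - 1 : 0;
    return base + (double)stalls * (ack_time - window_time);
}
```

Here RTT = 1 s gives 2 + 8 = 10 s, and RTT = 5 s gives 10 + 8 plus one stall of 6 - 4 = 2 s, i.e. 20 s.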
Case Analysis: DYNAMIC Congestion Window
[Figure: slow start sending an object of $O/S = 15$ segments in 4 windows (1, 2, 4 and 8 segments), with stalled periods between windows]
Case Analysis: DYNAMIC Congestion Window
- Let $K$ be the number of windows that cover the object.
- We can express $K$ in terms of the number of segments in the object, $O/S$, as follows:
$$K = \min\{k : 2^0 S + 2^1 S + \cdots + 2^{k-1} S \ge O\} = \min\left\{k : 2^k - 1 \ge \frac{O}{S}\right\} = \min\left\{k : k \ge \log_2\left(\frac{O}{S} + 1\right)\right\} = \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil$$
Note: $2^0 + 2^1 + \cdots + 2^{k-1} = 2^k - 1$.
Case Analysis: DYNAMIC Congestion Window
- From the time the server begins to transmit the $k$th window until the time the server receives an ACK for the first segment in the window: $\frac{S}{R} + RTT$
- Transmission time of the $k$th window: $2^{k-1}\frac{S}{R}$
- Stall time after the $k$th window: $\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$, where $[x]^+ = \max(x, 0)$
- Latency:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$$
Case Analysis: DYNAMIC Congestion Window
- Let $Q$ be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$
- The actual number of times that the server stalls is $P = \min\{Q, K-1\}$.
Case Analysis: DYNAMIC Congestion Window
- With $Q$ defined as above and $P = \min\{Q, K-1\}$ actual stalls, only the first $P$ windows can produce a stall, so the stall terms sum to
$$\sum_{k=1}^{P}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right] = P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$
- Closed-form expression for the latency:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$
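The closed form and the stall-time summation should agree, and a small C sketch can evaluate both. It assumes the object is a whole number n = O/S of segments and writes S/R as `seg_time`, so O/R = n * seg_time; all names are hypothetical. Q is counted with a doubling loop instead of a floating-point logarithm.

```c
/* Slow-start latency model. Times in seconds; seg_time = S/R and the
   object is n = O/S whole segments, so O/R = n * seg_time. */

/* K: smallest k with 1 + 2 + ... + 2^(k-1) = 2^k - 1 >= n. */
static unsigned windows_to_cover(unsigned long n)
{
    unsigned k = 0;
    unsigned long covered = 0;        /* segments carried by windows 1..k */
    while (covered < n) {
        covered += 1UL << k;          /* window k+1 carries 2^k segments */
        k++;
    }
    return k;
}

/* Q = floor(log2(1 + RTT/(S/R))) + 1, i.e. the number of k >= 1
   with 2^(k-1) <= 1 + RTT/(S/R). */
static unsigned stalls_if_infinite(double rtt, double seg_time)
{
    double x = 1.0 + rtt / seg_time;
    unsigned q = 0;
    for (double p = 1.0; p <= x; p *= 2.0)
        q++;
    return q;
}

/* Latency by summing the stall after each of windows 1..K-1. */
double latency_by_sum(double rtt, unsigned long n, double seg_time)
{
    double latency = 2.0 * rtt + n * seg_time;    /* 2RTT + O/R */
    unsigned K = windows_to_cover(n);
    double window = 1.0;                          /* 2^(k-1) segments */
    for (unsigned k = 1; k + 1 <= K; k++) {
        double stall = seg_time + rtt - window * seg_time;
        if (stall > 0.0)                          /* [x]^+ */
            latency += stall;
        window *= 2.0;
    }
    return latency;
}

/* Closed form: 2RTT + O/R + P[RTT + S/R] - (2^P - 1) S/R, P = min(Q, K-1). */
double latency_closed_form(double rtt, unsigned long n, double seg_time)
{
    unsigned K = windows_to_cover(n);
    unsigned Q = stalls_if_infinite(rtt, seg_time);
    unsigned P = (K == 0) ? 0 : (Q < K - 1 ? Q : K - 1);
    double two_to_p = (double)(1UL << P);
    return 2.0 * rtt + n * seg_time + P * (rtt + seg_time)
           - (two_to_p - 1.0) * seg_time;
}
```

For the 15-segment example (K = 4) with S/R = 1 s, RTT = 2 s gives P = 2 and both functions return 22 s; RTT = 10 s gives P = K - 1 = 3 and both return 61 s.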
Case Analysis: DYNAMIC Congestion Window
$$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{(O/R)/RTT + 2}$$
where Minimum Latency $= 2\,RTT + O/R$ is the latency with no stalls.
Slow start will not significantly increase latency if $RTT \ll O/R$.
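The bound follows from the closed-form latency in a few lines. The derivation below is a sketch that uses only the symbols already defined, plus the fact that $2^P - 1 \ge P$ for every integer $P \ge 0$.

```latex
\begin{aligned}
\text{Latency} &= 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R} \\
&= 2\,RTT + \frac{O}{R} + P\,RTT - (2^P - 1 - P)\frac{S}{R} \\
&\le 2\,RTT + \frac{O}{R} + P\,RTT \qquad (\text{since } 2^P - 1 \ge P) \\
\frac{\text{Latency}}{\text{Minimum Latency}} &\le 1 + \frac{P\,RTT}{2\,RTT + O/R} = 1 + \frac{P}{(O/R)/RTT + 2}
\end{aligned}
```

For instance, if $O/R = 100\,RTT$ and $P = 3$, the bound is $1 + 3/102 \approx 1.03$, so slow start adds at most about 3% to the latency.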
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
TCP Interesting Scenarios (simplified TCP version)
[Figure, lost ACK scenario: Host A sends Seq=92 (8 bytes of data); Host B's ACK=100 is lost (X); Host A times out and retransmits Seq=92 (8 bytes of data); Host B replies ACK=100 again. This is a retransmission due to a lost ACK.]
[Figure, premature timeout with cumulative ACKs: Host A sends Seq=92 (8 bytes of data) and Seq=100 (20 bytes of data); Host B replies ACK=100 and ACK=120, but the Seq=92 timer expires before they arrive. Host A retransmits Seq=92 (8 bytes of data) and restarts the timer for Seq=92; Host B replies ACK=120. The segment with Seq=100 is not retransmitted.]
TCP Retransmission Scenario
[Figure: Host A sends Seq=92 (8 bytes of data) and Seq=100 (20 bytes of data); Host B's ACK=100 is lost (X), but ACK=120 arrives before the Seq=92 timeout expires]
The cumulative ACK avoids retransmission of the first segment.
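The receiver side of this scenario can be sketched in C: the receiver ACKs the next byte it expects, absorbing every contiguous segment it holds. The `struct segment` type and `cumulative_ack` function are hypothetical, and sequence-number wraparound is ignored for clarity.

```c
#include <stddef.h>
#include <stdint.h>

/* A received segment: sequence number of its first byte and its length. */
struct segment {
    uint32_t seq;
    uint32_t len;
};

/* Cumulative ACK: starting from the next byte the receiver expects, absorb
   every received segment that covers that byte, until none does. The result
   is the ACK number to send. Sequence-number wraparound is ignored. */
uint32_t cumulative_ack(uint32_t next_expected,
                        const struct segment *segs, size_t n)
{
    int advanced = 1;
    while (advanced) {
        advanced = 0;
        for (size_t i = 0; i < n; i++) {
            uint32_t end = segs[i].seq + segs[i].len;   /* one past last byte */
            if (segs[i].seq <= next_expected && end > next_expected) {
                next_expected = end;
                advanced = 1;
            }
        }
    }
    return next_expected;
}
```

With Seq=92 (8 bytes) and Seq=100 (20 bytes) both received, the function returns 120, which is why the single ACK=120 tells Host A that both segments arrived even though ACK=100 was lost. If only Seq=100 had arrived, it would return 92.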
TCP Modifications: Doubling the Timeout Interval
- Provides a limited form of congestion control: timer expiration is more likely caused by congestion in the network, and congestion may get worse if sources continue to retransmit packets persistently.
- On each timeout: TimeoutInterval = 2 × TimeoutInterval(previous)
- TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals.
- After an ACK is received, TimeoutInterval is again derived from the most recent EstimatedRTT and DevRTT.
For other modifications, see RFC 2018 (selective ACK).
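A minimal C sketch of the doubling rule follows. For the reset after an ACK it assumes the usual estimator from RFC 6298 (DevRTT and EstimatedRTT smoothed with gains 1/4 and 1/8, TimeoutInterval = EstimatedRTT + 4·DevRTT); it omits that RFC's 1-second minimum and clock granularity, and the struct and function names are hypothetical.

```c
/* Hypothetical retransmission-timer state (all values in seconds). */
struct rto_state {
    double estimated_rtt;   /* EstimatedRTT */
    double dev_rtt;         /* DevRTT */
    double timeout;         /* current TimeoutInterval */
};

/* Timer expired: the caller retransmits, and the interval doubles. */
void rto_on_timeout(struct rto_state *st)
{
    st->timeout = 2.0 * st->timeout;   /* TimeoutInterval = 2 * previous */
}

/* ACK for new data carrying a valid RTT sample: update the estimators
   (RFC 6298 order: DevRTT uses the old EstimatedRTT) and derive the
   interval afresh, which undoes any earlier doubling. */
void rto_on_new_ack(struct rto_state *st, double sample_rtt)
{
    double diff = sample_rtt - st->estimated_rtt;
    if (diff < 0.0)
        diff = -diff;
    st->dev_rtt = 0.75 * st->dev_rtt + 0.25 * diff;
    st->estimated_rtt = 0.875 * st->estimated_rtt + 0.125 * sample_rtt;
    st->timeout = st->estimated_rtt + 4.0 * st->dev_rtt;
}
```

Per RFC 6298 (Karn's algorithm), RTT samples should be taken only from segments that were not retransmitted; the sketch leaves that check to the caller.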
TCP Flow Control
Flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.
- Receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space via the RcvWindow field in the TCP segment.
- Sender: keeps the amount of transmitted, unACKed data no larger than the most recently received RcvWindow.
Receiver buffering:
- RcvBuffer = size of the TCP receive buffer
- RcvWindow = amount of spare room in the buffer
FLOW CONTROL: Receiver
Example: HOST A sends a large file to HOST B.
Receiver HOST B uses RcvWindow, LastByteRcvd and LastByteRead:
RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
[Figure: HOST B's RcvBuffer, filled with data from IP up to LastByteRcvd and drained by the application process up to LastByteRead]
Initially RcvWindow = RcvBuffer; the application reads from the buffer. HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.
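The receiver-side bookkeeping amounts to one subtraction. The C sketch below assumes byte counters that never wrap (real TCP uses 32-bit sequence numbers modulo 2^32), and the struct and function names are hypothetical.

```c
/* Hypothetical receiver-side counters, in bytes. */
struct receiver {
    unsigned long rcv_buffer;       /* RcvBuffer: size of the receive buffer */
    unsigned long last_byte_rcvd;   /* LastByteRcvd: last byte placed in the buffer */
    unsigned long last_byte_read;   /* LastByteRead: last byte read by the application */
};

/* RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead].
   Flow control keeps the buffered amount within RcvBuffer,
   so this subtraction does not underflow. */
unsigned long rcv_window(const struct receiver *r)
{
    return r->rcv_buffer - (r->last_byte_rcvd - r->last_byte_read);
}
```

With a 4096-byte buffer, 3000 bytes received and 1000 read, the advertised window is 4096 - 2000 = 2096 bytes; once the application catches up, it returns to the full 4096.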
FLOW CONTROL: Sender
Example: HOST A sends a large file to HOST B.
Sender HOST A uses HOST B's RcvWindow, LastByteSent and LastByteACKed.
To ensure that HOST B's buffer does not overflow, HOST A maintains throughout the connection's life that
LastByteSent - LastByteACKed <= RcvWindow
[Figure: HOST A's send buffer showing LastByteACKed and LastByteSent, with ACKs arriving from HOST B]
FLOW CONTROL: Some issues to consider
RcvWindow is used by the connection to provide the flow control service.
What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?
- TCP sends a segment only when there is data or an ACK to send. Once HOST B has advertised RcvWindow = 0, it may have nothing more to send to HOST A, so HOST A would never learn that space has been freed. The sender must therefore keep the connection "alive".
- TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space, and the ACKs will contain RcvWindow > 0.
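On the sender side, the same rule decides how much new data HOST A may transmit. A C sketch, assuming non-wrapping byte counters and a hypothetical `sendable_bytes` helper; in the zero-window case it allows the single one-byte probe described above, but only when nothing is already outstanding (a simplification: real stacks pace these probes with a persist timer).

```c
/* How many new bytes may the sender transmit now?
   Enforces LastByteSent - LastByteACKed <= RcvWindow, except for the
   one-byte probe TCP sends while the advertised window is 0. */
unsigned long sendable_bytes(unsigned long last_byte_sent,
                             unsigned long last_byte_acked,
                             unsigned long rcv_window)
{
    unsigned long in_flight = last_byte_sent - last_byte_acked;

    if (rcv_window == 0)
        return in_flight == 0 ? 1 : 0;   /* zero-window probe: one data byte */

    return in_flight >= rcv_window ? 0 : rcv_window - in_flight;
}
```

With 2000 bytes in flight and RcvWindow = 4000, the sender may add 2000 bytes; with 4000 in flight it must wait; with RcvWindow = 0 and nothing outstanding it sends one probe byte, whose ACK will eventually carry RcvWindow > 0.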
TCP Connection Management
Recall: the TCP sender and receiver establish a "connection" before exchanging data segments, initializing TCP variables: sequence numbers, buffers, flow control info (e.g. RcvWindow).
- Client: the connection initiator (connect).
  In Java: Socket clientSocket = new Socket(hostname, portNumber);
  In C (Winsock):
  if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
      printf("connect failed\n");
      WSACleanup();
      exit(1);
  }
- Server: contacted by the client (accept).
  In Java: Socket accept() (a method of ServerSocket)
  In C (Winsock):
  ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);
TCP Connection Management
Three-way handshake (this is what happens when we create a socket for connection to a server):
Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself). It specifies the client's initial sequence number (isn).
Step 2: the server end system receives the SYN and replies with a SYNACK control segment. It ACKs the received SYN, allocates buffers, and specifies the server's initial sequence number.
Step 3: the client ACKs the connection with ACK = server_isn + 1, allocates buffers, and sends SYN = 0.
Connection established.
[Figure, establishing a connection (time runs downward):
Client -> Server: Connect (SYN=1, seq=client_isn)
Server -> Client: Accept (SYN=1, seq=server_isn, ack=client_isn+1)
Client -> Server: ACK (SYN=0, seq=client_isn+1, ack=server_isn+1)]
After the connection is established, client and server can send each other segments carrying application-generated data (SYN=0).
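The sequence and acknowledgment numbers in the three segments follow mechanically from the two initial sequence numbers. A C sketch with a hypothetical `struct ctl_segment`; using `uint32_t` makes the `+ 1` wrap modulo 2^32 the way TCP sequence numbers do. The SYN flag consumes one sequence number, which is why each side acknowledges the other's isn + 1.

```c
#include <stdint.h>

/* Hypothetical view of the control fields in a handshake segment. */
struct ctl_segment {
    int syn;            /* SYN flag (1 or 0) */
    int ack_flag;       /* is the acknowledgment field valid? */
    uint32_t seq;       /* sequence number */
    uint32_t ack;       /* acknowledgment number (meaningful only if ack_flag) */
};

/* Fill in the three segments of the handshake, in order. */
void three_way_handshake(uint32_t client_isn, uint32_t server_isn,
                         struct ctl_segment out[3])
{
    /* Step 1, client -> server: SYN=1, seq=client_isn */
    out[0] = (struct ctl_segment){ .syn = 1, .ack_flag = 0,
                                   .seq = client_isn, .ack = 0 };
    /* Step 2, server -> client: SYNACK, seq=server_isn, ack=client_isn+1 */
    out[1] = (struct ctl_segment){ .syn = 1, .ack_flag = 1,
                                   .seq = server_isn, .ack = client_isn + 1 };
    /* Step 3, client -> server: SYN=0, seq=client_isn+1, ack=server_isn+1 */
    out[2] = (struct ctl_segment){ .syn = 0, .ack_flag = 1,
                                   .seq = client_isn + 1, .ack = server_isn + 1 };
}
```

With client_isn = 1000 and server_isn = 5000, the SYNACK carries ack = 1001 and the final ACK carries seq = 1001, ack = 5001.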
TCP Connection Management (cont.)
Closing a connection: the client closes the socket.
  In C (Winsock): closesocket(s);
  In Java: clientSocket.close();
Step 1: the client end system sends a TCP FIN control segment to the server.
Step 2: the server receives the FIN and replies with an ACK. It then closes the connection and sends a FIN.
[Figure, how a TCP connection is torn down: the client sends FIN; the server replies with ACK, closes and sends FIN; the client replies with ACK and waits in timed wait before the connection is closed]
TCP Connection Management (cont.)
Step 3: the client receives the FIN and replies with an ACK. It enters "timed wait", during which it will respond with an ACK to any received FINs.
Step 4: the server receives the ACK. Connection closed.
Note: with a small modification, this can handle simultaneous FINs.
[Figure: client and server exchange FIN and ACK in both directions; both pass through "closing", and the client waits in timed wait before both sides reach "closed"]
TCP Connection Management (cont.)
[Figure: TCP client lifecycle and TCP server lifecycle state diagrams]
- Timed wait: used in case the final ACK gets lost. Its length is implementation-dependent (e.g. 30 seconds, 1 minute, or 2 minutes).
- The connection then formally closes: all resources (e.g. port numbers) are released.
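The client-side steps of the teardown can be written as a small state machine. The sketch uses the RFC 793 states FIN-WAIT-1, FIN-WAIT-2 and TIME-WAIT (the slides' "timed wait"); the event names are hypothetical, and the simultaneous-close case is left out.

```c
/* Client-side states while closing (RFC 793 names). */
enum client_state { ST_ESTABLISHED, ST_FIN_WAIT_1, ST_FIN_WAIT_2,
                    ST_TIME_WAIT, ST_CLOSED };

/* Hypothetical events seen by the client. */
enum close_event { EV_APP_CLOSE, EV_RCV_ACK, EV_RCV_FIN, EV_TIMER_EXPIRED };

/* Next state; comments give the segment the client sends on that transition. */
enum client_state client_close_step(enum client_state s, enum close_event e)
{
    switch (s) {
    case ST_ESTABLISHED:   /* Step 1: closesocket()/close() -> send FIN */
        return e == EV_APP_CLOSE ? ST_FIN_WAIT_1 : s;
    case ST_FIN_WAIT_1:    /* server's ACK of our FIN arrives */
        return e == EV_RCV_ACK ? ST_FIN_WAIT_2 : s;
    case ST_FIN_WAIT_2:    /* Step 3: server's FIN arrives -> send ACK */
        return e == EV_RCV_FIN ? ST_TIME_WAIT : s;
    case ST_TIME_WAIT:     /* re-ACK any repeated FIN; leave when the timer expires */
        return e == EV_TIMER_EXPIRED ? ST_CLOSED : s;
    default:
        return s;
    }
}
```

A repeated FIN (for example, because the client's final ACK was lost) leaves the client in TIME_WAIT, where it ACKs again; only the timer moves it to CLOSED and releases its resources.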
End of Flow Control and Error Control
Flow Control vs Congestion Control: similar actions are taken, but for very different reasons.
Flow Control
- point-to-point traffic between sender and receiver
- speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
- prevents the receiver buffer from overflowing
Congestion: happens when there are too many sources attempting to send data at too high a rate for the routers along the path.
Congestion Control
- a service that makes sure that the routers between end systems are able to carry the offered traffic
- prevents routers from overflowing
Same course of action: throttling of the sender.
Principles of Congestion Control
Congestion:
- informally: "too many sources sending too much data too fast for the network to handle"
- different from flow control
- manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
- a top-10 problem
Approaches towards Congestion Control
Two broad approaches towards congestion control:
1. End-to-end congestion control
   - no explicit feedback from the network
   - congestion inferred by end systems from observed packet loss and delay
   - approach taken by TCP
2. Network-assisted congestion control
   - routers provide feedback to end systems in the form of:
     - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR), or
     - an explicit transmission rate the sender should send at
TCP Congestion Control: how the TCP sender limits the rate at which it sends traffic into its connection
New variable: the congestion window, CongWin. At the sender:
(amount of unACKed data) = LastByteSent - LastByteACKed <= min(CongWin, RcvWindow)
This indirectly limits the sender's send rate.
Assumptions:
- The TCP receive buffer is very large, so there is no RcvWindow constraint: the amount of unACKed data at the sender is limited solely by CongWin.
- Packet loss delay and packet transmission delay are negligible.
Sending rate (approx.) = CongWin/RTT
By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
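Both limits and the rate approximation fit in two small C functions. The helper names are hypothetical, and the rate formula carries the same assumptions as above (negligible loss and transmission delays).

```c
/* Window that actually limits the sender: unACKed data
   (LastByteSent - LastByteACKed) must stay within min(CongWin, RcvWindow). */
unsigned long effective_window(unsigned long cong_win, unsigned long rcv_window)
{
    return cong_win < rcv_window ? cong_win : rcv_window;
}

/* Approximate sending rate in bytes per second: window / RTT. */
double approx_rate(unsigned long window_bytes, double rtt_seconds)
{
    return (double)window_bytes / rtt_seconds;
}
```

For example, with CongWin = 8000 bytes and RTT = 0.5 s, a receiver advertising 65,535 bytes leaves CongWin in control and the rate is about 16,000 bytes/s; if the receiver advertises only 4000 bytes, RcvWindow becomes the limit.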
TCP Congestion Control
TCP uses ACKs to trigger ("clock") its increase in congestion window size: "self-clocking".
The arrival of ACKs is an indication to the sender that all is well.
1. ACKs arrive at a slow rate: the congestion window will be increased at a relatively slow rate.
2. ACKs arrive at a high rate: the congestion window will be increased more quickly.
TCP Congestion Control: how TCP perceives that there is congestion on the path
"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender. A loss event is either:
1. a timeout: no ACK is received after a segment loss, or
2. the receipt of three duplicate ACKs: a segment loss is followed by three duplicate ACKs received at the sender.
TCP Congestion Control: details
- The sender limits transmission: LastByteSent - LastByteAcked <= cwnd
- Roughly, rate = cwnd/RTT bytes/sec
- cwnd is a dynamic function of perceived network congestion
How does the sender perceive congestion?
- loss event = timeout or 3 duplicate ACKs
- the TCP sender reduces its rate (cwnd) after a loss event
Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events
TCP Congestion Avoidance: Additive Increase, Multiplicative Decrease (AIMD)
Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
- additive increase: increase cwnd by 1 MSS every RTT until loss is detected
- multiplicative decrease: cut cwnd in half after loss
[Figure: congestion window size vs. time (axis marks at 8, 16 and 24 Kbytes), showing the saw-tooth behavior of probing for bandwidth]
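The saw-tooth can be reproduced with a few lines of C. This sketch measures cwnd in MSS and uses a hypothetical loss model (a loss is detected in any RTT that starts with cwnd above a fixed capacity); the names are illustrative, and the 1 MSS floor follows the sender table later in this section.

```c
/* One RTT of congestion avoidance, with cwnd measured in MSS.
   loss != 0: multiplicative decrease (halve cwnd, but not below 1 MSS).
   loss == 0: additive increase (+1 MSS per RTT). */
double aimd_step(double cwnd_mss, int loss)
{
    if (loss) {
        double halved = cwnd_mss / 2.0;
        return halved < 1.0 ? 1.0 : halved;
    }
    return cwnd_mss + 1.0;
}

/* Hypothetical loss model: a loss is detected in any RTT that starts with
   cwnd above capacity_mss. Repeating it traces the saw-tooth. */
double aimd_after(double cwnd_mss, double capacity_mss, int rtts)
{
    for (int i = 0; i < rtts; i++)
        cwnd_mss = aimd_step(cwnd_mss, cwnd_mss > capacity_mss);
    return cwnd_mss;
}
```

Starting at 10 MSS with a capacity of 12 MSS, cwnd climbs 11, 12, 13, then a loss halves it to 6.5: one tooth of the saw.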
TCP Slow Start
When the connection begins, increase the rate exponentially until the first loss event:
- initially cwnd = 1 MSS
- double cwnd every RTT
- done by incrementing cwnd by 1 MSS for every ACK received
Summary: the initial rate is slow but ramps up exponentially fast (the sending rate doubles every RTT).
[Figure: Host A sends one segment, then two segments, then four segments to Host B, each round one RTT apart]
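The per-ACK increment is what produces the per-RTT doubling. A C sketch under stated assumptions: cwnd is a whole number of segments, every segment is ACKed individually (no delayed ACKs), and nothing is lost; `slow_start_rtt` is a hypothetical name.

```c
/* One RTT of slow start. cwnd and mss are in bytes.
   Assumes cwnd is a whole number of segments, each segment is ACKed
   individually (no delayed ACKs), and nothing is lost. */
unsigned long slow_start_rtt(unsigned long cwnd, unsigned long mss)
{
    unsigned long acks = cwnd / mss;   /* one ACK per segment sent this RTT */
    for (unsigned long i = 0; i < acks; i++)
        cwnd += mss;                   /* +1 MSS per ACK received */
    return cwnd;                       /* = 2 * original cwnd */
}
```

Starting from cwnd = 1 MSS (1000 bytes), three RTTs give 2000, 4000 and 8000 bytes, matching the one, two, four segments in the figure.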
Refinement: Inferring Loss
- After 3 duplicate ACKs: cwnd is cut in half, and the window then grows linearly.
- But after a timeout event: cwnd is set to 1 MSS; the window then grows exponentially up to a threshold, then grows linearly.
Philosophy:
- 3 duplicate ACKs indicate that the network is capable of delivering some segments.
- A timeout indicates a "more alarming" congestion scenario.
Refinement
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.
Implementation:
- variable ssthresh (slow-start threshold)
- on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event
TCP Sender Congestion Control
State | Event | TCP sender congestion-control action | Commentary
Slow Start (SS) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Results in a doubling of CongWin every RTT
Congestion Avoidance (CA) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS × (MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
SS or CA | Duplicate ACK | Increment duplicate ACK count for the segment being ACKed | CongWin and Threshold not changed
Summary: TCP Congestion Control (finite-state machine)
Start in slow start with cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0.
Slow start:
- new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- cwnd > ssthresh: go to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment (stay in slow start)
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Congestion avoidance:
- new ACK: cwnd = cwnd + MSS × (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Fast recovery:
- duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
- new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
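The FSM translates almost line for line into C. This is a sketch in bytes with hypothetical type and function names; it follows this FSM (including the ssthresh + 3 MSS inflation on entering fast recovery, where the sender table sets CongWin = Threshold) and leaves actual transmission to the caller, signalling retransmissions through the `retransmit` flag.

```c
/* States and events of the congestion-control FSM. */
enum cc_state { SLOW_START, CONG_AVOID, FAST_RECOVERY };
enum cc_event { NEW_ACK, DUP_ACK, TIMEOUT };

/* Hypothetical per-connection congestion-control state (bytes). */
struct tcp_cc {
    enum cc_state state;
    double mss;
    double cwnd;
    double ssthresh;
    int dup_ack_count;   /* dupACKcount */
    int retransmit;      /* 1 if the last event calls for retransmitting the missing segment */
};

void cc_init(struct tcp_cc *c, double mss)
{
    c->state = SLOW_START;
    c->mss = mss;
    c->cwnd = mss;                 /* cwnd = 1 MSS */
    c->ssthresh = 64.0 * 1024.0;   /* ssthresh = 64 KB */
    c->dup_ack_count = 0;
    c->retransmit = 0;
}

/* dupACKcount == 3 in slow start or congestion avoidance. */
static void enter_fast_recovery(struct tcp_cc *c)
{
    c->ssthresh = c->cwnd / 2.0;
    c->cwnd = c->ssthresh + 3.0 * c->mss;
    c->retransmit = 1;
    c->state = FAST_RECOVERY;
}

void cc_on_event(struct tcp_cc *c, enum cc_event ev)
{
    c->retransmit = 0;

    if (ev == TIMEOUT) {                 /* same action from every state */
        c->ssthresh = c->cwnd / 2.0;
        c->cwnd = c->mss;
        c->dup_ack_count = 0;
        c->retransmit = 1;
        c->state = SLOW_START;
        return;
    }

    switch (c->state) {
    case SLOW_START:
    case CONG_AVOID:
        if (ev == DUP_ACK) {
            if (++c->dup_ack_count == 3)
                enter_fast_recovery(c);
        } else {                         /* NEW_ACK */
            c->dup_ack_count = 0;
            if (c->state == SLOW_START) {
                c->cwnd += c->mss;       /* doubles cwnd every RTT */
                if (c->cwnd > c->ssthresh)
                    c->state = CONG_AVOID;
            } else {
                c->cwnd += c->mss * (c->mss / c->cwnd);   /* about 1 MSS per RTT */
            }
        }
        break;
    case FAST_RECOVERY:
        if (ev == DUP_ACK) {
            c->cwnd += c->mss;           /* inflate for each further duplicate */
        } else {                         /* NEW_ACK: deflate, resume linear growth */
            c->cwnd = c->ssthresh;
            c->dup_ack_count = 0;
            c->state = CONG_AVOID;
        }
        break;
    }
}
```

With MSS = 1000 bytes and ssthresh lowered to 4000 for illustration, four new ACKs take cwnd to 5000 and into congestion avoidance; a third duplicate ACK then sets ssthresh = 2500 and cwnd = 5500 in fast recovery, the next new ACK deflates cwnd to 2500, and a timeout drops it to 1000 with ssthresh = 1250.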
Congestion Control: TCP's Congestion Control Service
[Figure: a client and a server communicating through congested routers]
Problem: gridlock sets in when there is packet loss due to router congestion.
TCP's congestion control forces the end systems to decrease the rate at which packets are sent during periods of congestion.
When the sending system's packet is lost due to congestion, the sender is alerted because it stops receiving ACKs for the packets it sent.
Macroscopic Description of TCP Throughput
What is the average throughput of TCP as a function of window size and RTT? Ignore slow start (typically very short phases).
- Let W be the window size when loss occurs. When the window is W, the throughput is W/RTT.
- Just after a loss, the window drops to W/2 and the throughput to W/(2 RTT).
- The throughput then increases linearly (by MSS/RTT every RTT) back to W/RTT, so its average over a cycle is the midpoint of W/(2 RTT) and W/RTT.
Average throughput = 0.75 W/RTT
(Based on an idealised model for the steady-state dynamics of TCP.)
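The 0.75 factor can be checked by averaging the window over one idealised saw-tooth cycle. A C sketch, assuming W is an even number of MSS so the cycle runs W/2, W/2 + 1, ..., W, one value per RTT; the function names are hypothetical.

```c
/* Mean congestion window (in MSS) over one idealised saw-tooth cycle:
   cwnd takes the values W/2, W/2 + 1, ..., W, one per RTT.
   Assumes w_mss is even. */
double mean_window_per_cycle(unsigned w_mss)
{
    unsigned sum = 0, rtts = 0;
    for (unsigned cwnd = w_mss / 2; cwnd <= w_mss; cwnd++) {
        sum += cwnd;
        rtts++;
    }
    return (double)sum / rtts;
}

/* Average throughput in bytes per second: mean window * MSS / RTT. */
double avg_throughput(unsigned w_mss, double mss_bytes, double rtt_seconds)
{
    return mean_window_per_cycle(w_mss) * mss_bytes / rtt_seconds;
}
```

For W = 20 MSS the mean window is exactly 15 MSS (0.75 W); with 1000-byte segments and RTT = 0.5 s that is 30,000 bytes/s.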
TCP Futures: TCP over "Long, Fat Pipes"
- Example: a GRID computing application with 1500-byte segments, a 100 ms RTT, and a desired throughput of 10 Gbps requires a window size of W = 83,333 in-flight segments.
- Throughput in terms of the loss rate $L$:
$$\text{Throughput} = \frac{1.22 \cdot MSS}{RTT\sqrt{L}}$$
- Reaching 10 Gbps requires $L = 2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments).
- New versions of TCP are needed for high-speed environments.
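Both numbers on this slide can be reproduced with a short C sketch. It assumes MSS is expressed in bits (1500 bytes = 12,000 bits) so that throughput comes out in bits per second; the helper names are hypothetical.

```c
/* Window (in segments) needed for a target throughput:
   W = throughput * RTT / MSS. */
double window_for_throughput(double bps, double rtt_s, double mss_bits)
{
    return bps * rtt_s / mss_bits;
}

/* Loss rate L that yields the target throughput, from
   throughput = 1.22 * MSS / (RTT * sqrt(L)), i.e.
   L = (1.22 * MSS / (RTT * throughput))^2. */
double loss_rate_for_throughput(double bps, double rtt_s, double mss_bits)
{
    double root = 1.22 * mss_bits / (rtt_s * bps);
    return root * root;
}
```

With 10 Gbps, RTT = 0.1 s and MSS = 12,000 bits, the first function gives about 83,333 segments and the second about 2.1e-10, matching the slide's $2 \cdot 10^{-10}$.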
TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Analysis of 2 Connections Sharing a Link
Assumptions:
- A link with transmission rate R
- Each connection has the same MSS and RTT
- No other TCP connections or UDP datagrams traverse the shared link
- Ignore the slow-start phase of TCP
- Both connections operate in congestion-avoidance mode (linear increase phase)
Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.
Why is TCP Fair?
Two competing sessions:
- Additive increase gives a slope of 1 as throughput increases.
- Multiplicative decrease decreases throughput proportionally.
[Figure: connection 2 throughput vs. connection 1 throughput, both axes from 0 to R, with the full bandwidth utilisation line and the equal bandwidth share line; the operating point alternates between congestion avoidance (additive increase) and loss (decrease window by a factor of 2). A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.]
Additive increase leaves the difference between the two throughputs unchanged, while each loss that halves both windows also halves that difference, so the operating point converges toward the equal bandwidth share line.
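The convergence argument can be watched in a small C simulation. It assumes synchronised losses (both connections detect a loss in the same RTT whenever their combined rate exceeds R), rates measured in MSS per RTT, and a hypothetical function name.

```c
/* Two AIMD flows sharing a bottleneck of capacity R, with the same MSS and
   RTT, both in congestion avoidance. Hypothetical synchronised-loss model:
   when the combined rate exceeds R, both flows halve; otherwise each adds
   1 unit per RTT. */
void two_flow_aimd(double *x1, double *x2, double capacity, int rtts)
{
    for (int i = 0; i < rtts; i++) {
        if (*x1 + *x2 > capacity) {   /* loss: multiplicative decrease */
            *x1 /= 2.0;
            *x2 /= 2.0;
        } else {                      /* additive increase, slope 1 */
            *x1 += 1.0;
            *x2 += 1.0;
        }
    }
}
```

Starting from the very unequal point (1, 30) with R = 40, the gap of 29 survives every additive-increase step but is halved by every loss, so after 200 RTTs the two rates differ by well under 1 and sit near the equal-share line.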
The End
The following slides are for additional reading only.
TCP Latency Modeling
- In practice, client/server applications with a smaller RTT grab the available bandwidth more quickly as it becomes free; therefore they have higher throughputs.
- Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth.
[Figure: multiple end systems sharing a link of rate R bps (the link's transmission rate)]
Loop holes in TCPLoop holes in TCP
1 TCP connection
1 TCP connection
1 TCP connection
3 TCP connections
Multithreading implementationMultithreading implementation
Transport Layer ndash TCP
56
BB
TCP latency modelingTCP latency modeling
TCP connection establishment time data transfer delay Actual data transmission time
Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent
QQ How long How long does it take to does it take to
receive an object receive an object from a Web from a Web
serverserver
No data transfer delayNo data transfer delay
Therersquos data transfer delayTherersquos data transfer delay
the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety
Transport Layer ndash TCP
57
BB
TCP Latency ModelingTCP Latency Modeling
Network is uncongested with one link between end systems of rate RR
CongWinCongWin (fixed) determines the amount of data that can be sent
No packet loss no packet corruption no retransmissions required
Header overheads are negligible
File to sendFile to send = integer number of segments of size MSS
Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times
CLIENT SERVER
FILEFILE
Initial ThresholdThreshold of TCP congestion mechanism is very big
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits
SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)
Transport Layer ndash TCP
58
BB
TCP latency ModelingTCP latency Modeling
Case 1 latency = 2RTT + OR
K = Number of Windows of data that cover the object
K = OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 1 Case 1 WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
Number of segmentsRounded up to the nearest integer
Assume W=4 segmentsAssume W=4 segments
eg O=256bits S=32bits W=4
Transport Layer ndash TCP
59
BB
TCP latency ModelingTCP latency Modeling
Case 2 latency = 2RTT + OR + (K-1)[SR + RTT - WSR]
Number of Windows of data that cover the objectK= OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 2 Case 2 WSR lt RTT + SRWSR lt RTT + SR
Sender has to wait for an ACK after a windowrsquos worth of data sent
STALLED STALLED PERIODPERIOD
If there are k windows sender willbe stalled (k-1) times
Transport Layer ndash TCP
60
BB
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
STALLED STALLED PERIODPERIOD
4 windows4 windows
OS=15
Transport Layer ndash TCP
61
BB
bull LetLet K K be the be the number of windows that cover the objectnumber of windows that cover the objectbull We can express We can express KK in terms of the number of segments in the in terms of the number of segments in the
object as followsobject as follows
S
OkK k 110 222min
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1log
1logmin
12min
2
2
S
OK
S
OkkK
S
OkK k
Note
Transport Layer ndash TCP
62
BB
bull From the time the server begins to transmit the From the time the server begins to transmit the kkth window window until the time the server receives an until the time the server receives an ACKACK for the for the first segment in the windowin the window
bull Transmission of Transmission of kkth window window ==
bull Stall Time Stall Time ==
bull Latency Latency ==
12
k
R
S
12
k
R
SRTT
R
S
RTTR
S
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1K
1
122k
k
R
SRTT
R
S
R
ORTT
Transport Layer ndash TCP
63
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
Transport Layer ndash TCP
64
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
R
S
R
SRTTP
R
ORTT P )12(2Latency
bull Closed-form expression for the latency
Transport Layer ndash TCP
65
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
2
1
RTTRO
P
encyMinimumLat
Latency
Slow start will not significantly increase latency if RTT ltlt OR
Transport Layer ndash TCP
66
BB
httpwww1csewustledu~jaincis788-97ftptcp_over_atmindexhtmatm-features
Transport Layer ndash TCP
23
BB
Host A
Seq=100 20 bytes data
ACK=100Seq=
92
tim
eout
time
Host B
ACK=120
Seq=92 8 bytes data
Xloss
Cumulative ACK avoids retransmission of the first segment
TCP Retransmission ScenarioTCP Retransmission Scenario
Transport Layer ndash TCP
24
BB
TCP Modifications Doubling the Timeout IntervalDoubling the Timeout Interval
Provides a limited form of congestion control
Timer expiration is more likely caused by congestion in the network
TimeoutInterval = 2 TimeoutIntervalPrevious
TCP acts more politely by increasing the TimeoutInterval causing the sender to retransmit after longer and longer intervals
Congestion may get worse if sources
continue to retransmit packets
persistently
After ACK is received After ACK is received TimeoutInterval is derived TimeoutInterval is derived from most recent EstimatedRTT from most recent EstimatedRTT and DevRTTand DevRTT
Others check RFC 2018 ndash selective ACK
Transport Layer ndash TCP
25
BB
receiverreceiver explicitly informs sender of (dynamically changing) amount of free buffer space RcvWindowRcvWindow field
in TCP segment
sendersender keeps the amount of transmitted unACKed data less thanless than most recently received RcvWindowRcvWindow
sender wont overrun
receivers buffer bytransmitting too
much too fast
flow controlflow control
receiver buffering
RcvBufferRcvBuffer = size of TCP Receive Buffer
RcvWindowRcvWindow = amount of spare room in Buffer
TCP Flow ControlTCP Flow Control
Transport Layer ndash TCP
26
BB
FLOW CONTROL FLOW CONTROL ReceiverReceiver
04060 50100
LastByteRcvd
EXAMPLE HOST A sends a large file to HOST B
LastByteRead
Application Process
Data from IP
RcvBuffer
RcvWindow=RcvBuffer-[LastByteRcvd-LastByteRead]
RECEIVER HOST B ndash uses RcvWindow LastByteRcvd LastByteReadRcvWindow LastByteRcvd LastByteRead
Initially RcvWindow = RcvBuffer Application reads from the bufferHOST BHOST B tells HOST AHOST A how much spare roomhow much spare room it has in the connection buffer by placing its current value of RcvWindowRcvWindow in the receive window field of every segment it sends to HOST AHOST A
Transport Layer ndash TCP
27
BB
FLOW CONTROL SenderSender
04060 50100
LastByteSent
EXAMPLE HOST A sends a large file to HOST B
LastByteACKed
Data
To ensure that HOST B does not overflow HOST A maintains throughout the connectionrsquos life that [LastByteSent-LastByteACKed] lt= RcvWindow
SENDER HOST A
ACKs ACKs from from Host Host BB
SENDER HOST A ndash uses RcvWindow RcvWindow of HostBof HostB LastByteSent LastByteACKed LastByteSent LastByteACKed
Transport Layer ndash TCP
28
BB
FLOW CONTROLFLOW CONTROLSome issue to consider
RcvWindowRcvWindow ndash used by the connection to provide the flow control serviceflow control service
TCPTCP requires that requires that HOST AHOST A continue to sendcontinue to send segments with one segments with one data byte when HOST Brsquos receive data byte when HOST Brsquos receive window is window is 00 Such segments will be Such segments will be ACKedACKed by HOST B by HOST B Eventually the buffer will have Eventually the buffer will have some space and the some space and the ACKsACKs will will contain contain RcvWindow gt 0RcvWindow gt 0
What happens when the receive
buffer of HOST B is full full (that is when (that is when
RcvWindow=0)RcvWindow=0)
TCP sends a segment only when there is datadata or ACKACK to send Therefore the sender must maintain the connection lsquoalivealiversquo
Transport Layer ndash TCP
29
BB
TCP Connection ManagementTCP Connection Management
Recall TCP sender receiver establish ldquoconnectionconnectionrdquo before exchanging data segments
Initialize TCP variables sequence numbers buffers flow control info (eg RcvWindow)
ClientClient is the connection initiator
In Java Socket clientSocket = new Socket(hostnameport number) connect ServerServer is contacted by client
In JavaSocket accept()
if (connectconnect(s (struct sockaddr )ampsin sizeof(sin)) = 0) printf(connect failedn) WSACleanup() exit(1)
ns = acceptaccept(s(struct sockaddr )(ampremoteaddr)ampaddrlen)
Transport Layer ndash TCP
30
BB
TCP Connection Management
Three way handshakeStep 1 clientclient end system sends TCP
SYNSYN control segment to server (executed by TCP itself)
specifies initial seq number (isn)
Step 2 serverserver end system receives SYNSYN replies with SYNACKSYNACK control segment
ACKs received SYN allocates buffers specifies serverrsquos initial seq
number
Step 3 clientclient ACKsACKs the connection with
ACK=server_isn +1ACK=server_isn +1
allocates buffers sends SYN=0SYN=0
Connection established
Client
AcceptAccept (SYN=1
seq=server_isnack=client_isn+1)
time
Server
ConnectConnect (SYN=1 seq=client_isn)
ACK (SYN=0
seq=client_isn+1ack=server_isn+1)
Establishing a connection
This is what happens when we create a socket for
connection to a server
After establishing the connection the client can receive segments with app-generated data (SYN=0)(SYN=0)
Transport Layer ndash TCP
31
BB
TCP Connection Management (cont)
Closing a connection
client closes socket
closesocket(s)closesocket(s)
Java clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
How TCP connection is established and torn down
Transport Layer ndash TCP
32
BB
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters timed wait - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer ndash TCP
33
BB
TCP Connection Management (cont)
TCP client lifecycle
TCP server lifecycle
Used in case ACK gets lost It is implementation-dependent (eg 30
seconds 1 minute 2 minutes
Connection formally closes ndash all resources (eg port numbers) are
released
1
2
3
4
5
6
7
8
9
10
11
12
Transport Layer ndash TCP
34
BB
End of Flow Control and Error Control
Transport Layer ndash TCP
35
BB
Flow Control vs Congestion Control
Similar actions are taken, but for very different reasons.
Flow Control
• point-to-point traffic between sender and receiver
• speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the receiver buffer from overflowing
Congestion: happens when there are too many sources attempting to send data at too high a rate for the routers along the path
Congestion Control
• service that makes sure that the routers between end systems are able to carry the offered traffic
• prevents routers from overflowing
Same course of action: throttling of the sender
Principles of Congestion Control
Congestion, informally: too many sources sending too much data too fast for the network to handle
• different from flow control
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queuing in router buffers)
• a top-10 problem
Approaches towards congestion control
Two broad approaches towards congestion control:
1. End-to-end congestion control
• no explicit feedback from the network
• congestion inferred by end systems from observed packet loss and delay
• approach taken by TCP
2. Network-assisted congestion control
• routers provide feedback to end systems in the form of either:
  - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR), or
  - an explicit transmission rate the sender should send at
TCP Congestion Control
How the TCP sender limits the rate at which it sends traffic into its connection
New variable: Congestion Window (CongWin)
SENDER: amount of unACKed data = LastByteSent - LastByteACKed ≤ min(CongWin, RcvWindow)
This indirectly limits the sender's send rate.
Assumptions:
• The TCP receive buffer is very large, so there is no RcvWindow constraint: the amount of unACKed data at the sender is limited solely by CongWin.
• Packet loss delay and packet transmission delay are negligible.
Sending rate (approx.) = CongWin / RTT
By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
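The sender-side limit and the rate approximation can be written as two small helpers. This is a sketch under the slide's assumptions. `usable_window` and `approx_rate` are illustrative names, and the byte counters are plain 64-bit integers with no sequence-number wraparound.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes the sender may still put in flight: unACKed data must stay
   within min(CongWin, RcvWindow). */
static uint64_t usable_window(uint64_t last_byte_sent, uint64_t last_byte_acked,
                              uint64_t cong_win, uint64_t rcv_window)
{
    assert(last_byte_sent >= last_byte_acked);
    uint64_t unacked = last_byte_sent - last_byte_acked;
    uint64_t limit = cong_win < rcv_window ? cong_win : rcv_window;
    return unacked >= limit ? 0 : limit - unacked;
}

/* Approximate sending rate (bytes/sec) when CongWin is the only constraint. */
static double approx_rate(double cong_win_bytes, double rtt_sec)
{
    assert(rtt_sec > 0.0);
    return cong_win_bytes / rtt_sec;
}
```

For example, take CongWin = 14,600 bytes (10 segments of 1,460 bytes) and RTT = 100 ms. The approximate sending rate is then 146,000 bytes/sec.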
TCP Congestion Control
TCP uses ACKs to trigger ("clock") its increase in congestion window size: "self-clocking".
The arrival of ACKs is an indication to the sender that all is well.
1. ACKs arrive at a slow rate
• the congestion window will be increased at a relatively slow rate
2. ACKs arrive at a high rate
• the congestion window will be increased more quickly
TCP Congestion Control
How TCP perceives that there is congestion on the path
"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender. A loss event is either:
1. Timeout
• no ACK is received after segment loss
2. Receipt of three duplicate ACKs
• segment loss is followed by three duplicate ACKs received at the sender
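The two loss signals can be detected with a duplicate-ACK counter and a timer callback. The sketch below is a minimal C version; the names are hypothetical. It treats "three duplicate ACKs" as the third repeat of an ACK number that has already arrived once, so four identical ACKs in total. It ignores sequence-number wraparound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum loss_signal { LOSS_NONE, LOSS_TRIPLE_DUP_ACK, LOSS_TIMEOUT };

struct ack_tracker {
    uint32_t last_ack;   /* highest cumulative ACK number seen so far */
    int dup_count;       /* duplicates of last_ack received since it first arrived */
};

/* Process one arriving ACK number. Returns LOSS_TRIPLE_DUP_ACK exactly when the
   third duplicate arrives. Sequence-number wraparound is ignored for simplicity. */
static enum loss_signal on_ack(struct ack_tracker *t, uint32_t ack_no)
{
    assert(t != NULL);
    if (ack_no == t->last_ack) {
        t->dup_count++;
        return t->dup_count == 3 ? LOSS_TRIPLE_DUP_ACK : LOSS_NONE;
    }
    if (ack_no > t->last_ack) {   /* new data ACKed: reset the duplicate count */
        t->last_ack = ack_no;
        t->dup_count = 0;
    }
    return LOSS_NONE;             /* older ACKs are ignored */
}

/* The other loss signal: the retransmission timer expired without an ACK. */
static enum loss_signal on_timer_expired(struct ack_tracker *t)
{
    assert(t != NULL);
    t->dup_count = 0;
    return LOSS_TIMEOUT;
}
```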
TCP Congestion Control details
• sender limits transmission: LastByteSent - LastByteAcked ≤ cwnd
• roughly, rate = cwnd / RTT bytes/sec
• cwnd is a dynamic function of perceived network congestion
How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces its rate (cwnd) after a loss event
Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events
TCP congestion avoidance: additive increase, multiplicative decrease (AIMD)
• approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
  - additive increase: increase cwnd by 1 MSS every RTT until loss is detected
  - multiplicative decrease: cut cwnd in half after loss
Figure: congestion window size (8, 16, 24 Kbytes) vs. time, showing saw-tooth behavior: probing for bandwidth.
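The saw-tooth can be reproduced with a per-RTT AIMD step. This is a sketch that measures cwnd in MSS units and takes the loss pattern as input. `aimd_step` and `aimd_trace` are hypothetical names. The 1 MSS floor follows the rule, stated in the sender table later in this deck, that CongWin will not drop below 1 MSS.

```c
#include <assert.h>
#include <stddef.h>

/* One RTT of congestion avoidance, with cwnd measured in MSS units. */
static double aimd_step(double cwnd, int loss_detected)
{
    assert(cwnd >= 1.0);
    if (loss_detected) {
        double halved = cwnd / 2.0;           /* multiplicative decrease */
        return halved < 1.0 ? 1.0 : halved;   /* keep at least 1 MSS */
    }
    return cwnd + 1.0;                        /* additive increase: +1 MSS per RTT */
}

/* Record the saw-tooth: losses[i] != 0 means a loss was detected during RTT i. */
static void aimd_trace(double cwnd, const int *losses, size_t n, double *out)
{
    for (size_t i = 0; i < n; i++) {
        cwnd = aimd_step(cwnd, losses[i]);
        out[i] = cwnd;
    }
}
```

Starting from 10 MSS with a loss in the fourth RTT, the trace is 11, 12, 13, 6.5, 7.5, 8.5.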
TCP Slow Start
• when the connection begins, increase the rate exponentially until the first loss event:
  - initially cwnd = 1 MSS
  - double cwnd every RTT
  - done by incrementing cwnd by 1 MSS for every ACK received
• summary: the initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)
Figure: Host A sends one segment to Host B, then two segments, then four segments, each round one RTT apart.
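The "+1 MSS per ACK doubles cwnd per RTT" argument can be checked with a small simulation. This is a sketch with several assumptions:

- every segment is ACKed individually (no delayed ACKs);
- there is no loss;
- growth stops at a given ssthresh, where congestion avoidance would take over.

The function name is hypothetical.

```c
#include <assert.h>

/* cwnd (in MSS) after `rounds` RTTs of slow start, starting from 1 MSS.
   Assumptions: every segment is ACKed individually (no delayed ACKs), no loss,
   and growth stops at ssthresh, where congestion avoidance would take over. */
static int slow_start_cwnd(int rounds, int ssthresh)
{
    assert(rounds >= 0 && ssthresh >= 1);
    int cwnd = 1;
    for (int r = 0; r < rounds; r++) {
        int acks = cwnd;                            /* one ACK per segment sent this RTT */
        for (int a = 0; a < acks && cwnd < ssthresh; a++)
            cwnd += 1;                              /* +1 MSS for every ACK received */
    }
    return cwnd;
}
```

After 0, 1, 2 and 3 RTTs the window is 1, 2, 4 and 8 MSS, matching the one, two and four segments in the figure.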
Refinement: inferring loss
• after 3 dup ACKs:
  - cwnd is cut in half
  - the window then grows linearly
• but after a timeout event:
  - cwnd is set to 1 MSS
  - the window then grows exponentially up to a threshold, then grows linearly
Philosophy:
• 3 dup ACKs indicate the network is capable of delivering some segments
• a timeout indicates a "more alarming" congestion scenario
Refinement
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before timeout.
Implementation:
• variable ssthresh (slow-start threshold)
• on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event
TCP Sender Congestion Control
STATE | EVENT | TCP SENDER CONGESTION-CONTROL ACTION | COMMENTARY
Slow Start (SS) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Results in a doubling of CongWin every RTT
Congestion Avoidance (CA) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS · (MSS / CongWin) | Additive increase, resulting in increasing CongWin by 1 MSS every RTT
SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin / 2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
SS or CA | Timeout | Threshold = CongWin / 2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed
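The rows of the table can be read as one event handler. This C sketch follows the table's actions literally, using hypothetical names:

- windows are kept in bytes;
- the "triple duplicate ACK" row fires on the third duplicate counted by the "Duplicate ACK" row;
- resetting the duplicate count on a timeout is an added assumption that the table does not state.

```c
#include <assert.h>
#include <stddef.h>

enum cc_state { CC_SLOW_START, CC_CONG_AVOID };
enum cc_event { CC_NEW_ACK, CC_DUP_ACK, CC_TIMEOUT };

struct cc_sender {
    enum cc_state state;
    double congwin;     /* bytes */
    double threshold;   /* bytes */
    double mss;         /* bytes */
    int dup_count;      /* duplicate ACKs for the segment being ACKed */
};

/* Apply one row of the sender table. */
static void cc_on_event(struct cc_sender *c, enum cc_event e)
{
    assert(c != NULL && c->mss > 0.0);
    switch (e) {
    case CC_NEW_ACK:                          /* ACK for previously unACKed data */
        c->dup_count = 0;
        if (c->state == CC_SLOW_START) {
            c->congwin += c->mss;             /* doubles CongWin every RTT */
            if (c->congwin > c->threshold)
                c->state = CC_CONG_AVOID;
        } else {
            c->congwin += c->mss * (c->mss / c->congwin);   /* about +1 MSS per RTT */
        }
        break;
    case CC_DUP_ACK:
        if (++c->dup_count == 3) {            /* triple duplicate ACK: loss event */
            c->threshold = c->congwin / 2.0;
            c->congwin = c->threshold < c->mss ? c->mss : c->threshold;
            c->state = CC_CONG_AVOID;
        }                                     /* otherwise CongWin, Threshold unchanged */
        break;
    case CC_TIMEOUT:
        c->threshold = c->congwin / 2.0;
        c->congwin = c->mss;                  /* 1 MSS */
        c->dup_count = 0;
        c->state = CC_SLOW_START;
        break;
    }
}
```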
Summary: TCP Congestion Control (FSM)
Initial: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start.
Slow start:
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Congestion avoidance:
• new ACK: cwnd = cwnd + MSS · (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
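The FSM can be turned into a compact event handler. This is a sketch in C with hypothetical names such as `reno_on_event`. Windows are kept in bytes, and 64 KB is taken as 65,536 bytes. Actual segment transmission is reduced to comments plus a counter of "retransmit missing segment" actions.

```c
#include <assert.h>
#include <stddef.h>

enum reno_state { R_SLOW_START, R_CONG_AVOID, R_FAST_RECOVERY };
enum reno_event { R_NEW_ACK, R_DUP_ACK, R_TIMEOUT };

struct reno_sender {
    enum reno_state state;
    double cwnd, ssthresh, mss;   /* all in bytes */
    int dup_ack_count;
    int retransmits;              /* how many times "retransmit missing segment" ran */
};

static void reno_init(struct reno_sender *r, double mss)
{
    r->state = R_SLOW_START;
    r->mss = mss;
    r->cwnd = mss;                 /* cwnd = 1 MSS */
    r->ssthresh = 64.0 * 1024.0;   /* ssthresh = 64 KB (taken as 65,536 bytes) */
    r->dup_ack_count = 0;
    r->retransmits = 0;
}

static void reno_on_event(struct reno_sender *r, enum reno_event e)
{
    assert(r != NULL);
    if (e == R_TIMEOUT) {                      /* same action in every state */
        r->ssthresh = r->cwnd / 2.0;
        r->cwnd = r->mss;
        r->dup_ack_count = 0;
        r->retransmits++;
        r->state = R_SLOW_START;
        return;
    }
    if (r->state == R_FAST_RECOVERY) {
        if (e == R_DUP_ACK) {
            r->cwnd += r->mss;                 /* inflate; transmit new segment(s) as allowed */
        } else {                               /* new ACK: deflate to ssthresh */
            r->cwnd = r->ssthresh;
            r->dup_ack_count = 0;
            r->state = R_CONG_AVOID;
        }
        return;
    }
    if (e == R_NEW_ACK) {
        r->dup_ack_count = 0;
        if (r->state == R_SLOW_START) {
            r->cwnd += r->mss;
            if (r->cwnd > r->ssthresh)
                r->state = R_CONG_AVOID;
        } else {
            r->cwnd += r->mss * (r->mss / r->cwnd);
        }
    } else if (++r->dup_ack_count == 3) {     /* triple duplicate ACK */
        r->ssthresh = r->cwnd / 2.0;
        r->cwnd = r->ssthresh + 3.0 * r->mss;
        r->retransmits++;                      /* retransmit missing segment */
        r->state = R_FAST_RECOVERY;
    }
}
```

Compared with the table-driven variant, the difference is in fast recovery. Each extra duplicate ACK inflates cwnd by 1 MSS, and the next new ACK deflates it back to ssthresh.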
Congestion control
TCP's Congestion Control Service
Problem: gridlock sets in when there is packet loss due to router congestion.
TCP's congestion control forces the end systems to decrease the rate at which packets are sent during periods of congestion.
The sending system's packet is lost due to congestion, and the sender is alerted when it stops receiving ACKs of packets sent.
Macroscopic Description of TCP throughput
• What's the average throughput of TCP as a function of window size and RTT?
  - ignore slow start (typically very short phases)
• Let W be the window size when loss occurs.
  - When the window is W, throughput is W/RTT.
  - Just after loss, the window drops to W/2, and throughput to W/(2·RTT).
  - Throughput then increases linearly (by MSS/RTT every RTT).
• Average throughput: 0.75 · W/RTT
(Based on an idealised model for the steady-state dynamics of TCP)
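The factor 0.75 comes from averaging the window over one saw-tooth cycle, where it climbs linearly from W/2 to W. The sketch below computes that average in the idealised model. It assumes W is an even number of segments and that there is one window per RTT. The function names are hypothetical.

```c
#include <assert.h>

/* Average window (segments per RTT) over one saw-tooth cycle of the idealised
   model: the window climbs by 1 segment per RTT from W/2 to W, then halves. */
static double avg_window_over_cycle(int w)
{
    assert(w >= 2 && w % 2 == 0);     /* even W keeps W/2 an integer */
    double sum = 0.0;
    int rtts = 0;
    for (int cw = w / 2; cw <= w; cw++) {
        sum += cw;
        rtts++;
    }
    return sum / rtts;                /* equals 3W/4 */
}

/* Average throughput in bytes/sec: (3/4) * W * MSS / RTT. */
static double avg_throughput(int w, double mss_bytes, double rtt_s)
{
    assert(rtt_s > 0.0);
    return avg_window_over_cycle(w) * mss_bytes / rtt_s;
}
```

For W = 100 segments the average window is 75 segments. With 1,000-byte segments and a 100 ms RTT, that is about 750,000 bytes/sec.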
TCP Futures: TCP over "long, fat pipes"
• Example: GRID computing application, 1500-byte segments, 100 ms RTT, desired throughput of 10 Gbps
• requires a window size of W = 83,333 in-flight segments
• throughput in terms of loss rate L:
  $\text{throughput} = \frac{1.22 \cdot MSS}{RTT\,\sqrt{L}}$
• to achieve 10 Gbps requires $L = 2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments)
• new versions of TCP are needed for high-speed environments
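Both numbers in the example follow from short calculations. The first is W = throughput × RTT / segment size. The second inverts the loss-rate formula for L. The sketch below assumes all sizes are in bits, and the helper names are hypothetical.

```c
#include <assert.h>

/* Segments that must be in flight to sustain `throughput_bps` over `rtt_s`
   with `seg_bits`-bit segments: W = throughput * RTT / segment size. */
static double window_for_throughput(double throughput_bps, double rtt_s, double seg_bits)
{
    assert(seg_bits > 0.0);
    return throughput_bps * rtt_s / seg_bits;
}

/* Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) to get the loss rate L
   that the target throughput would tolerate. */
static double loss_rate_for_throughput(double throughput_bps, double rtt_s, double mss_bits)
{
    assert(throughput_bps > 0.0 && rtt_s > 0.0);
    double sqrt_l = 1.22 * mss_bits / (rtt_s * throughput_bps);
    return sqrt_l * sqrt_l;
}
```

With 10 Gbps, 100 ms and 1500-byte (12,000-bit) segments, W ≈ 83,333 segments. L ≈ 2.1 · 10^-10, which the slide rounds to 2 · 10^-10.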
TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.
Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.
Analysis of 2 connections sharing a link
Assumptions:
• a link with transmission rate R
• each connection has the same MSS and RTT
• no other TCP connections or UDP datagrams traverse the shared link
• ignore the slow-start phase of TCP
• both connections operate in congestion-avoidance mode (linear increase phase)
Goal: adjust the sending rate of the two connections to allow for equal bandwidth sharing
Why is TCP fair?
Two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
Figure: Connection 1 throughput (horizontal axis, up to R) vs. Connection 2 throughput (vertical axis, up to R), with the equal-bandwidth-share line and the full-bandwidth-utilisation line. The trajectory alternates between congestion-avoidance additive increase and loss (decrease window by a factor of 2). A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.
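The graphical argument can be checked numerically with an idealised two-flow AIMD model. In each RTT both rates grow by the same amount. Whenever their sum exceeds R, both flows see a loss and halve. Additive increase leaves the gap between the flows unchanged, while each halving halves it, so the gap shrinks toward equal shares.

The assumption that both flows lose in the same RTT is a simplification. The function name is hypothetical.

```c
#include <assert.h>

/* Idealised two-flow AIMD model: each RTT both rates grow by `inc`; whenever their
   sum exceeds capacity r, both see a loss and halve. Returns |x1 - x2| after n RTTs. */
static double aimd_two_flows(double x1, double x2, double r, double inc, int n)
{
    assert(r > 0.0 && inc > 0.0 && n >= 0);
    for (int i = 0; i < n; i++) {
        if (x1 + x2 > r) {        /* joint loss: multiplicative decrease */
            x1 /= 2.0;
            x2 /= 2.0;
        } else {                  /* additive increase: a slope-1 move on the graph */
            x1 += inc;
            x2 += inc;
        }
    }
    return x1 > x2 ? x1 - x2 : x2 - x1;
}
```

Starting from an unfair split of 90 and 10 on a link with R = 100, the gap stays at 80 through the first increase. It drops to 40 after the first loss and is essentially zero after 1,000 RTTs.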
The End
The following slides are for additional reading only.
TCP Latency Modeling
Loopholes in TCP:
• In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free; therefore, they have higher throughputs.
• Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (multithreading implementation).
Figure: multiple end systems sharing a link of R bps (the link's transmission rate); three end systems use 1 TCP connection each, and one uses 3 TCP connections.
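The loophole is a simple proportion under the idealised fairness model, where every connection on the bottleneck gets an equal share. The helper below is a hypothetical sketch of that arithmetic.

```c
#include <assert.h>

/* Idealised fairness: each of the n_total connections crossing the bottleneck
   gets R / n_total, so an application that opens k of them gets k * R / n_total. */
static double app_share_bps(double r_bps, int k, int n_total)
{
    assert(n_total > 0 && k >= 0 && k <= n_total);
    return r_bps * k / n_total;
}
```

In the figure's setup there are six connections in total. The application with 3 parallel connections gets R/2, while each single-connection application gets R/6.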
TCP latency modeling
Q: How long does it take to receive an object from a Web server?
Latency: the time from when the client initiates a TCP connection until when the client receives the requested object in its entirety.
Components: TCP connection establishment time, data transfer delay, actual data transmission time.
Two cases to consider:
• WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent (no data transfer delay)
• WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent (there is data transfer delay)
TCP Latency Modeling
Assumptions:
• The network is uncongested, with one link of rate R between the end systems.
• CongWin (fixed) determines the amount of data that can be sent.
• No packet loss, no packet corruption, no retransmissions required.
• Header overheads are negligible.
• The file to send is an integer number of segments of size MSS.
• Connection-establishment request messages, ACKs, and TCP connection-establishment segments have negligible transmission times.
• The initial Threshold of the TCP congestion mechanism is very big.
Notation:
• R: the link's transmission rate, in bps
• O: size of the object, in bits
• S: number of bits of the MSS (maximum segment size)
Figure: the CLIENT fetches a FILE from the SERVER.
TCP latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 1: $WS/R > RTT + S/R$
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.
Case 1 latency: $2\,RTT + O/R$
K = number of windows of data that cover the object: $K = \lceil O/(WS) \rceil$, rounded up to the nearest integer (O/S is the number of segments in the object).
Example (figure): assume W = 4 segments, e.g., O = 256 bits, S = 32 bits, W = 4, giving K = 2.
TCP latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 2: $WS/R < RTT + S/R$
The sender has to wait for an ACK after a window's worth of data is sent (the STALLED PERIOD).
Case 2 latency: $2\,RTT + O/R + (K-1)\left[S/R + RTT - WS/R\right]$, where $K = \lceil O/(WS) \rceil$ is the number of windows of data that cover the object.
If there are K windows, the sender will be stalled (K-1) times.
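The two static-window cases can be combined into one function. This is a sketch under the model's assumptions, with a hypothetical function name. It treats the boundary WS/R = RTT + S/R as Case 1, because the stall term is zero there and both formulas agree.

```c
#include <assert.h>

/* Latency (seconds) with a static window of W segments, following the two cases:
   Case 1 (WS/R >= RTT + S/R): 2 RTT + O/R.
   Case 2 (WS/R <  RTT + S/R): add (K-1) stalls of S/R + RTT - WS/R, K = ceil(O/(WS)). */
static double static_window_latency(double o_bits, double s_bits, int w,
                                    double r_bps, double rtt_s)
{
    assert(o_bits > 0.0 && s_bits > 0.0 && w > 0 && r_bps > 0.0);
    double seg_time = s_bits / r_bps;           /* S/R */
    double win_time = w * seg_time;             /* WS/R */
    double latency = 2.0 * rtt_s + o_bits / r_bps;
    if (win_time >= rtt_s + seg_time)           /* Case 1: the sender never stalls */
        return latency;
    long k = (long)(o_bits / (w * s_bits));     /* K = ceil(O / (W S)) */
    if (k * w * s_bits < o_bits)
        k++;
    return latency + (k - 1) * (seg_time + rtt_s - win_time);
}
```

Take the slide's example of O = 256 bits, S = 32 bits and W = 4, with an assumed link rate of R = 32 bit/s, so S/R = 1 s and WS/R = 4 s.

- With RTT = 1 s, this is Case 1 and the latency is 10 s.
- With RTT = 10 s, the sender stalls once for 7 s, giving 35 s.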
Case Analysis: DYNAMIC CONGESTION WINDOW
Figure: with slow start, an object of O/S = 15 segments is sent in 4 windows (1, 2, 4, and 8 segments), with a STALLED PERIOD between windows.
Case Analysis: DYNAMIC CONGESTION WINDOW
• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object as follows:
$$K = \min\left\{k : 2^0 + 2^1 + \dots + 2^{k-1} \ge \frac{O}{S}\right\} = \min\left\{k : 2^k - 1 \ge \frac{O}{S}\right\} = \min\left\{k : k \ge \log_2\left(\frac{O}{S} + 1\right)\right\} = \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil$$
Case Analysis: DYNAMIC CONGESTION WINDOW
• From the time the server begins to transmit the kth window until the time the server receives an ACK for the first segment in the window: $\frac{S}{R} + RTT$
• Transmission time of the kth window $= 2^{k-1}\frac{S}{R}$
• Stall time after the kth window $= \left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^{+}$, where $[x]^{+} = \max(x, 0)$
• Latency $= 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^{+}$
Case Analysis: DYNAMIC CONGESTION WINDOW
• Let Q be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$
• The actual number of times that the server stalls is $P = \min\{Q, K-1\}$.
Case Analysis: DYNAMIC CONGESTION WINDOW
• With Q and P as defined above, the closed-form expression for the latency is:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - \left(2^{P} - 1\right)\frac{S}{R}$$
Case Analysis: DYNAMIC CONGESTION WINDOW
• Compared with the minimum latency $2\,RTT + O/R$ (no stalls):
$$\frac{\text{Latency}}{\text{Minimum latency}} \le 1 + \frac{P}{(O/R)/RTT + 2}$$
• Slow start will not significantly increase latency if $RTT \ll O/R$.
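The slow-start latency formula can be evaluated directly. The sketch computes K, Q and P with integer loops instead of `log2`, so no math library is needed. It assumes the model's conditions: no loss, and an initial threshold large enough that the window keeps doubling. The function name is hypothetical.

```c
#include <assert.h>

/* Latency (seconds) to fetch an O-bit object with slow start, using the closed form
   2 RTT + O/R + P(RTT + S/R) - (2^P - 1) S/R with P = min{Q, K-1}.
   Assumes no loss and an initial threshold large enough that the window keeps doubling. */
static double slow_start_latency(double o_bits, double s_bits, double r_bps, double rtt_s)
{
    assert(o_bits > 0.0 && s_bits > 0.0 && r_bps > 0.0 && rtt_s >= 0.0);
    double seg_time = s_bits / r_bps;                 /* S/R */

    /* K: windows of 1, 2, 4, ... segments needed to cover O/S segments. */
    int k = 0;
    double covered = 0.0, win = 1.0;
    while (covered < o_bits / s_bits) {
        covered += win;
        win *= 2.0;
        k++;
    }

    /* Q = floor(log2(1 + RTT/(S/R))) + 1: count powers of two <= 1 + RTT/(S/R). */
    int q = 0;
    double x = 1.0 + rtt_s / seg_time, pow2 = 1.0;
    while (pow2 <= x) {
        pow2 *= 2.0;
        q++;
    }

    int p = q < k - 1 ? q : k - 1;                    /* P = min{Q, K-1} */
    double two_p = 1.0;
    for (int i = 0; i < p; i++)
        two_p *= 2.0;
    return 2.0 * rtt_s + o_bits / r_bps + p * (rtt_s + seg_time) - (two_p - 1.0) * seg_time;
}
```

Take the figure's O/S = 15 segments, with S/R = 1 s and RTT = 2 s. Then K = 4, Q = 2 and P = 2, and the latency is 4 + 15 + 6 - 3 = 22 s. This matches summing the individual stalls after windows 1, 2 and 3 (2 s, 1 s, 0 s) on top of 2 RTT + O/R.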
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
TCP Modifications: Doubling the Timeout Interval
• provides a limited form of congestion control
• timer expiration is more likely caused by congestion in the network
• TimeoutInterval = 2 × TimeoutInterval_previous
• TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals
• congestion may get worse if sources continue to retransmit packets persistently
• after an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT
Others: check RFC 2018 (selective ACK)
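The doubling rule and the reset on ACK can be sketched as three small helpers. The slide does not give the exact derivation from EstimatedRTT and DevRTT. This sketch assumes the usual EstimatedRTT + 4·DevRTT form. All function names are hypothetical.

```c
#include <assert.h>

/* Timer expired: TimeoutInterval = 2 * TimeoutInterval_previous. */
static double timeout_after_expiry(double timeout_interval)
{
    assert(timeout_interval > 0.0);
    return 2.0 * timeout_interval;
}

/* Interval in force after n consecutive expiries of the same segment's timer. */
static double timeout_after_n_expiries(double initial, int n)
{
    assert(n >= 0);
    double t = initial;
    for (int i = 0; i < n; i++)
        t = timeout_after_expiry(t);
    return t;
}

/* After an ACK arrives the interval is recomputed from the RTT estimators
   (formula assumed here: EstimatedRTT + 4 * DevRTT). */
static double timeout_after_ack(double estimated_rtt, double dev_rtt)
{
    return estimated_rtt + 4.0 * dev_rtt;
}
```

Starting from 1 s, three consecutive expiries give 8 s. The next ACK brings the interval back to a value based on measured RTTs.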
TCP Flow Control
flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast
• receiver explicitly informs the sender of the (dynamically changing) amount of free buffer space: the RcvWindow field in the TCP segment
• sender keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow
Receiver buffering:
• RcvBuffer = size of the TCP receive buffer
• RcvWindow = amount of spare room in the buffer
FLOW CONTROL: Receiver
EXAMPLE: HOST A sends a large file to HOST B.
RECEIVER HOST B uses RcvWindow, LastByteRcvd, and LastByteRead:
RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
Figure: data from IP fills the RcvBuffer up to LastByteRcvd; the application process reads from it up to LastByteRead.
Initially, RcvWindow = RcvBuffer; as the application reads from the buffer, space is freed again. HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.
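The receiver's advertised window is a one-line computation. The sketch adds assertions for the invariants the model relies on. The function name is hypothetical, and sequence-number wraparound is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead], advertised by HOST B
   in every segment it sends to HOST A. */
static uint64_t rcv_window(uint64_t rcv_buffer, uint64_t last_byte_rcvd,
                           uint64_t last_byte_read)
{
    assert(last_byte_rcvd >= last_byte_read);
    uint64_t buffered = last_byte_rcvd - last_byte_read;   /* received but not yet read */
    assert(buffered <= rcv_buffer);                        /* flow control prevents overflow */
    return rcv_buffer - buffered;
}
```

Suppose the buffer holds 4,096 bytes, 3,000 bytes have arrived and the application has read 1,000 of them. HOST B then advertises RcvWindow = 2,096.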
FLOW CONTROL: Sender
EXAMPLE: HOST A sends a large file to HOST B.
SENDER HOST A uses the RcvWindow of HOST B, LastByteSent, and LastByteACKed.
Figure: HOST A's data, sent up to LastByteSent and acknowledged by ACKs from HOST B up to LastByteACKed.
To ensure that it does not overflow HOST B's buffer, HOST A maintains throughout the connection's life that [LastByteSent - LastByteACKed] <= RcvWindow.
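The sender's invariant turns into a simple admission check before each transmission. The sketch below uses a hypothetical function name and ignores sequence-number wraparound.

```c
#include <assert.h>
#include <stdint.h>

/* HOST A may send `len` more bytes only if afterwards
   LastByteSent - LastByteACKed <= RcvWindow still holds. */
static int may_send(uint64_t last_byte_sent, uint64_t last_byte_acked,
                    uint64_t rcv_window, uint64_t len)
{
    assert(last_byte_sent >= last_byte_acked);
    return (last_byte_sent + len) - last_byte_acked <= rcv_window;
}
```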
FLOW CONTROL: Some issues to consider
RcvWindow is used by the connection to provide the flow control service.
What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?
TCP sends a segment only when there is data or an ACK to send. Therefore, the sender must keep the connection "alive".
TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space, and the ACKs will contain RcvWindow > 0.
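The zero-window rule can be folded into the sender's choice of how many bytes to send next. This is a simplified sketch with a hypothetical name. It allows at most one 1-byte probe outstanding and has no timing. Real implementations pace these probes with a persist timer, which the sketch omits.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes to transmit next. `in_flight` is unACKed data, `rcv_window` the last
   advertised RcvWindow. With a zero window and data waiting, send a single
   1-byte segment so HOST B's ACK can report when buffer space frees up.
   Simplification: at most one probe outstanding; no persist timer. */
static uint64_t next_send_len(uint64_t unsent, uint64_t in_flight,
                              uint64_t rcv_window, uint64_t mss)
{
    assert(mss > 0);
    if (unsent == 0)
        return 0;
    if (rcv_window == 0)
        return in_flight == 0 ? 1 : 0;               /* zero-window probe */
    uint64_t room = rcv_window > in_flight ? rcv_window - in_flight : 0;
    uint64_t len = unsent < room ? unsent : room;
    return len < mss ? len : mss;                    /* never more than one MSS per segment */
}
```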
TCP Connection Management
Recall: TCP sender and receiver establish a "connection" before exchanging data segments.
Initialize TCP variables: sequence numbers, buffers, flow control info (e.g., RcvWindow).
Client: the connection initiator
In Java: Socket clientSocket = new Socket(hostname, portNumber);
In C (Winsock), connect:
if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {   /* non-zero means failure */
    printf("connect failed\n");
    WSACleanup();
    exit(1);
}
Server: contacted by the client
In Java: Socket accept() (a method of ServerSocket)
In C (Winsock), accept:
ns = accept(s, (struct sockaddr *)&remoteaddr, &addrlen);   /* ns: new socket for this client */
TCP Connection Management
Three-way handshake
Step 1: client end system sends TCP SYN control segment to server (executed by TCP itself)
• specifies initial seq number (isn)
Step 2: server end system receives SYN, replies with SYNACK control segment
• ACKs received SYN
• allocates buffers
• specifies server's initial seq number
Step 3: client ACKs the connection with ACK = server_isn + 1, allocates buffers, and sends SYN = 0.
Connection established.
Figure (Establishing a connection), time running downward:
Client → Server: Connect (SYN=1, seq=client_isn)
Server → Client: Accept (SYN=1, seq=server_isn, ack=client_isn+1)
Client → Server: ACK (SYN=0, seq=client_isn+1, ack=server_isn+1)
This is what happens when we create a socket for connection to a server.
After the connection is established, the client can receive segments carrying application-generated data (SYN=0).
Transport Layer ndash TCP
31
BB
TCP Connection Management (cont)
Closing a connection
client closes socket
closesocket(s)closesocket(s)
Java clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
How TCP connection is established and torn down
Transport Layer ndash TCP
32
BB
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters timed wait - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer ndash TCP
33
BB
TCP Connection Management (cont)
TCP client lifecycle
TCP server lifecycle
Used in case ACK gets lost It is implementation-dependent (eg 30
seconds 1 minute 2 minutes
Connection formally closes ndash all resources (eg port numbers) are
released
1
2
3
4
5
6
7
8
9
10
11
12
Transport Layer ndash TCP
34
BB
End of Flow Control and Error Control
Transport Layer ndash TCP
35
BB
Flow Control vs Congestion ControlFlow Control vs Congestion ControlSimilar actions are taken but for very different reasons
Flow ControlFlow Controlbull point-to-point traffic between sender and receiverbull speed matching service matching the rate at which the sender is sending against the rate at which the receiving application is readingbull prevents Receiver Buffer from overflowing
CongestionCongestion ndash happens when there are too many sources attempting to send data at too high a rate for the routers along the path
Congestion ControlCongestion Controlbull service that makes sure that the routers between End Systems are able to carry the offered trafficbull prevents routers from overflowing
Same course of action Throttling of the sender
Transport Layer ndash TCP
36
BB
CongestionCongestion Informally too many sources sending too
much data too fast for network to handle different from flow control Manifestations
lost packets (buffer overflow at routers) long delays (queuing in router buffers)
a top-10 problem
Principles of Congestion ControlPrinciples of Congestion Control
Transport Layer ndash TCP
37
BB
Approaches towards congestion control
End-to-end congestion control
no explicit feedback from network
congestion inferred by end-systems from observed packet loss amp delay
approach taken by TCP
Network-assisted congestion control
routers provide feedback to End Systems in the form of single bit indicating
link congestion (SNA DECbit TCPIP ECN ATM ABR)
explicit transmission rate the sender should send at
1 2
Two broad approaches towards congestion control
Transport Layer ndash TCP
38
BB
TCP Congestion ControlTCP Congestion ControlHow TCP sender limits the rate at which it sends traffic into its connection
SENDER
(Amount of unACKed data)SENDER lt min(CongWin RcvWindow)
LastByteSent - LastByteACKed
Indirectly limits the senderrsquos send rateAssumptions
bull TCP receive buffer is very large ndash no RcvWindow constraint Amt of unACKed data at sender is solely limited by CongWin
bull Packet loss delay amp packet transmission delay are negligible
Sending rate (approx) CongWinRTT
By adjusting CongWin sender can therefore adjust the
rate at which it sends data into its connection
New variable ndash Congestion
Window
Transport Layer ndash TCP
39
BB
TCP Congestion ControlTCP Congestion ControlTCP uses ACKs to trigger (ldquoclockrdquo) its increase in congestion window size ndash ldquoself-clockingrdquo
Arrival of ACKs ndash indication to the sender that all is well
1 Slow Rate
bull Congestion window will be increased at a relatively slow rate
2 High rate
bull Congestion window will be increased more quickly
Transport Layer ndash TCP
40
BB
TCP Congestion ControlTCP Congestion ControlHow TCP perceives that there is congestion on the path
ldquoLoss Eventrdquo ndash when there is excessive congestion router buffers along the path overflows causing datagrams to be dropped which in turn results in a ldquoloss eventrdquo at the sender
1 Timeout
bull no ACK is received after segment loss
2 Receipt of three duplicate ACKs
bull segment loss is followed by three ACKs received at the sender
Transport Layer ndash TCP
41
BB
TCP Congestion Control details
sender limits transmission LastByteSent-LastByteAcked
cwnd
roughly
cwnd is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or 3 duplicate acks
TCP sender reduces rate (cwnd) after loss event
Three mechanisms1 AIMD
2 slow start
3 conservative after timeout events
rate = cwnd
RTT Bytessec
Transport Layer ndash TCP
42
BB
TCP congestion avoidance congestion avoidance additive increase multiplicative decreaseadditive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1 MSS every RTT until
loss is detected multiplicative decrease cut cwnd in half after loss
timecwnd
co
nge
stio
n w
indo
w s
ize
saw toothbehavior probingfor bandwidth
Transport Layer ndash TCP
43
BB
TCP Slow Start Slow Start when connection begins
increase rate exponentially until first loss event initially cwndcwnd = 1 MSS double cwndcwnd every RTT done by incrementing cwndcwnd by 1 MSS by 1 MSS for every ACK received
summary initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transport Layer ndash TCP
44
BB
Refinement inferring lossRefinement inferring loss
after 3 dup ACKsafter 3 dup ACKs cwndcwnd is cut in halfhalf window then grows
linearlylinearly butbut after timeout after timeout
eventevent cwndcwnd is is set to 1 MSS1 MSS window then grows
exponentiallyexponentially Up to a thresholdUp to a threshold
then grows linearlylinearly
3 dup ACKs 3 dup ACKs indicates network capable of delivering some segments
timeouttimeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer ndash TCP
45
BB
Refinement
Q when should the exponential increase switch to linear
A when cwndcwnd gets to 1212 of its value before timeout
Implementation variable ssthresh (slow-start threshold)ssthresh (slow-start threshold) on loss eventon loss event ssthreshssthresh is set to 1212 of cwndcwnd just
before loss event
Transport Layer ndash TCP
46
BB
TCP Sender Congestion ControlSTATE EVENT TCP SENDER Congestion-
Control ActionCommentary
SLOW START (SS)
ACK receipt for previously unACKed data
CongWin = CongWin + MSSIf(CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo
Resulting in a doubling of CongWin every RTT
Congestion Avoidance (CA)
ACK receipt for previously unACKed data
CongWin = CongWin + MSS (MSSCongWin)
Additive increase resulting in increasing of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate ACK
Threshold = CongWin 2CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
SS or CA Timeout Threshold = CongWin 2CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter Slow Start
SS or CA Duplicate ACK Increment duplicate ACK count for segment being ACKed
CongWin and Threshold not changed
Transport Layer ndash TCP
47
BB
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++
duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3 MSSretransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++
duplicate ACK
cwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
Transport Layer ndash TCP
48
BB
Congestion control
TCPrsquos Congestion Control Service
CLIENTSERVER
Problem Gridlock sets-in when there is packet loss due to router congestion
forces the End Systems to decrease the rate at which packets are sentdecrease the rate at which packets are sent during periods of congestion
The sending systemrsquos packet is lost due to congestion and is alerted when it stops receiving ACKs of packets sent
Transport Layer ndash TCP
49
BBTransport Layer 3-49
Macroscopic Description of TCP throughput
whatrsquos the average throughout of TCP as a function of window size and RTT ignore slow start (typically very short phases)
let W be the window size when loss occurs when window is W throughput is WRTT just after loss window drops to W2 throughput to
W2RTT Throughput increases linearly (by MSSRTT every
RTT) Average Throughput 75 WRTT
(Based on Idealised model for the steady-state dynamics of TCP)
Transport Layer 3-50
TCP FuturesTCP Futures TCP over ldquolong fat pipesrdquo Example GRID computing application 1500-byte segments 100ms RTT desired
throughput of 10 Gbps requires window size W = 83333 in-flight
segments Throughput in terms of loss rate
L = 210-10 ndash a very small loss rate (1 loss event every 5 billion segments)
new versions of TCP is needed for high-speed environments
LRTT
MSS221
Transport Layer ndash TCP
51
BB
TCP FairnessTCP Fairness
Fairness goal if NN TCP sessions share same bottleneck link each should get an average transmission rate of RNRN an equal share of the linkrsquos bandwidth
TCP connection 1
bottleneckrouter
capacity RR
TCP connection 2
Go to Summary of TCP Congestion Control
Transport Layer ndash TCP
52
BB
Analysis of 2 connections sharing a linkAnalysis of 2 connections sharing a link
Link with transmission rate of RR
Each connection have the same MSSMSS RTTRTT
No other TCP connections or UDP datagrams traverse the shared link
Ignore slow start phase of TCP
Operating in congestion-avoidance modecongestion-avoidance mode (linear increase phase)
Goal adjust sending rate of the two connections to allow for equal equal bandwidth sharingbandwidth sharing
AssumptionsAssumptions
Transport Layer ndash TCP
53
BB
Why is TCP fairWhy is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput
proportionally
RR
equal bandwidth shareequal bandwidth share
Connection 1 throughput
Connect
ion 2
thro
ughput
congestion avoidance additive increaseloss decrease window by factor of 2
RR
A point on the graph depicts the amount of link bandwidth
jointly jointly consumedconsumed by
the connections
Full bandwidth utilisation line
We can viewWe can view a simulation a simulation
on thison this
View SimulationView Simulation
Transport Layer ndash TCP
54
BB
The End
The next succeeding slides are just for additional reading
TCP Latency Modeling
Loopholes in TCP:
- In practice, client/server applications with a smaller RTT grab the available bandwidth more quickly as it becomes free, and therefore achieve higher throughputs.
- Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (e.g. a multithreading implementation).
[Figure: multiple end systems sharing a link of R bps (the link's transmission rate); three applications each use 1 TCP connection, while a multithreaded application uses 3 TCP connections.]
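The second loophole can be quantified with the fairness goal stated earlier. The sketch assumes ideal per-connection fairness: every connection crossing the link gets the same share, and all RTTs are equal. `app_share` is a hypothetical helper.

```python
def app_share(app_connections, other_connections, link_rate):
    """Bandwidth an application gets if every connection gets an equal share."""
    total = app_connections + other_connections
    return link_rate * app_connections / total


R = 1.0  # normalised link rate
print(app_share(1, 3, R))  # 0.25: one connection among four
print(app_share(3, 3, R))  # 0.5: three parallel connections among six
```

In the figure's setting, three other applications each hold one connection. Opening three parallel connections lifts the multithreaded application from a quarter of R to half of R.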
TCP latency modeling
Q: How long does it take to receive an object from a Web server?
Latency: the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. It is made up of the TCP connection establishment time, the data transfer delay and the actual data transmission time.
Two cases to consider:
1. $WS/R > RTT + S/R$: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent, so there is no data transfer delay.
2. $WS/R < RTT + S/R$: the sender has to wait for an ACK after a window's worth of data is sent, so there is data transfer delay.
TCP Latency Modeling
Assumptions:
- The network is uncongested, with one link of rate R between the end systems
- CongWin (fixed) determines the amount of data that can be sent
- No packet loss, no packet corruption, no retransmissions required
- Header overheads are negligible
- The file to send is an integer number of segments of size MSS
- Connection-establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times
- The initial Threshold of the TCP congestion mechanism is very big
Notation:
- R: the link's transmission rate, in bps
- O: size of the object, in bits
- S: number of bits in the MSS (maximum segment size)
- W: window size, in segments
[Figure: a CLIENT requests a FILE from a SERVER over a single link.]
TCP latency modeling
Case analysis: STATIC congestion window
Case 1: $WS/R > RTT + S/R$
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.
Case 1 latency = $2\,RTT + O/R$
K = number of windows of data that cover the object:
$K = \lceil O/(WS) \rceil$ (O/S is the number of segments; K is rounded up to the nearest integer)
e.g. O = 256 bits, S = 32 bits, W = 4 segments
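Here is the slide's example worked through the definition of K. The link rate R is not given, so the latency is left symbolic:

$$\frac{O}{S} = \frac{256}{32} = 8 \text{ segments}, \qquad K = \left\lceil \frac{O}{WS} \right\rceil = \left\lceil \frac{256}{4 \cdot 32} \right\rceil = 2 \text{ windows}, \qquad \text{latency} = 2\,RTT + \frac{256}{R}$$

The last expression holds only when the Case 1 condition $WS/R = 128/R > RTT + 32/R$ is satisfied.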
TCP latency modeling
Case analysis: STATIC congestion window
Case 2: $WS/R < RTT + S/R$
The sender has to wait for an ACK after a window's worth of data is sent (stalled period).
If there are K windows, the sender will be stalled K - 1 times, and each stall lasts $S/R + RTT - WS/R$.
Case 2 latency = $2\,RTT + O/R + (K-1)\left[S/R + RTT - WS/R\right]$
where $K = \lceil O/(WS) \rceil$ is the number of windows of data that cover the object.
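Both static-window cases fit in one function. The Case 2 stall term $S/R + RTT - WS/R$ is zero or negative exactly when Case 1 applies. This is a sketch assuming the units above (bits, bits per second, seconds). `static_window_latency` is a hypothetical name, and the 320 bps link rate in the usage lines is an assumed value chosen only for illustration.

```python
import math


def static_window_latency(O, S, W, R, RTT):
    """Latency of fetching an O-bit object with a fixed window of W segments.

    O: object size (bits); S: MSS (bits); W: window (segments);
    R: link rate (bits/s); RTT: round-trip time (s).
    """
    K = math.ceil(O / (W * S))          # windows needed to cover the object
    base = 2 * RTT + O / R              # two RTTs plus transmitting the object
    stall = S / R + RTT - W * S / R     # idle time after each window (Case 2)
    if stall <= 0:                      # Case 1: WS/R >= RTT + S/R, no stalls
        return base
    return base + (K - 1) * stall       # Case 2: the sender stalls K - 1 times


# Slide example O = 256 bits, S = 32 bits, W = 4, with an assumed R = 320 bps.
print(static_window_latency(256, 32, 4, R=320, RTT=0.1))  # Case 1
print(static_window_latency(256, 32, 4, R=320, RTT=1.0))  # Case 2
```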
Case analysis: DYNAMIC congestion window
[Figure: with slow start and O/S = 15 segments, the object is sent in 4 windows (of 1, 2, 4 and 8 segments), with stalled periods between windows.]
Case analysis: DYNAMIC congestion window
- Let K be the number of windows that cover the object.
- We can express K in terms of the number of segments in the object as follows (window k carries $2^{k-1}$ segments, and $2^0 + 2^1 + \cdots + 2^{k-1} = 2^k - 1$):
$$K = \min\{k : 2^0 S + 2^1 S + \cdots + 2^{k-1} S \ge O\} = \min\{k : 2^k - 1 \ge O/S\} = \min\{k : k \ge \log_2(O/S + 1)\} = \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil$$
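The closed form for K can be cross-checked against a direct count of slow-start windows. This is a sketch assuming the initial window is 1 MSS and that the window doubles from one window to the next. Both function names are hypothetical.

```python
import math


def windows_by_counting(O, S):
    """Count slow-start windows (1, 2, 4, ... segments) until O bits are covered."""
    segments = math.ceil(O / S)
    k, sent = 0, 0
    while sent < segments:
        sent += 2 ** k      # window k + 1 carries 2**k segments
        k += 1
    return k


def windows_closed_form(O, S):
    """K = ceil(log2(O/S + 1))."""
    return math.ceil(math.log2(O / S + 1))


print(windows_by_counting(15, 1), windows_closed_form(15, 1))  # 4 4 (O/S = 15)
```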
Case analysis: DYNAMIC congestion window
- From the time the server begins to transmit the k-th window until the time the server receives an ACK for the first segment in the window: $S/R + RTT$
- Transmission of the k-th window = $2^{k-1} S/R$
- Stall time after the k-th window = $\left[S/R + RTT - 2^{k-1} S/R\right]^+$, where $[x]^+ = \max(x, 0)$
- Latency = $2\,RTT + \dfrac{O}{R} + \displaystyle\sum_{k=1}^{K-1} \left[\dfrac{S}{R} + RTT - 2^{k-1}\dfrac{S}{R}\right]^+$
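The latency sum translates directly into code. This sketch uses toy units and assumes the slow-start window sizes above (1, 2, 4, ... segments) together with the same no-loss assumptions. `dynamic_latency_by_sum` is a hypothetical name.

```python
import math


def dynamic_latency_by_sum(O, S, R, RTT):
    """2*RTT + O/R + sum of the positive stall times after windows 1..K-1."""
    K = math.ceil(math.log2(O / S + 1))                # windows covering the object
    stall_total = sum(
        max(S / R + RTT - 2 ** (k - 1) * S / R, 0.0)   # [x]^+ = max(x, 0)
        for k in range(1, K)                           # k = 1 .. K-1
    )
    return 2 * RTT + O / R + stall_total


# Toy units: O = 15 bits, S = 1 bit, R = 1 bit/s, RTT = 2 s (so O/S = 15, K = 4).
print(dynamic_latency_by_sum(15, 1, 1, 2))  # 22.0
```

Only the first two windows stall in this example, for 2 s and 1 s. From the third window on, transmitting the window takes at least as long as $S/R + RTT$, so ACKs arrive before the sender runs out of window.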
Case analysis: DYNAMIC congestion window
- Let Q be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$
- The actual number of times that the server stalls is $P = \min\{Q, K-1\}$.
- Closed-form expression for the latency (summing the P positive stall terms and using $\sum_{k=1}^{P} 2^{k-1} = 2^P - 1$):
$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$
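The closed form needs only K, Q and P. The sketch below evaluates it in toy units (O = 15 bits, S = 1 bit, R = 1 bit/s, RTT = 2 s). With these values the server stalls after the first window (2 s) and the second window (1 s). Adding those 3 s to $2\,RTT + O/R = 19$ s gives 22 s, which the closed form should reproduce. `dynamic_latency_closed_form` is a hypothetical name.

```python
import math


def dynamic_latency_closed_form(O, S, R, RTT):
    K = math.ceil(math.log2(O / S + 1))               # windows covering the object
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1  # stalls for an infinite object
    P = min(Q, K - 1)                                 # stalls that actually occur
    return 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R


print(dynamic_latency_closed_form(15, 1, 1, 2))  # 22.0 (K = 4, Q = 2, P = 2)
```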
Comparing the latency with the minimum latency $2\,RTT + O/R$ (no stalls at all):
$$\frac{\text{Latency}}{\text{Minimum latency}} \le 1 + \frac{P}{\dfrac{O/R}{RTT} + 2}$$
Slow start will not significantly increase latency if $RTT \ll O/R$.
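The bound follows from the closed-form latency $2\,RTT + O/R + P\,[RTT + S/R] - (2^P - 1)\,S/R$ in one step. Because P is a non-negative integer, $2^P - 1 \ge P$:

$$\text{Latency} - \left(2\,RTT + \frac{O}{R}\right) = P\,RTT + P\,\frac{S}{R} - (2^P - 1)\frac{S}{R} \le P\,RTT$$

$$\Rightarrow\quad \frac{\text{Latency}}{2\,RTT + O/R} \le 1 + \frac{P\,RTT}{2\,RTT + O/R} = 1 + \frac{P}{\dfrac{O/R}{RTT} + 2}$$

When $RTT \ll O/R$, the denominator is large compared with P, so the ratio stays close to 1.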
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
TCP Flow Control
Flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.
- The receiver explicitly informs the sender of the (dynamically changing) amount of free buffer space, using the RcvWindow field in the TCP segment.
- The sender keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow.
Receiver buffering:
- RcvBuffer = size of the TCP receive buffer
- RcvWindow = amount of spare room in the buffer
FLOW CONTROL: Receiver
EXAMPLE: HOST A sends a large file to HOST B.
RECEIVER HOST B uses RcvWindow, LastByteRcvd and LastByteRead:
RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
Initially RcvWindow = RcvBuffer. As the application process reads from the buffer, that space is freed again. HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.
[Figure: HOST B's RcvBuffer, filled with data from IP up to LastByteRcvd and drained by the application process from LastByteRead.]
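The receiver's bookkeeping can be sketched as a small class. This is a hypothetical model, not part of a real TCP stack. Byte counters start at 0, and a segment larger than the advertised window is rejected, since a well-behaved sender never sends one.

```python
class ReceiveBuffer:
    """Hypothetical model of HOST B's flow-control bookkeeping (all values in bytes)."""

    def __init__(self, rcv_buffer):
        self.rcv_buffer = rcv_buffer
        self.last_byte_rcvd = 0   # last byte placed in the buffer from IP
        self.last_byte_read = 0   # last byte read by the application process

    @property
    def rcv_window(self):
        # RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
        return self.rcv_buffer - (self.last_byte_rcvd - self.last_byte_read)

    def receive(self, nbytes):
        if nbytes > self.rcv_window:
            raise ValueError("segment exceeds the advertised RcvWindow")
        self.last_byte_rcvd += nbytes

    def app_read(self, nbytes):
        nbytes = min(nbytes, self.last_byte_rcvd - self.last_byte_read)
        self.last_byte_read += nbytes
        return nbytes


buf = ReceiveBuffer(rcv_buffer=100)
print(buf.rcv_window)   # 100: initially RcvWindow = RcvBuffer
buf.receive(60)
print(buf.rcv_window)   # 40
buf.app_read(40)
print(buf.rcv_window)   # 80
```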
FLOW CONTROL: Sender
EXAMPLE: HOST A sends a large file to HOST B.
SENDER HOST A uses HOST B's RcvWindow, LastByteSent and LastByteACKed.
To ensure that it does not overflow HOST B's buffer, HOST A maintains throughout the connection's life that
[LastByteSent - LastByteACKed] <= RcvWindow
[Figure: HOST A's send buffer, with data sent up to LastByteSent and ACKs from HOST B advancing LastByteACKed.]
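On the sender side, the invariant determines how many more bytes HOST A may put on the wire. This is a sketch with hypothetical helper names; all values are assumed to be byte counts.

```python
def usable_window(last_byte_sent, last_byte_acked, rcv_window):
    """Bytes HOST A can still send without violating the flow-control invariant."""
    unacked = last_byte_sent - last_byte_acked
    return max(rcv_window - unacked, 0)


def can_send(last_byte_sent, last_byte_acked, rcv_window, nbytes):
    """True if sending nbytes more keeps LastByteSent - LastByteACKed <= RcvWindow."""
    return nbytes <= usable_window(last_byte_sent, last_byte_acked, rcv_window)


print(usable_window(1000, 600, 500))   # 100: 400 bytes unACKed, window of 500
print(can_send(1000, 600, 500, 101))   # False: would exceed RcvWindow
```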
FLOW CONTROL: some issues to consider
RcvWindow is used by the connection to provide the flow control service.
What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?
TCP sends a segment only when there is data or an ACK to send. Once HOST B has advertised RcvWindow = 0, it may have no reason to send HOST A a later update, so the sender must keep the connection 'alive'. TCP therefore requires HOST A to continue to send segments with one data byte while HOST B's receive window is 0. These segments are ACKed by HOST B. Eventually the buffer will have some space, and the ACKs will contain RcvWindow > 0.
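The probing rule can be sketched as a loop over the ACKs that HOST B returns. This is a simplified, hypothetical model. It ignores the timer that real TCP implementations use to pace these one-byte probes and only counts how many probes are sent before the window reopens.

```python
def zero_window_probing(probe_ack_windows):
    """HOST A's behaviour after HOST B has advertised RcvWindow = 0.

    probe_ack_windows: the RcvWindow value carried by HOST B's ACK of each
    successive one-byte probe (B's application drains the buffer meanwhile).
    Returns (number of probes sent, first non-zero window seen, or 0).
    """
    probes = 0
    for window in probe_ack_windows:
        probes += 1            # A sends a segment carrying one data byte
        if window > 0:         # the ACK for that probe reports free space
            return probes, window
    return probes, 0           # the window is still closed


print(zero_window_probing([0, 0, 512]))   # (3, 512)
```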
TCP Connection Management
Recall: the TCP sender and receiver establish a 'connection' before exchanging data segments, to initialize TCP variables: sequence numbers, buffers and flow control info (e.g. RcvWindow).
Client: the connection initiator (connect)
- In Java: Socket clientSocket = new Socket(hostname, portNumber);
- In C (Winsock):
  if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) { printf("connect failed\n"); WSACleanup(); exit(1); }
Server: contacted by the client (accept)
- In Java: ServerSocket.accept(), which returns a Socket
- In C: ns = accept(s, (struct sockaddr *)&remoteaddr, &addrlen);
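The same client and server roles can be exercised on one machine with Python's standard `socket` module, swapped in here for the Java and Winsock calls. This minimal sketch assumes the loopback interface is available; port 0 asks the operating system for any free port.

It also shows that the handshake is executed by TCP itself. `create_connection()` returns before the server calls `accept()`, which simply takes the already-established connection off the listen queue.

```python
import socket

# Server: bind and listen on the loopback interface; port 0 = any free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
host, port = server.getsockname()

# Client: connect() returns once the kernel has completed SYN / SYNACK / ACK.
client = socket.create_connection((host, port))

# The connection is already established and queued, so accept() just returns it.
conn, remote_addr = server.accept()

client.sendall(b"hello")
received = b""
while len(received) < 5:           # TCP is a byte stream: read until 5 bytes arrive
    chunk = conn.recv(1024)
    if not chunk:
        break
    received += chunk
print(received)

for s in (client, conn, server):
    s.close()
```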
TCP Connection Management
Three-way handshake:
Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself). It specifies the initial sequence number (client_isn).
Step 2: the server end system receives the SYN and replies with a SYNACK control segment. It ACKs the received SYN, allocates buffers and specifies the server's initial sequence number (server_isn).
Step 3: the client ACKs the connection with ACK = server_isn + 1, allocates buffers and sends SYN = 0.
Connection established.
Establishing a connection (this is what happens when we create a socket for connection to a server):
- Client -> Server: Connect (SYN=1, seq=client_isn)
- Server -> Client: Accept (SYN=1, seq=server_isn, ack=client_isn+1)
- Client -> Server: ACK (SYN=0, seq=client_isn+1, ack=server_isn+1)
After the connection is established, the client can receive segments carrying application-generated data (SYN=0).
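The sequence and acknowledgment numbers of the three segments can be generated mechanically. This sketch uses hypothetical names. It assumes the ISNs are given, whereas real stacks choose them unpredictably, and it applies the 32-bit wrap-around of TCP sequence numbers.

```python
from typing import NamedTuple, Optional

SEQ_SPACE = 2 ** 32   # TCP sequence numbers are 32-bit and wrap around


class Segment(NamedTuple):
    sender: str
    syn: int
    seq: int
    ack: Optional[int]   # None: the ACK flag is not set in this segment


def three_way_handshake(client_isn, server_isn):
    syn = Segment("client", syn=1, seq=client_isn, ack=None)
    synack = Segment("server", syn=1, seq=server_isn,
                     ack=(syn.seq + 1) % SEQ_SPACE)        # ACKs the client's SYN
    ack = Segment("client", syn=0, seq=(client_isn + 1) % SEQ_SPACE,
                  ack=(synack.seq + 1) % SEQ_SPACE)        # ACKs the server's SYN
    return [syn, synack, ack]


for segment in three_way_handshake(client_isn=100, server_isn=300):
    print(segment)
```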
TCP Connection Management (cont.)
Closing a connection: the client closes its socket, e.g. closesocket(s); in C (Winsock) or clientSocket.close(); in Java.
Step 1: the client end system sends a TCP FIN control segment to the server.
Step 2: the server receives the FIN and replies with an ACK. It then closes the connection and sends a FIN.
[Figure: how a TCP connection is established and torn down. The client sends FIN (close); the server replies with ACK and then sends its own FIN (close); the client replies with ACK and goes through a timed wait before the connection is closed.]
TCP Connection Management (cont.)
Step 3: the client receives the FIN and replies with an ACK. It enters a timed wait, during which it will respond with an ACK to any received FINs.
Step 4: the server receives the ACK. Connection closed.
Note: with a small modification, this can handle simultaneous FINs.
[Figure: FIN and ACK exchange between client and server; both sides pass through closing, and the client waits in timed wait before reaching closed.]
TCP Connection Management (cont.)
[Figure: TCP client lifecycle and TCP server lifecycle state diagrams.]
- The timed wait is used in case the final ACK gets lost. Its length is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).
- When it ends, the connection formally closes: all resources (e.g. port numbers) are released.
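The client and server lifecycles can be encoded as transition tables. The sketch below uses the standard TCP state names from RFC 793 (CLOSED, SYN_SENT, ESTABLISHED, FIN_WAIT_1, and so on); the event labels are hypothetical. It models only the normal open/close sequence described above. Simultaneous opens and closes, resets and error transitions are left out.

```python
# Transition tables: (state, event) -> (next state, segment sent or None)
CLIENT_LIFECYCLE = {
    ("CLOSED", "app: connect"): ("SYN_SENT", "send SYN"),
    ("SYN_SENT", "recv SYNACK"): ("ESTABLISHED", "send ACK"),
    ("ESTABLISHED", "app: close"): ("FIN_WAIT_1", "send FIN"),
    ("FIN_WAIT_1", "recv ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_2", "recv FIN"): ("TIME_WAIT", "send ACK"),
    ("TIME_WAIT", "timed wait expires"): ("CLOSED", None),
}

SERVER_LIFECYCLE = {
    ("CLOSED", "app: listen"): ("LISTEN", None),
    ("LISTEN", "recv SYN"): ("SYN_RCVD", "send SYNACK"),
    ("SYN_RCVD", "recv ACK"): ("ESTABLISHED", None),
    ("ESTABLISHED", "recv FIN"): ("CLOSE_WAIT", "send ACK"),
    ("CLOSE_WAIT", "app: close"): ("LAST_ACK", "send FIN"),
    ("LAST_ACK", "recv ACK"): ("CLOSED", None),
}


def run(table, events, state="CLOSED"):
    """Apply events in order; a KeyError means the event is not valid in that state."""
    sent = []
    for event in events:
        state, segment = table[(state, event)]
        if segment:
            sent.append(segment)
    return state, sent


print(run(CLIENT_LIFECYCLE, ["app: connect", "recv SYNACK", "app: close",
                             "recv ACK", "recv FIN", "timed wait expires"]))
```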
End of Flow Control and Error Control
Flow Control vs Congestion Control
Similar actions are taken, but for very different reasons.
Flow control:
- point-to-point traffic between sender and receiver
- speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
- prevents the receiver buffer from overflowing
Congestion: happens when too many sources attempt to send data at too high a rate for the routers along the path.
Congestion control:
- a service that makes sure that the routers between end systems are able to carry the offered traffic
- prevents routers from overflowing
Same course of action: throttling of the sender.
Principles of Congestion Control
Congestion, informally: 'too many sources sending too much data too fast for the network to handle'
- different from flow control
- manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queuing in router buffers)
- a top-10 problem
Approaches towards congestion control
Two broad approaches towards congestion control:
1. End-to-end congestion control
- no explicit feedback from the network
- congestion inferred by end systems from observed packet loss and delay
- approach taken by TCP
2. Network-assisted congestion control
- routers provide feedback to end systems, either as
  - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR), or
  - an explicit transmission rate the sender should send at
TCP Congestion Control
How the TCP sender limits the rate at which it sends traffic into its connection, using a new variable, the congestion window (CongWin):
(Amount of unACKed data at the SENDER) = LastByteSent - LastByteACKed <= min(CongWin, RcvWindow)
This indirectly limits the sender's send rate.
Assumptions:
- The TCP receive buffer is very large, so there is no RcvWindow constraint: the amount of unACKed data at the sender is limited solely by CongWin.
- Packet loss delay and packet transmission delay are negligible.
Under these assumptions, sending rate ≈ CongWin/RTT.
By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
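Both limits combine into one check, and the rate estimate follows from it. This is a sketch with hypothetical helper names, assuming byte counts and an RTT in seconds.

```python
def sendable_bytes(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still transmit: min(CongWin, RcvWindow) minus unACKed data."""
    unacked = last_byte_sent - last_byte_acked
    return max(min(cong_win, rcv_window) - unacked, 0)


def approx_send_rate(cong_win, rtt):
    """Sending rate ~ CongWin / RTT (bytes per second) when RcvWindow is not the limit."""
    return cong_win / rtt


# CongWin = 16 KB, a large RcvWindow, 10 000 bytes currently unACKed, RTT = 100 ms.
print(sendable_bytes(20_000, 10_000, cong_win=16_384, rcv_window=65_535))  # 6384
print(approx_send_rate(16_384, rtt=0.1))  # about 163 840 bytes/s
```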
TCP Congestion Control
TCP uses ACKs to trigger ('clock') its increase in congestion window size; this is called 'self-clocking'.
The arrival of ACKs is an indication to the sender that all is well:
1. If ACKs arrive at a slow rate, the congestion window will be increased at a relatively slow rate.
2. If ACKs arrive at a high rate, the congestion window will be increased more quickly.
TCP Congestion Control
How TCP perceives that there is congestion on the path:
'Loss event': when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a 'loss event' at the sender:
1. Timeout: no ACK is received after segment loss.
2. Receipt of three duplicate ACKs: segment loss is followed by three duplicate ACKs received at the sender.
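Detecting the second kind of loss event only requires counting repeated cumulative ACK numbers. This is a sketch with a hypothetical function name. It assumes the ACK numbers arrive in order and that the fourth identical ACK (the original plus three duplicates) is the trigger.

```python
def detect_triple_dup_ack(ack_numbers):
    """Return the ACK number that signals a loss event, or None.

    A loss is inferred when the same cumulative ACK arrives three more times
    (three duplicates): the segment starting at that byte is presumed lost.
    """
    last_ack, dup_count = None, 0
    for ack in ack_numbers:
        if ack == last_ack:
            dup_count += 1
            if dup_count == 3:
                return ack
        else:
            last_ack, dup_count = ack, 0   # a new ACK resets the count
    return None


print(detect_triple_dup_ack([1000, 2000, 2000, 2000, 2000]))  # 2000
print(detect_triple_dup_ack([1000, 2000, 2000, 3000]))        # None
```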
TCP Congestion Control: details
- The sender limits transmission: LastByteSent - LastByteAcked <= cwnd
- Roughly, rate = cwnd/RTT bytes/sec
- cwnd is a dynamic function of perceived network congestion
How does the sender perceive congestion?
- loss event = timeout or 3 duplicate ACKs
- the TCP sender reduces its rate (cwnd) after a loss event
Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events
TCP congestion avoidance: additive increase, multiplicative decrease (AIMD)
Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
- additive increase: increase cwnd by 1 MSS every RTT until loss is detected
- multiplicative decrease: cut cwnd in half after loss
[Figure: congestion window size (8, 16 and 24 Kbytes marked) against time, showing the saw-tooth behavior of probing for bandwidth.]
TCP Slow Start
When the connection begins, increase the rate exponentially until the first loss event:
- initially cwnd = 1 MSS
- double cwnd every RTT
- done by incrementing cwnd by 1 MSS for every ACK received
Summary: the initial rate is slow but ramps up exponentially fast (the sending rate doubles every RTT).
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart.]
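The per-ACK rule is what produces the doubling. Every segment sent in one RTT returns one ACK, and each ACK adds 1 MSS. This sketch counts cwnd in MSS and assumes no losses, no delayed ACKs and no ssthresh limit. `slow_start` is a hypothetical name.

```python
def slow_start(rtts):
    """cwnd (in MSS) at the start of each RTT when every segment is ACKed."""
    cwnd = 1                      # initially cwnd = 1 MSS
    history = [cwnd]
    for _ in range(rtts):
        acks = cwnd               # one ACK per segment sent during this RTT
        for _ in range(acks):
            cwnd += 1             # +1 MSS for every ACK received
        history.append(cwnd)
    return history


print(slow_start(4))   # [1, 2, 4, 8, 16]
```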
Refinement: inferring loss
- After 3 dup ACKs: cwnd is cut in half; the window then grows linearly.
- But after a timeout event: cwnd is set to 1 MSS; the window then grows exponentially up to a threshold, then grows linearly.
Philosophy:
- 3 dup ACKs indicate that the network is capable of delivering some segments.
- A timeout indicates a 'more alarming' congestion scenario.
Refinement
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.
Implementation:
- variable ssthresh (slow-start threshold)
- on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event
TCP Sender Congestion Control
| State | Event | TCP sender congestion-control action | Commentary |
|---|---|---|---|
| Slow Start (SS) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Results in a doubling of CongWin every RTT |
| Congestion Avoidance (CA) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT |
| SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS |
| SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start |
| SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed |
Summary: TCP Congestion Control (sender FSM)
Initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; start in slow start.
Slow start:
- new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- cwnd > ssthresh: go to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment (stay in slow start)
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Congestion avoidance:
- new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Fast recovery:
- duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
- new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
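The three-state machine can be sketched as a small class that tracks only the window variables. This toy model uses a hypothetical class name and stores cwnd and ssthresh in bytes. Transmitting and retransmitting segments are deliberately left out, so each method updates only the bookkeeping listed for that event.

```python
class RenoSender:
    """Toy model of the TCP sender congestion-control FSM (cwnd and ssthresh in bytes).

    Only the window bookkeeping is modelled; (re)transmitting segments is left out.
    """

    def __init__(self, mss=1460, ssthresh=64 * 1024):
        self.mss = mss
        self.cwnd = mss              # initially cwnd = 1 MSS
        self.ssthresh = ssthresh     # initially ssthresh = 64 KB
        self.dup_ack_count = 0
        self.state = "slow_start"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh                # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss                    # +1 MSS per ACK: doubles per RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += self.mss * self.mss / self.cwnd  # about +1 MSS per RTT
        self.dup_ack_count = 0

    def on_duplicate_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += self.mss                    # inflate for each extra dup ACK
            return
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:                  # triple duplicate ACK
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast_recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.mss
        self.dup_ack_count = 0
        self.state = "slow_start"


sender = RenoSender(mss=1000, ssthresh=4000)
for _ in range(4):
    sender.on_new_ack()
print(sender.state, sender.cwnd)   # congestion_avoidance 5000
```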
Congestion control
TCP's Congestion Control Service
Problem: gridlock sets in when there is packet loss due to router congestion.
TCP's congestion control service forces the end systems to decrease the rate at which packets are sent during periods of congestion.
When a sending system's packet is lost due to congestion, the sender is alerted when it stops receiving ACKs for the packets it sent.
[Figure: CLIENT and SERVER connected through the network.]
Macroscopic Description of TCP throughput
What's the average throughput of TCP as a function of window size and RTT? (Ignore slow start, which typically consists of very short phases.)
- Let W be the window size when loss occurs.
- When the window is W, the throughput is W/RTT.
- Just after loss, the window drops to W/2 and the throughput to W/(2 RTT).
- The throughput then increases linearly (by MSS/RTT every RTT), so it sweeps evenly from W/(2 RTT) up to W/RTT.
- Average throughput: 0.75 W/RTT, the midpoint of that range.
(Based on an idealised model for the steady-state dynamics of TCP.)
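The 0.75 factor can be checked by averaging one idealised sawtooth cycle. The sketch makes these assumptions:
- The window is counted in segments of one MSS, and W is even.
- The window grows by exactly 1 MSS per RTT from W/2 to W before the next loss.

`average_throughput` is a hypothetical name.

```python
def average_throughput(W, rtt, mss=1.0):
    """Mean of w/RTT over one cycle in which w = W/2, W/2 + 1, ..., W (W even)."""
    windows = range(W // 2, W + 1)            # window size during each RTT of a cycle
    rates = [w * mss / rtt for w in windows]  # throughput during that RTT
    return sum(rates) / len(rates)


W, rtt = 100, 0.1
print(average_throughput(W, rtt) / (W / rtt))   # ratio to the peak rate W/RTT
```

The printed ratio is 0.75, up to floating-point rounding, because the window values form an arithmetic sequence whose mean is the midpoint of W/2 and W.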
TCP Futures: TCP over 'long, fat pipes'
- Example: a GRID computing application with 1500-byte segments, a 100 ms RTT and a desired throughput of 10 Gbps
- This requires a window size of W = 83,333 in-flight segments
- Throughput in terms of loss rate L:
$$\text{Throughput} = \frac{1.22 \cdot MSS}{RTT\,\sqrt{L}}$$
- Reaching 10 Gbps therefore needs $L = 2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments)
- New versions of TCP are needed for high-speed environments
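Both numbers on the slide follow from short calculations. The window is throughput × RTT divided by the segment size in bits. The loss rate comes from solving the throughput formula for L. This is a sketch with hypothetical helper names.

```python
def required_window(throughput_bps, rtt_s, mss_bytes):
    """Segments that must be in flight: throughput * RTT / segment size."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Solve Throughput = 1.22 * MSS / (RTT * sqrt(L)) for L (MSS taken in bits)."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_s * throughput_bps)) ** 2


W = required_window(10e9, 0.1, 1500)
L = required_loss_rate(10e9, 0.1, 1500)
print(f"W = {W:,.0f} segments, L = {L:.2e}, one loss every {1 / L:.2e} segments")
```

The unrounded loss rate is about $2.1 \cdot 10^{-10}$, or one loss roughly every 4.7 billion segments. The slide rounds these figures to $2 \cdot 10^{-10}$ and 5 billion.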
Transport Layer ndash TCP
51
BB
TCP FairnessTCP Fairness
Fairness goal if NN TCP sessions share same bottleneck link each should get an average transmission rate of RNRN an equal share of the linkrsquos bandwidth
TCP connection 1
bottleneckrouter
capacity RR
TCP connection 2
Go to Summary of TCP Congestion Control
Transport Layer ndash TCP
52
BB
Analysis of 2 connections sharing a linkAnalysis of 2 connections sharing a link
Link with transmission rate of RR
Each connection have the same MSSMSS RTTRTT
No other TCP connections or UDP datagrams traverse the shared link
Ignore slow start phase of TCP
Operating in congestion-avoidance modecongestion-avoidance mode (linear increase phase)
Goal adjust sending rate of the two connections to allow for equal equal bandwidth sharingbandwidth sharing
AssumptionsAssumptions
Transport Layer ndash TCP
53
BB
Why is TCP fairWhy is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput
proportionally
RR
equal bandwidth shareequal bandwidth share
Connection 1 throughput
Connect
ion 2
thro
ughput
congestion avoidance additive increaseloss decrease window by factor of 2
RR
A point on the graph depicts the amount of link bandwidth
jointly jointly consumedconsumed by
the connections
Full bandwidth utilisation line
We can viewWe can view a simulation a simulation
on thison this
View SimulationView Simulation
Transport Layer ndash TCP
54
BB
The End
The next succeeding slides are just for additional reading
Transport Layer ndash TCP
55
BB
TCP Latency ModelingTCP Latency Modeling
In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs
Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth
Multiple End Systems sharing a link
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
Loop holes in TCPLoop holes in TCP
1 TCP connection
1 TCP connection
1 TCP connection
3 TCP connections
Multithreading implementationMultithreading implementation
Transport Layer ndash TCP
56
BB
TCP latency modelingTCP latency modeling
TCP connection establishment time data transfer delay Actual data transmission time
Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent
QQ How long How long does it take to does it take to
receive an object receive an object from a Web from a Web
serverserver
No data transfer delayNo data transfer delay
Therersquos data transfer delayTherersquos data transfer delay
the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety
Transport Layer ndash TCP
57
BB
TCP Latency ModelingTCP Latency Modeling
Network is uncongested with one link between end systems of rate RR
CongWinCongWin (fixed) determines the amount of data that can be sent
No packet loss no packet corruption no retransmissions required
Header overheads are negligible
File to sendFile to send = integer number of segments of size MSS
Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times
CLIENT SERVER
FILEFILE
Initial ThresholdThreshold of TCP congestion mechanism is very big
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits
SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)
Transport Layer ndash TCP
58
BB
TCP latency ModelingTCP latency Modeling
Case 1 latency = 2RTT + OR
K = Number of Windows of data that cover the object
K = OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 1 Case 1 WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
Number of segmentsRounded up to the nearest integer
Assume W=4 segmentsAssume W=4 segments
eg O=256bits S=32bits W=4
Transport Layer ndash TCP
59
BB
TCP latency ModelingTCP latency Modeling
Case 2 latency = 2RTT + OR + (K-1)[SR + RTT - WSR]
Number of Windows of data that cover the objectK= OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 2 Case 2 WSR lt RTT + SRWSR lt RTT + SR
Sender has to wait for an ACK after a windowrsquos worth of data sent
STALLED STALLED PERIODPERIOD
If there are k windows sender willbe stalled (k-1) times
Transport Layer ndash TCP
60
BB
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
STALLED STALLED PERIODPERIOD
4 windows4 windows
OS=15
Transport Layer ndash TCP
61
BB
bull LetLet K K be the be the number of windows that cover the objectnumber of windows that cover the objectbull We can express We can express KK in terms of the number of segments in the in terms of the number of segments in the
object as followsobject as follows
S
OkK k 110 222min
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1log
1logmin
12min
2
2
S
OK
S
OkkK
S
OkK k
Note
Transport Layer ndash TCP
62
BB
bull From the time the server begins to transmit the From the time the server begins to transmit the kkth window window until the time the server receives an until the time the server receives an ACKACK for the for the first segment in the windowin the window
bull Transmission of Transmission of kkth window window ==
bull Stall Time Stall Time ==
bull Latency Latency ==
12
k
R
S
12
k
R
SRTT
R
S
RTTR
S
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1K
1
122k
k
R
SRTT
R
S
R
ORTT
Transport Layer ndash TCP
63
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
Transport Layer ndash TCP
64
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
R
S
R
SRTTP
R
ORTT P )12(2Latency
bull Closed-form expression for the latency
Transport Layer ndash TCP
65
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
2
1
RTTRO
P
encyMinimumLat
Latency
Slow start will not significantly increase latency if RTT ltlt OR
Transport Layer ndash TCP
66
BB
httpwww1csewustledu~jaincis788-97ftptcp_over_atmindexhtmatm-features
Transport Layer ndash TCP
26
BB
FLOW CONTROL FLOW CONTROL ReceiverReceiver
04060 50100
LastByteRcvd
EXAMPLE HOST A sends a large file to HOST B
LastByteRead
Application Process
Data from IP
RcvBuffer
RcvWindow=RcvBuffer-[LastByteRcvd-LastByteRead]
RECEIVER HOST B ndash uses RcvWindow LastByteRcvd LastByteReadRcvWindow LastByteRcvd LastByteRead
Initially RcvWindow = RcvBuffer Application reads from the bufferHOST BHOST B tells HOST AHOST A how much spare roomhow much spare room it has in the connection buffer by placing its current value of RcvWindowRcvWindow in the receive window field of every segment it sends to HOST AHOST A
Transport Layer ndash TCP
27
BB
FLOW CONTROL SenderSender
04060 50100
LastByteSent
EXAMPLE HOST A sends a large file to HOST B
LastByteACKed
Data
To ensure that HOST B does not overflow HOST A maintains throughout the connectionrsquos life that [LastByteSent-LastByteACKed] lt= RcvWindow
SENDER HOST A
ACKs ACKs from from Host Host BB
SENDER HOST A ndash uses RcvWindow RcvWindow of HostBof HostB LastByteSent LastByteACKed LastByteSent LastByteACKed
Transport Layer ndash TCP
28
BB
FLOW CONTROLFLOW CONTROLSome issue to consider
RcvWindowRcvWindow ndash used by the connection to provide the flow control serviceflow control service
TCPTCP requires that requires that HOST AHOST A continue to sendcontinue to send segments with one segments with one data byte when HOST Brsquos receive data byte when HOST Brsquos receive window is window is 00 Such segments will be Such segments will be ACKedACKed by HOST B by HOST B Eventually the buffer will have Eventually the buffer will have some space and the some space and the ACKsACKs will will contain contain RcvWindow gt 0RcvWindow gt 0
What happens when the receive
buffer of HOST B is full full (that is when (that is when
RcvWindow=0)RcvWindow=0)
TCP sends a segment only when there is datadata or ACKACK to send Therefore the sender must maintain the connection lsquoalivealiversquo
Transport Layer ndash TCP
29
BB
TCP Connection ManagementTCP Connection Management
Recall TCP sender receiver establish ldquoconnectionconnectionrdquo before exchanging data segments
Initialize TCP variables sequence numbers buffers flow control info (eg RcvWindow)
ClientClient is the connection initiator
In Java Socket clientSocket = new Socket(hostnameport number) connect ServerServer is contacted by client
In JavaSocket accept()
if (connectconnect(s (struct sockaddr )ampsin sizeof(sin)) = 0) printf(connect failedn) WSACleanup() exit(1)
ns = acceptaccept(s(struct sockaddr )(ampremoteaddr)ampaddrlen)
Transport Layer ndash TCP
30
BB
TCP Connection Management
Three way handshakeStep 1 clientclient end system sends TCP
SYNSYN control segment to server (executed by TCP itself)
specifies initial seq number (isn)
Step 2 serverserver end system receives SYNSYN replies with SYNACKSYNACK control segment
ACKs received SYN allocates buffers specifies serverrsquos initial seq
number
Step 3 clientclient ACKsACKs the connection with
ACK=server_isn +1ACK=server_isn +1
allocates buffers sends SYN=0SYN=0
Connection established
Client
AcceptAccept (SYN=1
seq=server_isnack=client_isn+1)
time
Server
ConnectConnect (SYN=1 seq=client_isn)
ACK (SYN=0
seq=client_isn+1ack=server_isn+1)
Establishing a connection
This is what happens when we create a socket for
connection to a server
After establishing the connection the client can receive segments with app-generated data (SYN=0)(SYN=0)
Transport Layer ndash TCP
31
BB
TCP Connection Management (cont)
Closing a connection
client closes socket
closesocket(s)closesocket(s)
Java clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
How TCP connection is established and torn down
Transport Layer ndash TCP
32
BB
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters timed wait - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer ndash TCP
33
BB
TCP Connection Management (cont)
TCP client lifecycle
TCP server lifecycle
Used in case ACK gets lost It is implementation-dependent (eg 30
seconds 1 minute 2 minutes
Connection formally closes ndash all resources (eg port numbers) are
released
1
2
3
4
5
6
7
8
9
10
11
12
Transport Layer ndash TCP
34
BB
End of Flow Control and Error Control
Transport Layer ndash TCP
35
BB
Flow Control vs Congestion ControlFlow Control vs Congestion ControlSimilar actions are taken but for very different reasons
Flow ControlFlow Controlbull point-to-point traffic between sender and receiverbull speed matching service matching the rate at which the sender is sending against the rate at which the receiving application is readingbull prevents Receiver Buffer from overflowing
CongestionCongestion ndash happens when there are too many sources attempting to send data at too high a rate for the routers along the path
Congestion ControlCongestion Controlbull service that makes sure that the routers between End Systems are able to carry the offered trafficbull prevents routers from overflowing
Same course of action Throttling of the sender
Transport Layer ndash TCP
36
BB
CongestionCongestion Informally too many sources sending too
much data too fast for network to handle different from flow control Manifestations
lost packets (buffer overflow at routers) long delays (queuing in router buffers)
a top-10 problem
Principles of Congestion ControlPrinciples of Congestion Control
Two broad approaches towards congestion control
1. End-to-end congestion control
- no explicit feedback from the network
- congestion inferred by end systems from observed packet loss and delay
- approach taken by TCP
2. Network-assisted congestion control
- routers provide feedback to end systems, either as
  - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR), or
  - an explicit transmission rate the sender should send at
TCP Congestion Control
How the TCP sender limits the rate at which it sends traffic into its connection, using a new variable: the congestion window (CongWin).
At the sender: LastByteSent - LastByteACKed <= min(CongWin, RcvWindow)
That is, the amount of unACKed data at the sender may not exceed the smaller of CongWin and RcvWindow, which indirectly limits the sender's send rate.
Assumptions:
- the TCP receive buffer is very large, so there is no RcvWindow constraint: the amount of unACKed data at the sender is limited solely by CongWin
- packet loss and packet transmission delays are negligible
Sending rate (approx.) = CongWin / RTT
By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
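The window rule and the rate approximation can be written as two small C helpers. A minimal sketch: the function names are hypothetical, windows are in bytes, and approx_send_rate only holds under the slide's assumptions (negligible loss and transmission delays).

```c
#include <assert.h>

/* Upper bound on unACKed bytes: LastByteSent - LastByteACKed <= min(CongWin, RcvWindow). */
unsigned long effective_window(unsigned long cong_win, unsigned long rcv_window)
{
    return cong_win < rcv_window ? cong_win : rcv_window;
}

/* Approximate sending rate in bytes/s: CongWin / RTT. */
double approx_send_rate(unsigned long cong_win, double rtt_seconds)
{
    assert(rtt_seconds > 0.0);
    return (double)cong_win / rtt_seconds;
}
```

For example, a CongWin of 16,000 bytes with an RTT of 0.5 s gives roughly 32,000 bytes/s; with RcvWindow = 65,535 bytes the receiver is not the limiting factor.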
TCP Congestion Control
TCP uses ACKs to trigger ("clock") its increase in congestion window size: it is "self-clocking".
The arrival of ACKs indicates to the sender that all is well.
1. If ACKs arrive at a slow rate, the congestion window is increased at a relatively slow rate.
2. If ACKs arrive at a high rate, the congestion window is increased more quickly.
TCP Congestion Control
How TCP perceives that there is congestion on the path:
"Loss event": when congestion is excessive, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender. A loss event is either:
1. a timeout: no ACK is received after a segment is lost, or
2. the receipt of three duplicate ACKs: the segment loss is followed by three duplicate ACKs arriving at the sender.
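A sketch of how a sender might tell the two loss events apart. It assumes cumulative ACK numbers that never wrap (real 32-bit sequence numbers do) and uses hypothetical names throughout. The third duplicate, i.e. the fourth ACK carrying the same number, raises the signal once.

```c
#include <assert.h>

enum loss_signal { NO_LOSS, TRIPLE_DUP_ACK, TIMEOUT_LOSS };

/* Per-connection state for spotting duplicate cumulative ACKs. */
struct ack_tracker {
    unsigned long last_ack; /* highest cumulative ACK number seen */
    int dup_count;          /* duplicates of last_ack received so far */
};

/* Process one incoming cumulative ACK number. */
enum loss_signal on_ack(struct ack_tracker *t, unsigned long ack_no)
{
    assert(t != 0);
    if (ack_no > t->last_ack) {          /* ACK for new data */
        t->last_ack = ack_no;
        t->dup_count = 0;
        return NO_LOSS;
    }
    if (ack_no == t->last_ack && ++t->dup_count == 3)
        return TRIPLE_DUP_ACK;           /* original ACK plus 3 duplicates */
    return NO_LOSS;                      /* older ACKs and further dups: no new signal */
}

/* Called when the retransmission timer expires without an ACK. */
enum loss_signal on_timer_expired(struct ack_tracker *t)
{
    assert(t != 0);
    t->dup_count = 0;
    return TIMEOUT_LOSS;
}
```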
TCP Congestion Control details
- the sender limits transmission: LastByteSent - LastByteAcked <= cwnd
- roughly, rate = cwnd / RTT bytes/sec
- cwnd is a dynamic function of perceived network congestion
How does the sender perceive congestion?
- loss event = timeout or 3 duplicate ACKs
- the TCP sender reduces its rate (cwnd) after a loss event
Three mechanisms:
1. AIMD
2. slow start
3. conservative behaviour after timeout events
TCP congestion avoidance: additive increase, multiplicative decrease (AIMD)
- approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs
  - additive increase: increase cwnd by 1 MSS every RTT until loss is detected
  - multiplicative decrease: cut cwnd in half after loss
[Figure: congestion window size (8, 16, 24 Kbytes) vs. time: saw-tooth behaviour, probing for bandwidth]
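The saw tooth can be reproduced with a toy per-RTT simulation in C. This is a simplified sketch under a stated assumption: the path is modelled by a hypothetical fixed capacity (in MSS), and any window above it is treated as a loss detected by triple duplicate ACKs, so cwnd halves rather than resetting to 1 MSS.

```c
#include <assert.h>

/* Record cwnd (in MSS) for each of `rtts` round trips. `capacity` stands in
   for the path: a window larger than it is assumed to cause a loss. */
void aimd_trace(int cwnd, int capacity, int rtts, int out[])
{
    assert(cwnd >= 1 && capacity >= 1 && rtts >= 0);
    for (int i = 0; i < rtts; i++) {
        out[i] = cwnd;
        if (cwnd > capacity)
            cwnd = cwnd / 2 > 1 ? cwnd / 2 : 1;  /* multiplicative decrease */
        else
            cwnd += 1;                           /* additive increase: +1 MSS per RTT */
    }
}
```

Starting at 4 MSS with a capacity of 6 MSS, the recorded windows are 4, 5, 6, 7, 3, 4, 5, 6: a linear climb, a halving, and a climb again.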
TCP Slow Start
- when the connection begins, increase the rate exponentially until the first loss event:
  - initially cwnd = 1 MSS
  - double cwnd every RTT
  - done by incrementing cwnd by 1 MSS for every ACK received
- summary: the initial rate is slow but ramps up exponentially fast (the sending rate doubles every RTT)
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart]
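A small C check that adding 1 MSS per ACK doubles the window each RTT. The sketch assumes no loss, one ACK per segment (no delayed ACKs), and that all ACKs for a round arrive before the next round starts; the function name is hypothetical.

```c
#include <assert.h>

/* cwnd (in MSS) after `rtts` round trips of slow start. */
unsigned long slow_start_cwnd(unsigned long cwnd, int rtts)
{
    assert(cwnd >= 1 && rtts >= 0);
    for (int r = 0; r < rtts; r++) {
        unsigned long acks = cwnd;          /* one ACK per segment sent this RTT */
        for (unsigned long a = 0; a < acks; a++)
            cwnd += 1;                      /* +1 MSS per ACK received */
    }
    return cwnd;
}
```

Starting from 1 MSS, the window is 2, 4 and 8 MSS after one, two and three RTTs, and 1024 MSS after ten.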
Refinement: inferring loss
- after 3 dup ACKs:
  - cwnd is cut in half
  - the window then grows linearly
- but after a timeout event:
  - cwnd is set to 1 MSS
  - the window then grows exponentially up to a threshold, then grows linearly
Philosophy:
- 3 dup ACKs indicate the network is capable of delivering some segments
- a timeout indicates a "more alarming" congestion scenario
Refinement
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before the timeout.
Implementation:
- variable ssthresh (slow-start threshold)
- on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event
TCP Sender Congestion Control (state, event: TCP sender congestion-control action; commentary)
- SLOW START (SS), ACK receipt for previously unACKed data:
  CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance".
  Commentary: results in a doubling of CongWin every RTT.
- Congestion Avoidance (CA), ACK receipt for previously unACKed data:
  CongWin = CongWin + MSS * (MSS / CongWin).
  Commentary: additive increase, increasing CongWin by 1 MSS every RTT.
- SS or CA, loss event detected by triple duplicate ACK:
  Threshold = CongWin / 2; CongWin = Threshold; set state to "Congestion Avoidance".
  Commentary: fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.
- SS or CA, timeout:
  Threshold = CongWin / 2; CongWin = 1 MSS; set state to "Slow Start".
  Commentary: enter slow start.
- SS or CA, duplicate ACK:
  Increment the duplicate ACK count for the segment being ACKed.
  Commentary: CongWin and Threshold are not changed.
Summary: TCP Congestion Control
[Figure: finite-state machine with states slow start, congestion avoidance and fast recovery]
Initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; start in slow start.
Slow start:
- new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
- cwnd > ssthresh: go to congestion avoidance
- duplicate ACK: dupACKcount++
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
Congestion avoidance:
- new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
Fast recovery:
- duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
- new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
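The state machine above can be sketched in C as one event handler. This is an illustrative sketch, not a production implementation: window sizes are bytes held in doubles, segment transmission is reduced to a retransmit flag, and the names (struct tcp_cc, cc_init, cc_on_event, the enum constants) are hypothetical. The initial values and the transitions follow the summary.

```c
#include <assert.h>

enum cc_state { SLOW_START, CONG_AVOID, FAST_RECOVERY };
enum cc_event { NEW_ACK, DUP_ACK, TIMEOUT };

struct tcp_cc {
    enum cc_state state;
    double cwnd;      /* congestion window, bytes */
    double ssthresh;  /* slow-start threshold, bytes */
    double mss;       /* maximum segment size, bytes */
    int dup_acks;     /* dupACKcount */
    int retransmit;   /* 1 if the last event requires resending the missing segment */
};

void cc_init(struct tcp_cc *c, double mss)
{
    assert(c != 0 && mss > 0);
    c->state = SLOW_START;
    c->mss = mss;
    c->cwnd = mss;               /* cwnd = 1 MSS */
    c->ssthresh = 64 * 1024;     /* ssthresh = 64 KB */
    c->dup_acks = 0;
    c->retransmit = 0;
}

void cc_on_event(struct tcp_cc *c, enum cc_event e)
{
    c->retransmit = 0;
    if (e == TIMEOUT) {                      /* same action from every state */
        c->ssthresh = c->cwnd / 2;
        c->cwnd = c->mss;
        c->dup_acks = 0;
        c->retransmit = 1;
        c->state = SLOW_START;
        return;
    }
    if (c->state == FAST_RECOVERY) {
        if (e == DUP_ACK) {
            c->cwnd += c->mss;               /* inflate per extra duplicate ACK */
        } else {                             /* new ACK: deflate, resume CA */
            c->cwnd = c->ssthresh;
            c->dup_acks = 0;
            c->state = CONG_AVOID;
        }
        return;
    }
    if (e == NEW_ACK) {                      /* slow start or congestion avoidance */
        c->dup_acks = 0;
        if (c->state == SLOW_START) {
            c->cwnd += c->mss;               /* doubles cwnd every RTT */
            if (c->cwnd > c->ssthresh)
                c->state = CONG_AVOID;
        } else {
            c->cwnd += c->mss * c->mss / c->cwnd; /* about +1 MSS per RTT */
        }
    } else if (++c->dup_acks == 3) {         /* third duplicate ACK */
        c->ssthresh = c->cwnd / 2;
        c->cwnd = c->ssthresh + 3 * c->mss;
        c->retransmit = 1;
        c->state = FAST_RECOVERY;
    }
}
```

With a 1000-byte MSS, one new ACK in slow start takes cwnd to 2000; three duplicate ACKs then set ssthresh to 1000 and cwnd to 4000 and enter fast recovery; the next new ACK deflates cwnd to 1000 in congestion avoidance.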
Congestion control: TCP's Congestion Control Service
[Figure: CLIENT and SERVER]
Problem: gridlock sets in when there is packet loss due to router congestion.
TCP's congestion control forces the end systems to decrease the rate at which packets are sent during periods of congestion.
When the sending system's packet is lost due to congestion, the sender is alerted by no longer receiving ACKs for the packets it sent.
Macroscopic Description of TCP Throughput
- What is the average throughput of TCP as a function of window size and RTT?
  - ignore slow start (typically very short phases)
- let W be the window size when loss occurs
  - when the window is W, throughput is W/RTT
  - just after loss, the window drops to W/2 and throughput to W/(2 RTT)
  - throughput then increases linearly (by MSS/RTT every RTT)
- average throughput = 0.75 W/RTT
(Based on an idealised model for the steady-state dynamics of TCP.)
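The 0.75 factor is the mean of a window that climbs linearly from W/2 back to W. A minimal C check under the slide's idealised model, assuming an even W measured in MSS and one RTT spent at each window size; the function name is hypothetical.

```c
#include <assert.h>

/* Mean window (in MSS) over one sawtooth cycle: the window grows by 1 MSS
   per RTT from W/2 up to W, then a loss halves it again. */
double sawtooth_mean_window(int w)
{
    assert(w >= 2 && w % 2 == 0);
    long sum = 0;
    int rtts = 0;
    for (int x = w / 2; x <= w; x++) {   /* one RTT spent at each window size */
        sum += x;
        rtts++;
    }
    return (double)sum / rtts;
}
```

For W = 20 the mean window is 15 MSS, i.e. 0.75 W, so the average throughput is 0.75 W/RTT.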
TCP Futures
TCP over "long, fat pipes":
- example: GRID computing application with 1500-byte segments, 100 ms RTT and a desired throughput of 10 Gbps
- requires window size W = 83,333 in-flight segments
- throughput in terms of loss rate L:
$$\text{throughput} = \frac{1.22 \cdot MSS}{RTT\,\sqrt{L}}$$
- 10 Gbps therefore requires L = 2·10^-10, a very small loss rate (1 loss event every 5 billion segments)
- new versions of TCP are needed for high-speed environments
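The two numbers on this slide can be recomputed in C. A sketch with hypothetical function names; it uses the slide's throughput formula and solves it for L by squaring both sides, which avoids needing sqrt.

```c
#include <assert.h>

/* In-flight segments needed for a target rate: W = rate * RTT / (8 * MSS). */
double window_for_rate(double rate_bps, double rtt_s, double mss_bytes)
{
    assert(rate_bps > 0 && rtt_s > 0 && mss_bytes > 0);
    return rate_bps * rtt_s / (8.0 * mss_bytes);
}

/* Loss rate at which 1.22 * MSS / (RTT * sqrt(L)) equals rate_bps:
   L = (1.22 * MSS / (RTT * rate))^2, with MSS converted to bits. */
double loss_rate_for_rate(double rate_bps, double rtt_s, double mss_bytes)
{
    assert(rate_bps > 0 && rtt_s > 0 && mss_bytes > 0);
    double r = 1.22 * 8.0 * mss_bytes / (rtt_s * rate_bps);
    return r * r;
}
```

With 10 Gbps, RTT = 0.1 s and 1500-byte segments, these give about 83,333 segments and L of about 2.1·10^-10.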
TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link of capacity R, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.
[Figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R]
Analysis of 2 connections sharing a link
Assumptions:
- a link with transmission rate R
- each connection has the same MSS and RTT
- no other TCP connections or UDP datagrams traverse the shared link
- ignore the slow-start phase of TCP
- both connections operate in congestion-avoidance mode (linear-increase phase)
Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.
Why is TCP fair?
Two competing sessions:
- additive increase gives a slope of 1 as throughput increases
- multiplicative decrease decreases throughput proportionally
[Figure: connection 2 throughput vs. connection 1 throughput, both axes up to R, showing the equal-bandwidth-share line and the full-bandwidth-utilisation line; the operating point alternates between congestion-avoidance additive increase and loss (window decreased by a factor of 2)]
A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.
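The convergence argument can be checked with a toy C simulation of the two-connection model. Assumptions (hypothetical, in the spirit of the analysis slide): each step is one RTT, and both flows detect loss at the same moment, whenever their combined rate exceeds the link capacity.

```c
#include <assert.h>

/* Two flows with equal MSS and RTT share a link of capacity r (same units
   as *x1 and *x2). Per step: if the total exceeds r both halve, else both add 1. */
void aimd_two_flows(double *x1, double *x2, double r, int steps)
{
    assert(x1 && x2 && r > 0);
    for (int i = 0; i < steps; i++) {
        if (*x1 + *x2 > r) {
            *x1 /= 2;                /* multiplicative decrease */
            *x2 /= 2;
        } else {
            *x1 += 1;                /* additive increase, slope 1 */
            *x2 += 1;
        }
    }
}
```

Additive increase keeps the gap x1 - x2 constant while each halving cuts it in half, so from an unfair split of 80 and 10 on a link of 100 the two rates end up essentially equal after 1000 steps.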
The End
The following slides are for additional reading only.
TCP Latency Modeling: Loopholes in TCP
- In practice, client/server applications with a smaller RTT grab the available bandwidth more quickly as it becomes free, so they get higher throughputs.
- Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (multithreading implementation).
[Figure: multiple end systems sharing a link of transmission rate R bps; three hosts use 1 TCP connection each, one uses 3 TCP connections]
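A worked example under the fair-share model of the fairness slides, assuming (as the figure suggests) that three hosts each open 1 TCP connection and a multithreaded application opens 3, all through the same link of rate R:

```latex
\begin{aligned}
N &= 1 + 1 + 1 + 3 = 6, \qquad \text{share per connection} \approx \frac{R}{N} = \frac{R}{6},\\
\text{single-connection host} &\approx \frac{R}{6}, \qquad \text{multithreaded application} \approx 3 \cdot \frac{R}{6} = \frac{R}{2}.
\end{aligned}
```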
TCP latency modeling
Q: How long does it take to receive an object from a Web server?
Latency: the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. Its components are the TCP connection establishment time, the data transfer delay and the actual data transmission time.
Two cases to consider:
1. WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent (no data transfer delay).
2. WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent (there is data transfer delay).
TCP Latency Modeling
Assumptions:
- the network is uncongested, with one link of rate R between the end systems
- CongWin (fixed) determines the amount of data that can be sent
- no packet loss, no packet corruption, no retransmissions required
- header overheads are negligible
- the file to send is an integer number of segments of size MSS
- connection-establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times
- the initial Threshold of the TCP congestion mechanism is very big
Notation:
- R bps: the link's transmission rate
- O: size of the object, in bits
- S: number of bits in an MSS (maximum segment size)
[Figure: CLIENT requests a FILE from SERVER]
TCP latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.
latency = 2RTT + O/R
K = number of windows of data that cover the object = O/(WS), rounded up to the nearest integer
Example (assume a window of W = 4 segments): O = 256 bits, S = 32 bits, W = 4
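Working the slide's example through the formula for K (W = 4 segments, O = 256 bits, S = 32 bits):

```latex
\frac{O}{S} = \frac{256}{32} = 8 \text{ segments}, \qquad
WS = 4 \cdot 32 = 128 \text{ bits}, \qquad
K = \left\lceil \frac{O}{WS} \right\rceil = \left\lceil \frac{256}{128} \right\rceil = 2
```

So the object is sent in two windows of 4 segments, and in Case 1 the latency is 2RTT + 256/R.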
TCP latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 2: WS/R < RTT + S/R
The sender has to wait for an ACK after a window's worth of data is sent (stalled period).
latency = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]
where K = O/(WS), rounded up, is the number of windows of data that cover the object.
If there are K windows, the sender will be stalled (K-1) times.
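Both static-window cases can be folded into one C function. A sketch under the latency-model assumptions above; the function name is hypothetical, O and S are in bits, R in bits/s and RTT in seconds.

```c
#include <assert.h>

/* Latency to fetch an object of O bits with a static window of W segments
   of S bits over a link of R bit/s. */
double static_window_latency(long O, long S, long W, double R, double rtt)
{
    assert(O > 0 && S > 0 && W > 0 && R > 0 && rtt >= 0);
    long window_bits = W * S;
    long K = (O + window_bits - 1) / window_bits;   /* K = ceil(O / (WS)) */
    double latency = 2 * rtt + (double)O / R;       /* 2 RTT + O/R */
    double stall = (double)S / R + rtt - (double)window_bits / R;
    if (stall > 0)                                  /* Case 2: WS/R < RTT + S/R */
        latency += (K - 1) * stall;                 /* sender stalls K-1 times */
    return latency;                                 /* Case 1 adds no stall time */
}
```

With the example values O = 256, S = 32, W = 4 and, say, R = 32 bit/s (so S/R = 1 s), an RTT of 2 s falls in Case 1 (12 s), while an RTT of 10 s falls in Case 2 with one 7 s stall (35 s).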
Case Analysis: DYNAMIC CONGESTION WINDOW
[Figure: slow-start transfer of an object of O/S = 15 segments in 4 windows, with stalled periods between windows]
Case Analysis: DYNAMIC CONGESTION WINDOW
- Let K be the number of windows that cover the object.
- We can express K in terms of the number of segments in the object as follows:
$$K = \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} = \min\{k : 2^k - 1 \ge O/S\} = \min\{k : k \ge \log_2(O/S + 1)\} = \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil$$
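A C sketch that computes K by summing window sizes instead of calling a floating-point log2; O/S is rounded up to whole segments, and the function name is hypothetical.

```c
#include <assert.h>

/* Smallest k with 2^0 + 2^1 + ... + 2^(k-1) = 2^k - 1 >= O/S. */
int windows_to_cover(long O, long S)
{
    assert(O > 0 && S > 0);
    long segments = (O + S - 1) / S;  /* O/S rounded up */
    long covered = 0;                 /* segments sent in the first k windows */
    int k = 0;
    while (covered < segments) {
        covered += 1L << k;           /* window k+1 carries 2^k segments */
        k++;
    }
    return k;
}
```

For O/S = 15 this gives K = 4, the four windows in the previous figure; for O/S = 16 a fifth window is needed.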
Case Analysis: DYNAMIC CONGESTION WINDOW
- From the time the server begins to transmit the kth window until the time the server receives an ACK for the first segment in the window: $\frac{S}{R} + RTT$
- Transmission of the kth window: $2^{k-1}\frac{S}{R}$
- Stall time: $\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$, where $[x]^+ = \max(x, 0)$
- Latency:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$$
Case Analysis: DYNAMIC CONGESTION WINDOW
- Let Q be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$
- The actual number of times that the server stalls is P = min{Q, K-1}.
Case Analysis: DYNAMIC CONGESTION WINDOW
- Closed-form expression for the latency, where P = min{Q, K-1} is the number of stalls:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$
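The stall sum and the closed form can be compared numerically. A C sketch, assuming K is already known (from the formula for K), with hypothetical function names; Q is computed by counting powers of two instead of calling log2.

```c
#include <assert.h>

/* Latency = 2 RTT + O/R + sum over k = 1..K-1 of [S/R + RTT - 2^(k-1) S/R]^+ */
double dynamic_latency_sum(int K, double O, double S, double R, double rtt)
{
    assert(K >= 1 && O > 0 && S > 0 && R > 0);
    double latency = 2 * rtt + O / R;
    double window_time = S / R;                    /* 2^(k-1) S/R, starting at k = 1 */
    for (int k = 1; k <= K - 1; k++) {
        double stall = S / R + rtt - window_time;
        if (stall > 0)                             /* the [.]^+ term */
            latency += stall;
        window_time *= 2;
    }
    return latency;
}

/* Closed form with Q = floor(log2(1 + RTT/(S/R))) + 1 and P = min(Q, K-1). */
double dynamic_latency_closed(int K, double O, double S, double R, double rtt)
{
    assert(K >= 1 && O > 0 && S > 0 && R > 0);
    double x = 1 + rtt / (S / R);
    int Q = 0;
    for (double p = 1; p <= x; p *= 2)             /* counts 2^0 .. 2^floor(log2 x) */
        Q++;
    int P = Q < K - 1 ? Q : K - 1;
    return 2 * rtt + O / R + P * (rtt + S / R) - ((1 << P) - 1) * (S / R);
}
```

With S/R = 1, O/R = 15, RTT = 3 and K = 4, both functions give 26: stalls of 3, 2 and 0 time units on top of 2RTT + O/R = 21.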
Case Analysis: DYNAMIC CONGESTION WINDOW
$$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{\frac{O/R}{RTT} + 2}$$
where Minimum Latency = 2RTT + O/R is the latency with no stalls.
Slow start will not significantly increase latency if RTT << O/R.
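One way to see where the bound comes from, starting from the closed-form latency: since 2^P - 1 >= P for every integer P >= 0, the term (2^P - 1)S/R is at least P·S/R, so the stall overhead is at most P·RTT.

```latex
\text{Latency} - \left(2\,RTT + \frac{O}{R}\right)
  = P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}
  \le P \cdot RTT
\quad\Longrightarrow\quad
\frac{\text{Latency}}{\text{Minimum Latency}}
  \le 1 + \frac{P \cdot RTT}{2\,RTT + O/R}
  = 1 + \frac{P}{\frac{O/R}{RTT} + 2}
```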
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
FLOW CONTROL: Sender
EXAMPLE: HOST A sends a large file to HOST B.
[Figure: data at SENDER HOST A, marking LastByteACKed and LastByteSent, with ACKs arriving from HOST B]
To ensure that HOST B's receive buffer does not overflow, HOST A maintains throughout the connection's life that
LastByteSent - LastByteACKed <= RcvWindow
SENDER HOST A uses HOST B's RcvWindow together with its own LastByteSent and LastByteACKed.
FLOW CONTROL: Some issues to consider
RcvWindow: used by the connection to provide the flow control service.
What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?
- TCP sends a segment only when there is data or an ACK to send, so the sender must keep the connection "alive".
- TCP therefore requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space, and the ACKs will contain RcvWindow > 0.
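A C sketch of the sender-side rule from these two slides: respect LastByteSent - LastByteACKed <= RcvWindow, and keep sending one-byte segments while RcvWindow is 0. Assumptions: the function name and parameters are hypothetical, sequence numbers are plain 64-bit byte counters with no wraparound, and the one-byte segment is returned on every call, whereas real TCP implementations pace such probes with a persist timer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Payload bytes HOST A may put in its next segment. */
size_t next_payload(uint64_t last_byte_sent, uint64_t last_byte_acked,
                    uint64_t rcv_window, size_t pending, size_t mss)
{
    assert(last_byte_sent >= last_byte_acked);
    if (pending == 0)
        return 0;                                  /* nothing to send */
    if (rcv_window == 0)
        return 1;                                  /* one-byte segment keeps ACKs coming */
    uint64_t in_flight = last_byte_sent - last_byte_acked;   /* unACKed bytes */
    uint64_t room = in_flight >= rcv_window ? 0 : rcv_window - in_flight;
    size_t n = pending < mss ? pending : mss;      /* at most one MSS per segment */
    return room < n ? (size_t)room : n;
}
```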
TCP Connection Management
Recall: TCP sender and receiver establish a "connection" before exchanging data segments, initializing TCP variables:
- sequence numbers
- buffers and flow control info (e.g. RcvWindow)
Client: the connection initiator
- In Java: Socket clientSocket = new Socket(hostname, portNumber);
- In C (Winsock):
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
        printf("connect failed\n");
        WSACleanup();
        exit(1);
    }
Server: contacted by the client
- In Java: ServerSocket.accept(), which returns a new Socket
- In C: ns = accept(s, (struct sockaddr *)&remoteaddr, &addrlen);
TCP Connection Management
Three-way handshake:
Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself)
- specifies the initial sequence number (client_isn)
Step 2: the server end system receives the SYN and replies with a SYNACK control segment
- ACKs the received SYN
- allocates buffers
- specifies the server's initial sequence number (server_isn)
Step 3: the client ACKs the connection with ACK = server_isn + 1
- allocates buffers
- sends SYN = 0
Connection established.
[Figure: establishing a connection, client/server timeline]
- client connect: (SYN=1, seq=client_isn)
- server accept: (SYN=1, seq=server_isn, ack=client_isn+1)
- client: ACK (SYN=0, seq=client_isn+1, ack=server_isn+1)
This is what happens when we create a socket for connection to a server.
After the connection is established, the client can receive segments carrying application-generated data (SYN=0).
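The sequence and acknowledgment numbers in the three handshake segments can be sketched in C. The struct and function names are hypothetical and only the fields shown in the figure are modelled; uint32_t arithmetic wraps modulo 2^32, as TCP sequence numbers do.

```c
#include <assert.h>
#include <stdint.h>

struct segment {
    int syn;          /* SYN flag */
    int ack_flag;     /* ACK flag: 1 if the ack field is valid */
    uint32_t seq;     /* sequence number */
    uint32_t ack;     /* acknowledgment number */
};

void three_way_handshake(uint32_t client_isn, uint32_t server_isn,
                         struct segment out[3])
{
    assert(out != 0);
    /* Step 1: client -> server: SYN=1, seq=client_isn */
    out[0] = (struct segment){1, 0, client_isn, 0};
    /* Step 2: server -> client (SYNACK): SYN=1, seq=server_isn, ack=client_isn+1 */
    out[1] = (struct segment){1, 1, server_isn, (uint32_t)(client_isn + 1u)};
    /* Step 3: client -> server: SYN=0, seq=client_isn+1, ack=server_isn+1 */
    out[2] = (struct segment){0, 1, (uint32_t)(client_isn + 1u),
                              (uint32_t)(server_isn + 1u)};
}
```

With client_isn = 100 and server_isn = 5000, the SYNACK carries ack = 101 and the final ACK carries seq = 101, ack = 5001.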
TCP Connection Management (cont.)
Closing a connection: the client closes its socket:
- in C (Winsock): closesocket(s);
- in Java: clientSocket.close();
Step 1: the client end system sends a TCP FIN control segment to the server.
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
How TCP connection is established and torn down
Transport Layer ndash TCP
32
BB
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters timed wait - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer ndash TCP
33
BB
TCP Connection Management (cont)
TCP client lifecycle
TCP server lifecycle
Used in case ACK gets lost It is implementation-dependent (eg 30
seconds 1 minute 2 minutes
Connection formally closes ndash all resources (eg port numbers) are
released
1
2
3
4
5
6
7
8
9
10
11
12
Transport Layer ndash TCP
34
BB
End of Flow Control and Error Control
Transport Layer ndash TCP
35
BB
Flow Control vs Congestion ControlFlow Control vs Congestion ControlSimilar actions are taken but for very different reasons
Flow ControlFlow Controlbull point-to-point traffic between sender and receiverbull speed matching service matching the rate at which the sender is sending against the rate at which the receiving application is readingbull prevents Receiver Buffer from overflowing
CongestionCongestion ndash happens when there are too many sources attempting to send data at too high a rate for the routers along the path
Congestion ControlCongestion Controlbull service that makes sure that the routers between End Systems are able to carry the offered trafficbull prevents routers from overflowing
Same course of action Throttling of the sender
Transport Layer ndash TCP
36
BB
CongestionCongestion Informally too many sources sending too
much data too fast for network to handle different from flow control Manifestations
lost packets (buffer overflow at routers) long delays (queuing in router buffers)
a top-10 problem
Principles of Congestion ControlPrinciples of Congestion Control
Transport Layer ndash TCP
37
BB
Approaches towards congestion control
End-to-end congestion control
no explicit feedback from network
congestion inferred by end-systems from observed packet loss amp delay
approach taken by TCP
Network-assisted congestion control
routers provide feedback to End Systems in the form of single bit indicating
link congestion (SNA DECbit TCPIP ECN ATM ABR)
explicit transmission rate the sender should send at
1 2
Two broad approaches towards congestion control
Transport Layer ndash TCP
38
BB
TCP Congestion ControlTCP Congestion ControlHow TCP sender limits the rate at which it sends traffic into its connection
SENDER
(Amount of unACKed data)SENDER lt min(CongWin RcvWindow)
LastByteSent - LastByteACKed
Indirectly limits the senderrsquos send rateAssumptions
bull TCP receive buffer is very large ndash no RcvWindow constraint Amt of unACKed data at sender is solely limited by CongWin
bull Packet loss delay amp packet transmission delay are negligible
Sending rate (approx) CongWinRTT
By adjusting CongWin sender can therefore adjust the
rate at which it sends data into its connection
New variable ndash Congestion
Window
Transport Layer ndash TCP
39
BB
TCP Congestion ControlTCP Congestion ControlTCP uses ACKs to trigger (ldquoclockrdquo) its increase in congestion window size ndash ldquoself-clockingrdquo
Arrival of ACKs ndash indication to the sender that all is well
1 Slow Rate
bull Congestion window will be increased at a relatively slow rate
2 High rate
bull Congestion window will be increased more quickly
Transport Layer ndash TCP
40
BB
TCP Congestion ControlTCP Congestion ControlHow TCP perceives that there is congestion on the path
ldquoLoss Eventrdquo ndash when there is excessive congestion router buffers along the path overflows causing datagrams to be dropped which in turn results in a ldquoloss eventrdquo at the sender
1 Timeout
bull no ACK is received after segment loss
2 Receipt of three duplicate ACKs
bull segment loss is followed by three ACKs received at the sender
Transport Layer ndash TCP
41
BB
TCP Congestion Control details
sender limits transmission LastByteSent-LastByteAcked
cwnd
roughly
cwnd is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or 3 duplicate acks
TCP sender reduces rate (cwnd) after loss event
Three mechanisms1 AIMD
2 slow start
3 conservative after timeout events
rate = cwnd
RTT Bytessec
Transport Layer ndash TCP
42
BB
TCP congestion avoidance congestion avoidance additive increase multiplicative decreaseadditive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1 MSS every RTT until
loss is detected multiplicative decrease cut cwnd in half after loss
timecwnd
co
nge
stio
n w
indo
w s
ize
saw toothbehavior probingfor bandwidth
Transport Layer ndash TCP
43
BB
TCP Slow Start Slow Start when connection begins
increase rate exponentially until first loss event initially cwndcwnd = 1 MSS double cwndcwnd every RTT done by incrementing cwndcwnd by 1 MSS by 1 MSS for every ACK received
summary initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transport Layer ndash TCP
44
BB
Refinement inferring lossRefinement inferring loss
after 3 dup ACKsafter 3 dup ACKs cwndcwnd is cut in halfhalf window then grows
linearlylinearly butbut after timeout after timeout
eventevent cwndcwnd is is set to 1 MSS1 MSS window then grows
exponentiallyexponentially Up to a thresholdUp to a threshold
then grows linearlylinearly
3 dup ACKs 3 dup ACKs indicates network capable of delivering some segments
timeouttimeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer ndash TCP
45
BB
Refinement
Q when should the exponential increase switch to linear
A when cwndcwnd gets to 1212 of its value before timeout
Implementation variable ssthresh (slow-start threshold)ssthresh (slow-start threshold) on loss eventon loss event ssthreshssthresh is set to 1212 of cwndcwnd just
before loss event
Transport Layer ndash TCP
46
BB
TCP Sender Congestion ControlSTATE EVENT TCP SENDER Congestion-
Control ActionCommentary
SLOW START (SS)
ACK receipt for previously unACKed data
CongWin = CongWin + MSSIf(CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo
Resulting in a doubling of CongWin every RTT
Congestion Avoidance (CA)
ACK receipt for previously unACKed data
CongWin = CongWin + MSS (MSSCongWin)
Additive increase resulting in increasing of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate ACK
Threshold = CongWin 2CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
SS or CA Timeout Threshold = CongWin 2CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter Slow Start
SS or CA Duplicate ACK Increment duplicate ACK count for segment being ACKed
CongWin and Threshold not changed
Transport Layer ndash TCP
47
BB
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++
duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3 MSSretransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++
duplicate ACK
cwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
Transport Layer ndash TCP
48
BB
Congestion control
TCPrsquos Congestion Control Service
CLIENTSERVER
Problem Gridlock sets-in when there is packet loss due to router congestion
forces the End Systems to decrease the rate at which packets are sentdecrease the rate at which packets are sent during periods of congestion
The sending systemrsquos packet is lost due to congestion and is alerted when it stops receiving ACKs of packets sent
Transport Layer ndash TCP
49
BBTransport Layer 3-49
Macroscopic Description of TCP throughput
whatrsquos the average throughout of TCP as a function of window size and RTT ignore slow start (typically very short phases)
let W be the window size when loss occurs when window is W throughput is WRTT just after loss window drops to W2 throughput to
W2RTT Throughput increases linearly (by MSSRTT every
RTT) Average Throughput 75 WRTT
(Based on Idealised model for the steady-state dynamics of TCP)
Transport Layer 3-50
TCP FuturesTCP Futures TCP over ldquolong fat pipesrdquo Example GRID computing application 1500-byte segments 100ms RTT desired
throughput of 10 Gbps requires window size W = 83333 in-flight
segments Throughput in terms of loss rate
L = 210-10 ndash a very small loss rate (1 loss event every 5 billion segments)
new versions of TCP is needed for high-speed environments
LRTT
MSS221
Transport Layer ndash TCP
51
BB
TCP FairnessTCP Fairness
Fairness goal if NN TCP sessions share same bottleneck link each should get an average transmission rate of RNRN an equal share of the linkrsquos bandwidth
TCP connection 1
bottleneckrouter
capacity RR
TCP connection 2
Go to Summary of TCP Congestion Control
Transport Layer ndash TCP
52
BB
Analysis of 2 connections sharing a linkAnalysis of 2 connections sharing a link
Link with transmission rate of RR
Each connection have the same MSSMSS RTTRTT
No other TCP connections or UDP datagrams traverse the shared link
Ignore slow start phase of TCP
Operating in congestion-avoidance modecongestion-avoidance mode (linear increase phase)
Goal adjust sending rate of the two connections to allow for equal equal bandwidth sharingbandwidth sharing
AssumptionsAssumptions
Transport Layer ndash TCP
53
BB
Why is TCP fairWhy is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput
proportionally
RR
equal bandwidth shareequal bandwidth share
Connection 1 throughput
Connect
ion 2
thro
ughput
congestion avoidance additive increaseloss decrease window by factor of 2
RR
A point on the graph depicts the amount of link bandwidth
jointly jointly consumedconsumed by
the connections
Full bandwidth utilisation line
We can viewWe can view a simulation a simulation
on thison this
View SimulationView Simulation
Transport Layer ndash TCP
54
BB
The End
The next succeeding slides are just for additional reading
Transport Layer ndash TCP
55
BB
TCP Latency ModelingTCP Latency Modeling
In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs
Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth
Multiple End Systems sharing a link
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
Loop holes in TCPLoop holes in TCP
1 TCP connection
1 TCP connection
1 TCP connection
3 TCP connections
Multithreading implementationMultithreading implementation
Transport Layer ndash TCP
56
BB
TCP latency modelingTCP latency modeling
TCP connection establishment time data transfer delay Actual data transmission time
Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent
QQ How long How long does it take to does it take to
receive an object receive an object from a Web from a Web
serverserver
No data transfer delayNo data transfer delay
Therersquos data transfer delayTherersquos data transfer delay
the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety
Transport Layer ndash TCP
57
BB
TCP Latency ModelingTCP Latency Modeling
Network is uncongested with one link between end systems of rate RR
CongWinCongWin (fixed) determines the amount of data that can be sent
No packet loss no packet corruption no retransmissions required
Header overheads are negligible
File to sendFile to send = integer number of segments of size MSS
Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times
CLIENT SERVER
FILEFILE
Initial ThresholdThreshold of TCP congestion mechanism is very big
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits
SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)
Transport Layer ndash TCP
58
BB
TCP latency ModelingTCP latency Modeling
Case 1 latency = 2RTT + OR
K = Number of Windows of data that cover the object
K = OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 1 Case 1 WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
Number of segmentsRounded up to the nearest integer
Assume W=4 segmentsAssume W=4 segments
eg O=256bits S=32bits W=4
TCP Latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 2: WS/R < RTT + S/R
The sender has to wait for an ACK after a window's worth of data is sent (the STALLED PERIOD).
$$\text{latency} = 2\,RTT + \frac{O}{R} + (K-1)\left[\frac{S}{R} + RTT - \frac{WS}{R}\right]$$
where $K = \lceil O/(WS) \rceil$ is the number of windows of data that cover the object. If there are K windows, the sender will be stalled (K-1) times.
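The two static-window cases can be folded into one expression. In Case 1 the Case 2 bracket is negative, so it can simply be clamped at zero. The sketch below uses hypothetical function names and the same units as the slides. It cross-checks the formula against a segment-by-segment sliding-window simulation, assuming the ACK for a segment returns S/R + RTT after that segment starts transmission.

```python
def static_latency(O: int, S: int, W: int, R: float, RTT: float) -> float:
    """2*RTT + O/R + (K-1)*[S/R + RTT - W*S/R]^+ (the bracket is clamped to 0 in Case 1)."""
    K = -(-O // (W * S))                        # ceil(O / (W*S))
    stall = max(0.0, S / R + RTT - W * S / R)   # idle time after each full window
    return 2 * RTT + O / R + (K - 1) * stall


def simulate_static(O: int, S: int, W: int, R: float, RTT: float) -> float:
    """Segment i may start only after segment i-1 is transmitted and the ACK for
    segment i-W has returned (S/R + RTT after segment i-W started)."""
    n = -(-O // S)            # number of segments (the model assumes O is a multiple of S)
    start = []
    for i in range(n):
        t = start[i - 1] + S / R if i > 0 else 0.0
        if i >= W:
            t = max(t, start[i - W] + S / R + RTT)
        start.append(t)
    # Setup and request take 1.5 RTT before the server starts sending; the last bit
    # then needs 0.5 RTT to reach the client: 2 RTT in total on top of sending time.
    return 2 * RTT + start[-1] + S / R


print(static_latency(256, 32, 4, 1000.0, 0.2), simulate_static(256, 32, 4, 1000.0, 0.2))
```

With the hypothetical RTT = 0.2 s, Case 2 applies (WS/R = 0.128 s < RTT + S/R = 0.232 s), and both functions agree.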
Case Analysis: DYNAMIC CONGESTION WINDOW
[Figure: slow-start transfer of an object with O/S = 15 segments sent in 4 windows, with STALLED PERIODs between windows]
Case Analysis: DYNAMIC CONGESTION WINDOW
• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object as follows:
$$K = \min\{k : 2^0 S + 2^1 S + \cdots + 2^{k-1} S \ge O\} = \min\{k : 2^k - 1 \ge O/S\} = \min\{k : k \ge \log_2(O/S + 1)\} = \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil$$
Note: $2^0 + 2^1 + \cdots + 2^{k-1} = 2^k - 1$.
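A short Python check of the closed form for K. It counts slow-start windows of 1, 2, 4, ... segments directly. The function names are illustrative.

```python
import math


def windows_slow_start(O: int, S: int) -> int:
    """Count windows directly: window k carries 2**(k-1) segments."""
    segments = -(-O // S)        # O/S, rounded up
    k, covered = 0, 0
    while covered < segments:
        covered += 2 ** k        # window k+1 holds 2**k segments
        k += 1
    return k


def windows_closed_form(O: int, S: int) -> int:
    """K = ceil(log2(O/S + 1))."""
    return math.ceil(math.log2(O / S + 1))


print(windows_slow_start(15, 1), windows_closed_form(15, 1))  # O/S = 15 -> 4 windows
```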
Case Analysis: DYNAMIC CONGESTION WINDOW
• From the time the server begins to transmit the kth window until the time the server receives an ACK for the first segment in the window: $\frac{S}{R} + RTT$
• Transmission of the kth window: $2^{k-1}\frac{S}{R}$
• Stall time: $\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^{+}$, where $[x]^{+} = \max(x, 0)$
• Latency:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^{+}$$
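A minimal sketch that evaluates the latency sum term by term, clamping each stall at zero. The numbers in the usage line (O/S = 15 segments, S/R = 1 time unit, RTT = 2 time units) are hypothetical.

```python
def dynamic_latency_sum(O: int, S: int, R: float, RTT: float) -> float:
    """Latency = 2*RTT + O/R + sum_{k=1}^{K-1} [S/R + RTT - 2**(k-1) * S/R]^+."""
    segments, K, covered = -(-O // S), 0, 0
    while covered < segments:       # K: number of slow-start windows (1, 2, 4, ... segments)
        covered += 2 ** K
        K += 1
    latency = 2 * RTT + O / R
    for k in range(1, K):           # no stall follows the last (Kth) window
        latency += max(0.0, S / R + RTT - 2 ** (k - 1) * S / R)
    return latency


print(dynamic_latency_sum(15, 1, 1.0, 2.0))
```

With these values the server stalls 2 time units after window 1 and 1 time unit after window 2. From window 3 on, a window takes at least as long to send as one ACK round trip.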
Case Analysis: DYNAMIC CONGESTION WINDOW
• Let Q be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$
• The actual number of times that the server stalls is
$$P = \min\{Q, K-1\}$$
• Closed-form expression for the latency:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$
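The closed form follows from summing the first P stall terms, all of which are non-negative: $\sum_{k=1}^{P}[S/R + RTT - 2^{k-1}S/R] = P(RTT + S/R) - (2^P - 1)S/R$. The sketch below uses illustrative names and sets S = R = 1, so S/R = 1 time unit. It compares the closed form with the per-window sum over a range of object sizes and RTTs.

```python
import math


def slow_start_latency_closed_form(O: int, S: int, R: float, RTT: float) -> float:
    K = math.ceil(math.log2(O / S + 1))                 # windows covering the object
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1    # stalls for an infinite object
    P = min(Q, K - 1)                                   # actual number of stalls
    return 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R


def slow_start_latency_sum(O: int, S: int, R: float, RTT: float) -> float:
    K = math.ceil(math.log2(O / S + 1))
    return 2 * RTT + O / R + sum(
        max(0.0, S / R + RTT - 2 ** (k - 1) * S / R) for k in range(1, K)
    )


# Self-check: closed form vs. per-window sum (S = R = 1, so S/R = 1 time unit).
for segments in range(1, 65):
    for rtt in (0.5, 2.0, 3.0, 7.5, 20.0):
        a = slow_start_latency_closed_form(segments, 1, 1.0, rtt)
        b = slow_start_latency_sum(segments, 1, 1.0, rtt)
        assert abs(a - b) < 1e-9, (segments, rtt, a, b)
```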
Case Analysis: DYNAMIC CONGESTION WINDOW
Compared with the minimum latency $2\,RTT + O/R$ (no stalls):
$$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{\dfrac{O/R}{RTT} + 2}$$
Slow start will not significantly increase latency if RTT ≪ O/R.
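A quick numerical illustration of the bound: as O/R grows relative to RTT, the ratio approaches 1. Sizes are in units of S, with S/R = 1 time unit and RTT = 2 time units; all values are hypothetical.

```python
import math


def slow_start_latency(O, S, R, RTT):
    """Closed-form latency and stall count P under the dynamic-window model."""
    K = math.ceil(math.log2(O / S + 1))
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1
    P = min(Q, K - 1)
    return 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R, P


def ratio_bound(O, R, RTT, P):
    """Upper bound on Latency / (2*RTT + O/R)."""
    return 1 + P / ((O / R) / RTT + 2)


for O in (15, 1_000, 1_000_000):
    latency, P = slow_start_latency(O, 1, 1.0, 2.0)
    ratio = latency / (2 * 2.0 + O / 1.0)     # divide by the minimum latency
    print(O, round(ratio, 4), round(ratio_bound(O, 1.0, 2.0, P), 4))
```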
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
FLOW CONTROL: some issues to consider
RcvWindow: used by the connection to provide the flow control service.
What happens when the receive buffer of Host B is full (that is, when RcvWindow = 0)?
TCP sends a segment only when there is data or an ACK to send. Therefore the sender must keep the connection "alive":
TCP requires that Host A continue to send segments with one data byte when Host B's receive window is 0. Such segments will be ACKed by Host B. Eventually the buffer will have some space, and the ACKs will contain RcvWindow > 0.
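Below is a toy model of the zero-window situation above. It uses the textbook bookkeeping RcvWindow = RcvBuffer - (LastByteRcvd - LastByteRead). The names RcvBuffer, LastByteRcvd and LastByteRead are assumptions here; the slide does not give them. Real TCP paces its probes with a persist timer, which this sketch simplifies to one probe per round.

```python
from dataclasses import dataclass


@dataclass
class Receiver:
    rcv_buffer: int           # RcvBuffer in bytes (hypothetical size)
    last_byte_rcvd: int = 0   # LastByteRcvd
    last_byte_read: int = 0   # LastByteRead

    def rcv_window(self) -> int:
        # RcvWindow = RcvBuffer - (LastByteRcvd - LastByteRead)
        return self.rcv_buffer - (self.last_byte_rcvd - self.last_byte_read)

    def receive(self, nbytes: int) -> int:
        """Buffer at most RcvWindow bytes; return the RcvWindow carried by the ACK."""
        self.last_byte_rcvd += min(nbytes, self.rcv_window())
        return self.rcv_window()

    def app_reads(self, nbytes: int) -> None:
        """The receiving application drains up to nbytes from the buffer."""
        self.last_byte_read += min(nbytes, self.last_byte_rcvd - self.last_byte_read)


def transfer(rx: Receiver, total: int, read_per_round: int) -> int:
    """Host A sends `total` bytes to Host B (read_per_round must be > 0).
    Returns how many 1-byte probe segments were needed."""
    sent = probes = 0
    advertised = rx.rcv_window()
    while sent < total:
        if advertised == 0:
            probes += 1
            n = 1                             # keep the connection alive: one data byte
        else:
            n = min(total - sent, advertised)
        before = rx.last_byte_rcvd
        advertised = rx.receive(n)            # Host B ACKs and advertises RcvWindow
        sent += rx.last_byte_rcvd - before    # count only bytes Host B accepted
        rx.app_reads(read_per_round)          # Host B's application reads slowly
    return probes


rx = Receiver(rcv_buffer=4)
print(transfer(rx, total=10, read_per_round=1), rx.last_byte_rcvd)
```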
TCP Connection Management
Recall: the TCP sender and receiver establish a "connection" before exchanging data segments, to initialize TCP variables: sequence numbers, buffers, and flow control info (e.g. RcvWindow).
Client: the connection initiator.
In Java (connect): Socket clientSocket = new Socket(hostname, portNumber);
In C:
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
        printf("connect failed\n");
        WSACleanup();
        exit(1);
    }
Server: contacted by the client.
In Java: Socket accept() (a method of ServerSocket, returning the connected Socket)
In C:
    ns = accept(s, (struct sockaddr *)&remoteaddr, &addrlen);
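The connect/accept pairing can be tried on the loopback interface. This Python sketch replaces the Java and C calls above with the standard library's `socket.create_server` (Python 3.8+) and `socket.create_connection`. The client call returns once the operating system has completed the three-way handshake, and `accept()` then gives the server a new socket for that connection.

```python
import socket


def loopback_connect_accept(message: bytes) -> bytes:
    """Open a TCP connection on 127.0.0.1, send `message` from the client,
    and return what the server side reads."""
    # Server side: a listening socket on a free port chosen by the OS.
    with socket.create_server(("127.0.0.1", 0)) as listener:
        host, port = listener.getsockname()
        # Client side: returns after the three-way handshake has completed.
        with socket.create_connection((host, port)) as client:
            conn, _addr = listener.accept()   # new socket dedicated to this client
            with conn:
                client.sendall(message)
                received = b""
                while len(received) < len(message):
                    chunk = conn.recv(1024)
                    if not chunk:             # peer closed early
                        break
                    received += chunk
                return received


if __name__ == "__main__":
    print(loopback_connect_accept(b"GET /index.html"))
```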
TCP Connection Management: Establishing a connection
Three-way handshake:
Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself). It specifies the initial sequence number (isn).
Step 2: the server end system receives the SYN and replies with a SYNACK control segment. It ACKs the received SYN, allocates buffers, and specifies the server's initial sequence number.
Step 3: the client ACKs the connection with ACK = server_isn + 1, allocates buffers, and sends SYN = 0. Connection established.
[Figure: client/server timeline]
  Connect: client → server (SYN=1, seq=client_isn)
  Accept: server → client (SYN=1, seq=server_isn, ack=client_isn+1)
  ACK: client → server (SYN=0, seq=client_isn+1, ack=server_isn+1)
This is what happens when we create a socket for connection to a server.
After the connection is established, the client can receive segments with application-generated data (SYN=0).
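Here is a small model of the sequence and acknowledgment numbers exchanged in the handshake. It only builds the three segments and does not touch the network. The ISN values are hypothetical. The arithmetic is modulo 2^32 because TCP sequence numbers are 32 bits wide.

```python
from typing import NamedTuple, Optional

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


class Segment(NamedTuple):
    src: str
    syn: int
    seq: int
    ack: Optional[int]  # None: no acknowledgment carried (the initial SYN)


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    syn = Segment("client", 1, client_isn, None)                              # Connect
    synack = Segment("server", 1, server_isn, (client_isn + 1) % SEQ_SPACE)   # Accept
    ack = Segment("client", 0, (client_isn + 1) % SEQ_SPACE,
                  (server_isn + 1) % SEQ_SPACE)                              # ACK
    return [syn, synack, ack]


for seg in three_way_handshake(client_isn=1000, server_isn=5000):  # hypothetical ISNs
    print(seg)
```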
TCP Connection Management (cont.)
Closing a connection: the client closes its socket:
  C: closesocket(s);
  Java: clientSocket.close();
Step 1: the client end system sends a TCP FIN control segment to the server.
Step 2: the server receives the FIN and replies with an ACK. It then closes the connection and sends a FIN.
[Figure: How a TCP connection is established and torn down. The client sends FIN (close); the server replies with ACK, then sends FIN (close); the client replies with ACK and stays in timed wait before it is closed]
TCP Connection Management (cont.)
Step 3: the client receives the FIN and replies with an ACK. It enters a timed wait, during which it will respond with an ACK to received FINs.
Step 4: the server receives the ACK. Connection closed.
Note: with a small modification, this can handle simultaneous FINs.
[Figure: client and server both closing; FIN and ACK are exchanged in each direction, and the client passes through timed wait before it is closed]
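The four closing steps can be written as two small transition tables. The state names (FIN_WAIT_1, TIME_WAIT, CLOSE_WAIT, LAST_ACK, ...) are the standard RFC 793 names rather than names used on these slides. On the slides, the TIME_WAIT state appears as the "timed wait".

```python
# (state, event) -> (next state, segment sent or None); state names follow RFC 793.
CLIENT_CLOSE = {
    ("ESTABLISHED", "app_close"): ("FIN_WAIT_1", "send FIN"),     # Step 1
    ("FIN_WAIT_1", "rcv_ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_2", "rcv_FIN"): ("TIME_WAIT", "send ACK"),         # Step 3: timed wait
    ("TIME_WAIT", "rcv_FIN"): ("TIME_WAIT", "send ACK"),          # re-ACK a retransmitted FIN
    ("TIME_WAIT", "timer_expires"): ("CLOSED", None),
}

SERVER_CLOSE = {
    ("ESTABLISHED", "rcv_FIN"): ("CLOSE_WAIT", "send ACK"),       # Step 2: ACK the FIN...
    ("CLOSE_WAIT", "app_close"): ("LAST_ACK", "send FIN"),        # ...then send its own FIN
    ("LAST_ACK", "rcv_ACK"): ("CLOSED", None),                    # Step 4
}


def run(table, state, events):
    """Apply events in order; return the final state and the segments sent."""
    sent = []
    for event in events:
        state, action = table[(state, event)]
        if action:
            sent.append(action)
    return state, sent


print(run(CLIENT_CLOSE, "ESTABLISHED", ["app_close", "rcv_ACK", "rcv_FIN", "timer_expires"]))
print(run(SERVER_CLOSE, "ESTABLISHED", ["rcv_FIN", "app_close", "rcv_ACK"]))
```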
TCP Connection Management (cont.)
[Figure: TCP client lifecycle and TCP server lifecycle]
The timed wait is used in case the final ACK gets lost. Its duration is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).
After it, the connection formally closes: all resources (e.g. port numbers) are released.
End of Flow Control and Error Control
Flow Control vs Congestion Control: similar actions are taken, but for very different reasons.
Flow Control
• point-to-point traffic between sender and receiver
• speed-matching service: matching the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the receiver buffer from overflowing
Congestion: happens when there are too many sources attempting to send data at too high a rate for the routers along the path.
Congestion Control
• a service that makes sure that the routers between end systems are able to carry the offered traffic
• prevents routers from overflowing
Same course of action: throttling of the sender.
Principles of Congestion Control
Congestion, informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queuing in router buffers)
• a top-10 problem
Approaches towards congestion control
Two broad approaches towards congestion control:
1. End-to-end congestion control:
• no explicit feedback from the network
• congestion inferred by end systems from observed packet loss and delay
• the approach taken by TCP
2. Network-assisted congestion control:
• routers provide feedback to end systems, either as
  - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR), or
  - an explicit transmission rate the sender should send at
TCP Congestion Control: how the TCP sender limits the rate at which it sends traffic into its connection
New variable: the Congestion Window (CongWin). At the SENDER, the amount of unACKed data is bounded:
LastByteSent - LastByteACKed ≤ min(CongWin, RcvWindow)
This indirectly limits the sender's send rate.
Assumptions:
• the TCP receive buffer is very large, so there is no RcvWindow constraint: the amount of unACKed data at the sender is limited solely by CongWin
• packet loss delay and packet transmission delay are negligible
Sending rate ≈ CongWin/RTT
By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
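A minimal sketch of the sender-side limit and the resulting rate estimate. The function names and byte counts are hypothetical.

```python
def usable_window(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still put in flight without exceeding min(CongWin, RcvWindow)."""
    unacked = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - unacked)


def approx_send_rate(cong_win, rtt):
    """Sending rate ~ CongWin / RTT (bytes per second), under the slide's assumptions."""
    return cong_win / rtt


print(usable_window(10_000, 6_000, cong_win=8_000, rcv_window=64_000))  # CongWin is the limit
print(usable_window(10_000, 6_000, cong_win=8_000, rcv_window=3_000))   # RcvWindow is the limit
print(approx_send_rate(8_000, 0.1))
```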
TCP Congestion Control: TCP uses ACKs to trigger ("clock") its increase in congestion window size, so TCP is "self-clocking".
The arrival of ACKs is an indication to the sender that all is well:
1. If ACKs arrive at a slow rate, the congestion window will be increased at a relatively slow rate.
2. If ACKs arrive at a high rate, the congestion window will be increased more quickly.
TCP Congestion Control: how TCP perceives that there is congestion on the path
"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender. It takes one of two forms:
1. Timeout: no ACK is received after a segment loss.
2. Receipt of three duplicate ACKs: the segment loss is followed by three duplicate ACKs received at the sender.
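A sketch of how a sender can spot the triple-duplicate-ACK loss event. It assumes cumulative ACKs that carry the next expected byte number. In the usage example the segment starting at byte 2000 is assumed lost, so every later arrival re-ACKs 2000.

```python
def detect_triple_dup_acks(ack_numbers, threshold=3):
    """Return the positions in ack_numbers where a triple-duplicate-ACK loss event fires."""
    events = []
    last_ack, dup_count = None, 0
    for i, ack in enumerate(ack_numbers):
        if ack == last_ack:
            dup_count += 1                 # same cumulative ACK again: a duplicate
            if dup_count == threshold:
                events.append(i)           # the loss event is declared here
        else:
            last_ack, dup_count = ack, 0   # new data ACKed: reset the counter
    return events


print(detect_triple_dup_acks([1000, 2000, 2000, 2000, 2000, 6000]))
```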
TCP Congestion Control: details
• The sender limits transmission: LastByteSent - LastByteAcked ≤ cwnd
• Roughly, rate = cwnd/RTT bytes/sec
• cwnd is a dynamic function of perceived network congestion
How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• the TCP sender reduces its rate (cwnd) after a loss event
Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events
TCP congestion avoidance: additive increase, multiplicative decrease (AIMD)
Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
• additive increase: increase cwnd by 1 MSS every RTT until loss is detected
• multiplicative decrease: cut cwnd in half after loss
[Figure: congestion window size (8, 16, 24 Kbytes) vs. time, showing saw-tooth behavior: probing for bandwidth]
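A per-RTT AIMD sketch in units of MSS. The loss rule is hypothetical: a loss occurs whenever cwnd exceeds a fixed capacity, standing in for router buffer overflow. It reproduces the saw-tooth shape, not real traces.

```python
def aimd_trace(rounds, capacity, cwnd=1.0):
    """cwnd (in MSS) at the start of each RTT under additive increase, multiplicative decrease."""
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd > capacity:              # hypothetical loss rule: window exceeds the path
            cwnd = max(1.0, cwnd / 2)    # multiplicative decrease: cut cwnd in half
        else:
            cwnd += 1.0                  # additive increase: +1 MSS per RTT
    return trace


print(aimd_trace(rounds=30, capacity=16, cwnd=8.0))
```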
TCP Slow Start
When the connection begins, increase the rate exponentially until the first loss event:
• initially cwnd = 1 MSS
• double cwnd every RTT
• done by incrementing cwnd by 1 MSS for every ACK received
Summary: the initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT).
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart]
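A sketch showing why "+1 MSS per ACK" doubles cwnd every RTT. It assumes one ACK per segment (no delayed ACKs) and no loss.

```python
def slow_start_rounds(rounds, mss=1):
    """cwnd at the start of each RTT when every ACK adds 1 MSS."""
    cwnd, per_round = mss, []
    for _ in range(rounds):
        per_round.append(cwnd)
        acks = cwnd // mss           # one ACK comes back for each segment sent this RTT
        for _ in range(acks):
            cwnd += mss              # slow start: +1 MSS per ACK received
    return per_round


print(slow_start_rounds(5))          # cwnd in MSS: [1, 2, 4, 8, 16]
```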
Refinement: inferring loss
• after 3 dup ACKs: cwnd is cut in half, and the window then grows linearly
• but after a timeout event: cwnd is set to 1 MSS; the window then grows exponentially up to a threshold, then grows linearly
Philosophy:
• 3 dup ACKs indicate that the network is capable of delivering some segments
• a timeout indicates a "more alarming" congestion scenario
Refinement
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before timeout.
Implementation:
• variable ssthresh (slow-start threshold)
• on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event
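A per-RTT sketch of the two reactions to loss, in units of MSS, with a hypothetical loss schedule. Capping the exponential phase at ssthresh is a simplification. The threshold rule (ssthresh = cwnd/2 at the loss) follows the slide.

```python
def per_rtt_trace(rounds, losses, ssthresh=8):
    """cwnd (in MSS) at the start of each RTT round.
    `losses` maps a round index to "timeout" or "3dup" (a hypothetical loss schedule)."""
    cwnd, trace = 1, []
    for r in range(rounds):
        trace.append(cwnd)
        event = losses.get(r)
        if event == "timeout":
            ssthresh, cwnd = max(cwnd // 2, 1), 1     # back to 1 MSS, slow start again
        elif event == "3dup":
            ssthresh = max(cwnd // 2, 1)
            cwnd = ssthresh                            # cut in half, then grow linearly
        elif cwnd < ssthresh:
            cwnd = min(2 * cwnd, ssthresh)             # exponential growth up to ssthresh
        else:
            cwnd += 1                                  # linear growth
    return trace


print(per_rtt_trace(12, {7: "3dup"}))
print(per_rtt_trace(12, {7: "timeout"}))
```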
TCP Sender Congestion Control
| State | Event | TCP sender congestion-control action | Commentary |
|---|---|---|---|
| Slow Start (SS) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Results in a doubling of CongWin every RTT |
| Congestion Avoidance (CA) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in CongWin increasing by 1 MSS every RTT |
| SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS |
| SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start |
| SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed |
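A Python sketch of the table's per-event actions, in units of MSS by default. The class and method names are illustrative, and actual segment transmission is not modeled.

```python
class TableSender:
    """CongWin/Threshold updates from the table (sizes in MSS unless mss is changed)."""

    def __init__(self, mss: float = 1, threshold: float = 64):
        self.mss = mss
        self.cong_win = mss
        self.threshold = threshold
        self.state = "SS"
        self.dup_acks = 0

    def ack_new_data(self):
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss                      # doubling every RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * (self.mss / self.cong_win)  # ~ +1 MSS per RTT

    def duplicate_ack(self):
        self.dup_acks += 1                 # CongWin and Threshold unchanged...
        if self.dup_acks == 3:             # ...unless this is the third duplicate
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, self.mss)  # never below 1 MSS
            self.state = "CA"

    def timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0
```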
Summary: TCP Congestion Control (FSM)
Initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start.
Slow start:
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment (stay in slow start)
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Congestion avoidance:
• new ACK: cwnd = cwnd + MSS · (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
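A Python sketch of the FSM, including fast recovery's window inflation: cwnd = ssthresh + 3 MSS, then +1 MSS per further duplicate ACK. Retransmission is only counted, and "transmit new segment(s) as allowed" is left out. The names are illustrative.

```python
class RenoSender:
    def __init__(self, mss=1, ssthresh=64):
        self.mss, self.cwnd, self.ssthresh = mss, mss, ssthresh
        self.dup_ack_count, self.state = 0, "slow_start"
        self.retransmissions = 0

    def _retransmit(self):
        self.retransmissions += 1            # stand-in for "retransmit missing segment"

    def new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh        # deflate, resume congestion avoidance
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss            # +1 MSS per ACK
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += self.mss * self.mss / self.cwnd   # about +1 MSS per RTT
        self.dup_ack_count = 0

    def dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += self.mss            # inflate for each further duplicate ACK
            return
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self._retransmit()
            self.state = "fast_recovery"

    def timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.mss
        self.dup_ack_count = 0
        self._retransmit()
        self.state = "slow_start"
```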
Congestion control: TCP's Congestion Control Service
[Figure: CLIENT and SERVER]
Problem: gridlock sets in when there is packet loss due to router congestion.
TCP's congestion control service forces the end systems to decrease the rate at which packets are sent during periods of congestion.
The sending system's packet is lost due to congestion, and the sender is alerted when it stops receiving ACKs for the packets it sent.
Macroscopic Description of TCP Throughput
What's the average throughput of TCP as a function of window size and RTT? Ignore slow start (typically very short phases).
Let W be the window size when loss occurs:
• when the window is W, throughput is W/RTT
• just after loss, the window drops to W/2, and throughput to W/(2RTT)
• throughput then increases linearly (by MSS/RTT every RTT)
Average throughput: 0.75 W/RTT
(Based on an idealised model for the steady-state dynamics of TCP.)
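A check of the 0.75 factor under the same idealised model. After a loss the window restarts at W/2 and grows by 1 MSS per RTT up to W, so it averages 3W/4 over a cycle. The function names, and the MSS and RTT in the usage line, are hypothetical; W is assumed even.

```python
def sawtooth_average_window(W: int) -> float:
    """Mean window over one AIMD cycle: W/2, W/2 + 1, ..., W (in MSS, +1 MSS per RTT)."""
    cycle = range(W // 2, W + 1)
    return sum(cycle) / len(cycle)


def average_throughput(W: int, mss_bytes: int, rtt_s: float) -> float:
    """Average throughput in bytes/s when loss occurs at window W (W assumed even)."""
    return sawtooth_average_window(W) * mss_bytes / rtt_s


print(sawtooth_average_window(40), average_throughput(40, mss_bytes=1460, rtt_s=0.1))
```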
TCP Futures: TCP over "long, fat pipes"
Example: a GRID computing application with 1500-byte segments and a 100 ms RTT that desires 10 Gbps throughput requires a window size of W = 83,333 in-flight segments.
Throughput in terms of loss rate L:
$$\text{Throughput} = \frac{1.22 \cdot MSS}{RTT\,\sqrt{L}}$$
Reaching 10 Gbps therefore requires $L = 2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments).
New versions of TCP are needed for high-speed environments.
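The slide's numbers can be reproduced directly. This short Python sketch uses illustrative function names and inverts the throughput formula for L.

```python
def window_for_throughput(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments W needed so that W * MSS / RTT equals the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def loss_rate_for_throughput(throughput_bps, rtt_s, mss_bytes):
    """Solve Throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


W = window_for_throughput(10e9, 0.100, 1500)
L = loss_rate_for_throughput(10e9, 0.100, 1500)
print(f"W = {W:,.0f} segments, L = {L:.2e}")
```

The unrounded loss rate is about 2.1·10^-10, which the slide rounds to 2·10^-10.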
TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link of capacity R, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Analysis of 2 connections sharing a link
Assumptions:
• a link with transmission rate R
• each connection has the same MSS and RTT
• no other TCP connections or UDP datagrams traverse the shared link
• ignore the slow-start phase of TCP
• both connections operate in congestion-avoidance mode (linear increase phase)
Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.
Why is TCP fair?
Two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: connection 2 throughput vs. connection 1 throughput (both axes up to R), with the full bandwidth utilisation line and the equal bandwidth share line. A point on the graph depicts the amount of link bandwidth jointly consumed by the connections. The trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by a factor of 2).]
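A sketch of the two-connection argument, in units of MSS per RTT. It assumes both connections see a loss at the same time whenever their combined rate exceeds R; synchronized losses are an assumption of this toy model, and R = 100 is hypothetical. Additive increase leaves the gap between the two rates unchanged, while each halving cuts it in half, so the rates converge.

```python
def aimd_two_flows(x1, x2, R, rounds):
    """Rates of two AIMD connections sharing a link of capacity R, after `rounds` RTTs."""
    for _ in range(rounds):
        if x1 + x2 > R:              # combined demand exceeds capacity: both see a loss
            x1, x2 = x1 / 2, x2 / 2  # multiplicative decrease halves the gap x1 - x2
        else:
            x1, x2 = x1 + 1, x2 + 1  # additive increase keeps the gap unchanged
    return x1, x2


print(aimd_two_flows(1.0, 60.0, R=100.0, rounds=500))
```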
The End
The following slides are for additional reading.
TCP Latency Modeling
Loopholes in TCP:
• In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free; therefore they have higher throughputs.
• Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (a multithreading implementation).
[Figure: multiple end systems sharing a link of rate R bps; three end systems use 1 TCP connection each, while a multithreaded one uses 3 TCP connections]
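A one-line model of the loophole, assuming the link is shared equally per TCP connection. The 12 Mbps link rate is hypothetical.

```python
def per_app_share(R, connections_per_app):
    """Each TCP connection gets R/N; an application's share scales with its connection count."""
    total = sum(connections_per_app)
    return [R * n / total for n in connections_per_app]


# The figure's scenario: three apps with 1 connection each, one multithreaded app with 3.
print(per_app_share(12.0, [1, 1, 1, 3]))
```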
TCP latency modeling
Components: TCP connection establishment time, data transfer delay, and actual data transmission time.
Two cases to consider:
• WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent (no data transfer delay).
• WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent (there is data transfer delay).
Q: How long does it take to receive an object from a Web server? That is, the time from when the client initiates a TCP connection until the client receives the requested object in its entirety.
Transport Layer ndash TCP
57
BB
TCP Latency ModelingTCP Latency Modeling
Network is uncongested with one link between end systems of rate RR
CongWinCongWin (fixed) determines the amount of data that can be sent
No packet loss no packet corruption no retransmissions required
Header overheads are negligible
File to sendFile to send = integer number of segments of size MSS
Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times
CLIENT SERVER
FILEFILE
Initial ThresholdThreshold of TCP congestion mechanism is very big
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits
SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)
Transport Layer ndash TCP
58
BB
TCP latency ModelingTCP latency Modeling
Case 1 latency = 2RTT + OR
K = Number of Windows of data that cover the object
K = OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 1 Case 1 WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
Number of segmentsRounded up to the nearest integer
Assume W=4 segmentsAssume W=4 segments
eg O=256bits S=32bits W=4
Transport Layer ndash TCP
59
BB
TCP latency ModelingTCP latency Modeling
Case 2 latency = 2RTT + OR + (K-1)[SR + RTT - WSR]
Number of Windows of data that cover the objectK= OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 2 Case 2 WSR lt RTT + SRWSR lt RTT + SR
Sender has to wait for an ACK after a windowrsquos worth of data sent
STALLED STALLED PERIODPERIOD
If there are k windows sender willbe stalled (k-1) times
Transport Layer ndash TCP
60
BB
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
STALLED STALLED PERIODPERIOD
4 windows4 windows
OS=15
Transport Layer ndash TCP
61
BB
bull LetLet K K be the be the number of windows that cover the objectnumber of windows that cover the objectbull We can express We can express KK in terms of the number of segments in the in terms of the number of segments in the
object as followsobject as follows
S
OkK k 110 222min
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1log
1logmin
12min
2
2
S
OK
S
OkkK
S
OkK k
Note
Transport Layer ndash TCP
62
BB
bull From the time the server begins to transmit the From the time the server begins to transmit the kkth window window until the time the server receives an until the time the server receives an ACKACK for the for the first segment in the windowin the window
bull Transmission of Transmission of kkth window window ==
bull Stall Time Stall Time ==
bull Latency Latency ==
12
k
R
S
12
k
R
SRTT
R
S
RTTR
S
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1K
1
122k
k
R
SRTT
R
S
R
ORTT
Transport Layer ndash TCP
63
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
Transport Layer ndash TCP
64
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
R
S
R
SRTTP
R
ORTT P )12(2Latency
bull Closed-form expression for the latency
Transport Layer ndash TCP
65
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
2
1
RTTRO
P
encyMinimumLat
Latency
Slow start will not significantly increase latency if RTT ltlt OR
Transport Layer ndash TCP
66
BB
httpwww1csewustledu~jaincis788-97ftptcp_over_atmindexhtmatm-features
Transport Layer ndash TCP
29
BB
TCP Connection ManagementTCP Connection Management
Recall TCP sender receiver establish ldquoconnectionconnectionrdquo before exchanging data segments
Initialize TCP variables sequence numbers buffers flow control info (eg RcvWindow)
ClientClient is the connection initiator
In Java Socket clientSocket = new Socket(hostnameport number) connect ServerServer is contacted by client
In JavaSocket accept()
if (connectconnect(s (struct sockaddr )ampsin sizeof(sin)) = 0) printf(connect failedn) WSACleanup() exit(1)
ns = acceptaccept(s(struct sockaddr )(ampremoteaddr)ampaddrlen)
Transport Layer ndash TCP
30
BB
TCP Connection Management
Three way handshakeStep 1 clientclient end system sends TCP
SYNSYN control segment to server (executed by TCP itself)
specifies initial seq number (isn)
Step 2 serverserver end system receives SYNSYN replies with SYNACKSYNACK control segment
ACKs received SYN allocates buffers specifies serverrsquos initial seq
number
Step 3 clientclient ACKsACKs the connection with
ACK=server_isn +1ACK=server_isn +1
allocates buffers sends SYN=0SYN=0
Connection established
Client
AcceptAccept (SYN=1
seq=server_isnack=client_isn+1)
time
Server
ConnectConnect (SYN=1 seq=client_isn)
ACK (SYN=0
seq=client_isn+1ack=server_isn+1)
Establishing a connection
This is what happens when we create a socket for
connection to a server
After establishing the connection the client can receive segments with app-generated data (SYN=0)(SYN=0)
Transport Layer ndash TCP
31
BB
TCP Connection Management (cont)
Closing a connection
client closes socket
closesocket(s)closesocket(s)
Java clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
How TCP connection is established and torn down
Transport Layer ndash TCP
32
BB
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters timed wait - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer ndash TCP
33
BB
TCP Connection Management (cont)
TCP client lifecycle
TCP server lifecycle
Used in case ACK gets lost It is implementation-dependent (eg 30
seconds 1 minute 2 minutes
Connection formally closes ndash all resources (eg port numbers) are
released
1
2
3
4
5
6
7
8
9
10
11
12
Transport Layer ndash TCP
34
BB
End of Flow Control and Error Control
Transport Layer ndash TCP
35
BB
Flow Control vs Congestion ControlFlow Control vs Congestion ControlSimilar actions are taken but for very different reasons
Flow ControlFlow Controlbull point-to-point traffic between sender and receiverbull speed matching service matching the rate at which the sender is sending against the rate at which the receiving application is readingbull prevents Receiver Buffer from overflowing
CongestionCongestion ndash happens when there are too many sources attempting to send data at too high a rate for the routers along the path
Congestion ControlCongestion Controlbull service that makes sure that the routers between End Systems are able to carry the offered trafficbull prevents routers from overflowing
Same course of action Throttling of the sender
Transport Layer ndash TCP
36
BB
CongestionCongestion Informally too many sources sending too
much data too fast for network to handle different from flow control Manifestations
lost packets (buffer overflow at routers) long delays (queuing in router buffers)
a top-10 problem
Principles of Congestion ControlPrinciples of Congestion Control
Transport Layer ndash TCP
37
BB
Approaches towards congestion control
End-to-end congestion control
no explicit feedback from network
congestion inferred by end-systems from observed packet loss amp delay
approach taken by TCP
Network-assisted congestion control
routers provide feedback to End Systems in the form of single bit indicating
link congestion (SNA DECbit TCPIP ECN ATM ABR)
explicit transmission rate the sender should send at
1 2
Two broad approaches towards congestion control
Transport Layer ndash TCP
38
BB
TCP Congestion ControlTCP Congestion ControlHow TCP sender limits the rate at which it sends traffic into its connection
SENDER
(Amount of unACKed data)SENDER lt min(CongWin RcvWindow)
LastByteSent - LastByteACKed
Indirectly limits the senderrsquos send rateAssumptions
bull TCP receive buffer is very large ndash no RcvWindow constraint Amt of unACKed data at sender is solely limited by CongWin
bull Packet loss delay amp packet transmission delay are negligible
Sending rate (approx) CongWinRTT
By adjusting CongWin sender can therefore adjust the
rate at which it sends data into its connection
New variable ndash Congestion
Window
Transport Layer ndash TCP
39
BB
TCP Congestion ControlTCP Congestion ControlTCP uses ACKs to trigger (ldquoclockrdquo) its increase in congestion window size ndash ldquoself-clockingrdquo
Arrival of ACKs ndash indication to the sender that all is well
1 Slow Rate
bull Congestion window will be increased at a relatively slow rate
2 High rate
bull Congestion window will be increased more quickly
Transport Layer ndash TCP
40
BB
TCP Congestion ControlTCP Congestion ControlHow TCP perceives that there is congestion on the path
ldquoLoss Eventrdquo ndash when there is excessive congestion router buffers along the path overflows causing datagrams to be dropped which in turn results in a ldquoloss eventrdquo at the sender
1 Timeout
bull no ACK is received after segment loss
2 Receipt of three duplicate ACKs
bull segment loss is followed by three ACKs received at the sender
Transport Layer ndash TCP
41
BB
TCP Congestion Control details
sender limits transmission LastByteSent-LastByteAcked
cwnd
roughly
cwnd is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or 3 duplicate acks
TCP sender reduces rate (cwnd) after loss event
Three mechanisms1 AIMD
2 slow start
3 conservative after timeout events
rate = cwnd
RTT Bytessec
Transport Layer ndash TCP
42
BB
TCP congestion avoidance congestion avoidance additive increase multiplicative decreaseadditive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1 MSS every RTT until
loss is detected multiplicative decrease cut cwnd in half after loss
timecwnd
co
nge
stio
n w
indo
w s
ize
saw toothbehavior probingfor bandwidth
Transport Layer ndash TCP
43
BB
TCP Slow Start Slow Start when connection begins
increase rate exponentially until first loss event initially cwndcwnd = 1 MSS double cwndcwnd every RTT done by incrementing cwndcwnd by 1 MSS by 1 MSS for every ACK received
summary initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transport Layer ndash TCP
44
BB
Refinement inferring lossRefinement inferring loss
after 3 dup ACKsafter 3 dup ACKs cwndcwnd is cut in halfhalf window then grows
linearlylinearly butbut after timeout after timeout
eventevent cwndcwnd is is set to 1 MSS1 MSS window then grows
exponentiallyexponentially Up to a thresholdUp to a threshold
then grows linearlylinearly
3 dup ACKs 3 dup ACKs indicates network capable of delivering some segments
timeouttimeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer ndash TCP
45
BB
Refinement
Q when should the exponential increase switch to linear
A when cwndcwnd gets to 1212 of its value before timeout
Implementation variable ssthresh (slow-start threshold)ssthresh (slow-start threshold) on loss eventon loss event ssthreshssthresh is set to 1212 of cwndcwnd just
before loss event
Transport Layer ndash TCP
46
BB
TCP Sender Congestion ControlSTATE EVENT TCP SENDER Congestion-
Control ActionCommentary
SLOW START (SS)
ACK receipt for previously unACKed data
CongWin = CongWin + MSSIf(CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo
Resulting in a doubling of CongWin every RTT
Congestion Avoidance (CA)
ACK receipt for previously unACKed data
CongWin = CongWin + MSS (MSSCongWin)
Additive increase resulting in increasing of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate ACK
Threshold = CongWin 2CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
SS or CA Timeout Threshold = CongWin 2CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter Slow Start
SS or CA Duplicate ACK Increment duplicate ACK count for segment being ACKed
CongWin and Threshold not changed
Transport Layer ndash TCP
47
BB
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++
duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3 MSSretransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++
duplicate ACK
cwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
Congestion control
TCP's Congestion Control Service
Problem: gridlock sets in when there is packet loss due to router congestion.
TCP's congestion control forces the end systems to decrease the rate at which packets are sent during periods of congestion.
The sending system learns that its packet was lost to congestion when it stops receiving ACKs for the packets it has sent.
Macroscopic Description of TCP Throughput
• What's the average throughput of TCP as a function of window size and RTT? Ignore slow start (typically very short phases).
• Let W be the window size when loss occurs.
• When the window is W, throughput is W/RTT.
• Just after loss, the window drops to W/2, and throughput to W/(2·RTT).
• Throughput then increases linearly (by MSS/RTT every RTT).
• Average throughput: 0.75 W/RTT
(Based on an idealised model for the steady-state dynamics of TCP)
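In the idealised model the window ramps linearly from W/2 back up to W between losses, so the average window is the midpoint of that ramp, (W/2 + W)/2 = 0.75 W. A small Python check of that arithmetic, using hypothetical values for W and RTT (W in segments, RTT in seconds):

```python
def average_throughput(w: float, rtt: float, steps: int = 1000) -> float:
    """Average of window/RTT while the window ramps linearly from W/2 up to W.

    The ramp is sampled at steps + 1 evenly spaced points; for a linear ramp
    the mean is the midpoint (W/2 + W)/2 = 0.75 W, so the result is 0.75 W/RTT.
    """
    samples = [w / 2 + (w / 2) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples) / rtt


# Hypothetical example: W = 100 segments, RTT = 0.1 s
print(average_throughput(100, 0.1))  # ~750 segments/s
print(0.75 * 100 / 0.1)              # closed form from the slide
```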
TCP Futures: TCP over "long, fat pipes"
• Example: a GRID computing application with 1500-byte segments, 100 ms RTT and a desired throughput of 10 Gbps
• Requires a window size of W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
$$\text{throughput} = \frac{1.22 \cdot MSS}{RTT\,\sqrt{L}}$$
• Reaching 10 Gbps requires L = 2·10^-10, a very small loss rate (1 loss event every 5 billion segments)
• New versions of TCP are needed for high-speed environments
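The two numbers on this slide follow from simple arithmetic: W = throughput·RTT/(segment size in bits), and solving the loss-rate formula for L gives L = (1.22·MSS/(RTT·throughput))². A Python sketch of that calculation, with MSS taken in bits to match the bits-per-second throughput (the function names are illustrative):

```python
def window_for_throughput(throughput_bps, rtt_s, segment_bytes):
    """In-flight segments W needed so that W * segment / RTT = throughput."""
    return throughput_bps * rtt_s / (segment_bytes * 8)


def loss_rate_for_throughput(throughput_bps, rtt_s, segment_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    mss_bits = segment_bytes * 8
    return (1.22 * mss_bits / (rtt_s * throughput_bps)) ** 2


w = window_for_throughput(10e9, 0.100, 1500)
loss = loss_rate_for_throughput(10e9, 0.100, 1500)
print(f"W = {w:,.0f} segments")  # W = 83,333 segments
print(f"L = {loss:.1e}")         # L = 2.1e-10
```

1/L comes out at roughly 4.7·10^9, which is the "1 loss event every 5 billion segments" quoted above.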
TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link of rate R, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Analysis of 2 connections sharing a link
Assumptions:
• Link with transmission rate R
• Each connection has the same MSS and RTT
• No other TCP connections or UDP datagrams traverse the shared link
• Ignore the slow-start phase of TCP
• Both connections operate in congestion-avoidance mode (linear increase phase)
Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing
Why is TCP fair?
Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally
[Figure: Connection 1 throughput (horizontal axis, 0 to R) vs. Connection 2 throughput (vertical axis, 0 to R), showing the equal-bandwidth-share line and the full-bandwidth-utilisation line. The trajectory alternates between congestion-avoidance additive increase and loss (window decreased by a factor of 2). A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.]
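The graphical argument can be reproduced with a tiny simulation. This Python sketch assumes synchronized losses (both connections halve whenever their combined rate exceeds R), one additive-increase step per RTT for each connection, and rates in arbitrary units; these are simplifications chosen for illustration. Additive increase keeps the gap between the two rates constant, while each halving cuts the gap in half, so the rates converge toward the equal-share line.

```python
def aimd_two_flows(x1, x2, capacity, rounds=2000, increase=1.0):
    """Simulate two AIMD flows sharing a link of the given capacity.

    Each round both flows add `increase` (additive increase, slope 1);
    whenever their sum exceeds capacity, both halve (multiplicative decrease).
    """
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2
        else:
            x1, x2 = x1 + increase, x2 + increase
    return x1, x2


# Hypothetical start far from fairness: 80 vs 10 units on a link of 100
a, b = aimd_two_flows(x1=80.0, x2=10.0, capacity=100.0)
print(a, b)  # the two rates end up (almost) equal
```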
The End
The following slides are for additional reading only.
TCP Latency Modeling
Loopholes in TCP
• In practice, client/server applications with a smaller RTT grab the available bandwidth more quickly as it becomes free. Therefore, they have higher throughputs.
• Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth.
[Figure: multiple end systems sharing a link of R bps (the link's transmission rate); three applications use 1 TCP connection each, while a multithreaded implementation uses 3 TCP connections]
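If the link is shared fairly per connection (each of the N connections gets R/N), an application's share is proportional to how many connections it opens. A Python sketch for the figure's scenario (three applications with one connection each, one with three), where the application names are hypothetical:

```python
def app_share(link_rate, connections_per_app):
    """Per-application bandwidth when the link is split equally per TCP connection.

    Assumes every connection gets R/N (the fairness goal), where N is the
    total number of connections on the bottleneck link.
    """
    total = sum(connections_per_app.values())
    return {app: link_rate * n / total for app, n in connections_per_app.items()}


# Link rate normalised to R = 1; app "D" is the multithreaded one
shares = app_share(1.0, {"A": 1, "B": 1, "C": 1, "D": 3})
print(shares)
```

With 6 connections in total, the multithreaded application gets R/2 while each of the others gets only R/6, which is the loophole the slide describes.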
TCP latency modeling
Q: How long does it take to receive an object from a Web server?
Latency: the time from when the client initiates a TCP connection until when the client receives the requested object in its entirety. It is made up of the TCP connection establishment time, the data transfer delay, and the actual data transmission time.
Two cases to consider:
• WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent (no data transfer delay)
• WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent (there is data transfer delay)
TCP Latency Modeling
Assumptions:
• The network is uncongested, with one link of rate R between the end systems (R bps: the link's transmission rate)
• CongWin (fixed) determines the amount of data that can be sent
• No packet loss, no packet corruption, no retransmissions required
• Header overheads are negligible
• File to send = integer number of segments of size MSS
• Connection-establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times
• The initial Threshold of the TCP congestion mechanism is very big
Notation:
• O: size of the object, in bits
• S: number of bits of the MSS (maximum segment size)
TCP latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.
$$\text{latency} = 2\,RTT + \frac{O}{R}$$
K = number of windows of data that cover the object:
$$K = \left\lceil \frac{O}{WS} \right\rceil$$
where O/S is the number of segments, and K is rounded up to the nearest integer.
Example (assume W = 4 segments): O = 256 bits, S = 32 bits, W = 4.
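The slide's example can be checked with a couple of lines of Python: 256/32 = 8 segments, sent as windows of 4, gives K = 2. The function names below are illustrative; latency_case1 simply evaluates the Case 1 formula and is only valid when WS/R > RTT + S/R.

```python
import math


def windows_to_cover(o_bits: int, s_bits: int, w_segments: int) -> int:
    """K = ceil(O / (W*S)): number of windows of data that cover the object."""
    return math.ceil(o_bits / (w_segments * s_bits))


def latency_case1(o_bits: float, r_bps: float, rtt_s: float) -> float:
    """Case 1 (WS/R > RTT + S/R): latency = 2*RTT + O/R, with no stalls."""
    return 2 * rtt_s + o_bits / r_bps


# Slide example: O = 256 bits, S = 32 bits, W = 4 segments
print(windows_to_cover(256, 32, 4))  # 2 (8 segments sent as 2 windows of 4)
```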
TCP latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 2: WS/R < RTT + S/R
The sender has to wait for an ACK after a window's worth of data is sent (stalled period).
$$\text{latency} = 2\,RTT + \frac{O}{R} + (K-1)\left[\frac{S}{R} + RTT - \frac{WS}{R}\right]$$
where K = ⌈O/(WS)⌉ is the number of windows of data that cover the object. If there are K windows, the sender will be stalled (K-1) times.
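Both static-window cases fit in one function if the per-window stall time is clamped at zero: in Case 1 the term S/R + RTT - WS/R is negative, so no stall is added. A Python sketch under the assumptions listed earlier (O and S in bits, R in bits/sec, RTT in seconds); the test numbers are hypothetical and chosen so that S/R = 1 s.

```python
import math


def static_window_latency(o, s, w, r, rtt):
    """Latency with a fixed congestion window of w segments.

    Case 1 (W*S/R >= RTT + S/R): no stall, latency = 2*RTT + O/R.
    Case 2: the sender stalls (K-1) times, each for S/R + RTT - W*S/R.
    """
    k = math.ceil(o / (w * s))       # windows that cover the object
    stall = s / r + rtt - w * s / r  # idle time after each window but the last
    return 2 * rtt + o / r + (k - 1) * max(stall, 0.0)


# O = 256 bits, S = 32 bits, W = 4, R = 32 bps (so S/R = 1 s)
print(static_window_latency(256, 32, 4, 32, rtt=10))  # Case 2: 35.0
print(static_window_latency(256, 32, 4, 32, rtt=1))   # Case 1: 10.0
```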
Case Analysis: DYNAMIC CONGESTION WINDOW
[Figure: slow-start transmission of an object of O/S = 15 segments in 4 windows (1, 2, 4 and 8 segments), with stalled periods between windows]
Case Analysis: DYNAMIC CONGESTION WINDOW
• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object, O/S, as follows (using 2^0 + 2^1 + ... + 2^(k-1) = 2^k - 1):
$$K = \min\left\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge \frac{O}{S}\right\} = \min\left\{k : 2^k - 1 \ge \frac{O}{S}\right\} = \min\left\{k : k \ge \log_2\left(\frac{O}{S} + 1\right)\right\} = \left\lceil \log_2\left(\frac{O}{S} + 1\right)\right\rceil$$
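The closed form for K can be checked against its definition by simply adding up window sizes 1, 2, 4, ... until the object is covered. A Python sketch (function names are illustrative); for the earlier figure's O/S = 15 both give K = 4.

```python
import math


def dynamic_windows(o_over_s):
    """K = ceil(log2(O/S + 1)) for windows of 1, 2, 4, ... segments."""
    return math.ceil(math.log2(o_over_s + 1))


def dynamic_windows_by_counting(o_over_s):
    """Same K from the definition min{k : 2^0 + ... + 2^(k-1) >= O/S}."""
    k, sent = 0, 0
    while sent < o_over_s:
        sent += 2 ** k  # the (k+1)th window carries 2^k segments
        k += 1
    return k


print(dynamic_windows(15), dynamic_windows_by_counting(15))  # 4 4
```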
Case Analysis: DYNAMIC CONGESTION WINDOW
• Time from when the server begins to transmit the kth window until the time the server receives an ACK for the first segment in the window:
$$\frac{S}{R} + RTT$$
• Transmission time of the kth window:
$$2^{k-1}\,\frac{S}{R}$$
• Stall time after the kth window:
$$\left[\frac{S}{R} + RTT - 2^{k-1}\,\frac{S}{R}\right]^{+}$$
• Latency:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\,\frac{S}{R}\right]^{+}$$
where [x]^+ = max(x, 0).
Case Analysis: DYNAMIC CONGESTION WINDOW
• Let Q be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right)\right\rfloor + 1$$
• The actual number of times that the server stalls is
$$P = \min\{Q,\ K-1\}$$
• Closed-form expression for the latency:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - \left(2^{P} - 1\right)\frac{S}{R}$$
• Compared with the minimum latency, 2 RTT + O/R:
$$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{\frac{O/R}{RTT} + 2}$$
Slow start will not significantly increase latency if RTT << O/R.
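The closed form is the stall sum from the previous slide with its first P terms written out (all of them non-negative, all later ones zero), so the two should agree. A Python sketch that evaluates both, assuming O and S in bits, R in bits/sec and RTT in seconds; the example values are hypothetical (S/R = 1 s, RTT = 2.5 s, O/S = 15).

```python
import math


def dynamic_latency(o, s, r, rtt):
    """Closed form: 2RTT + O/R + P[RTT + S/R] - (2^P - 1) S/R."""
    k = math.ceil(math.log2(o / s + 1))               # windows covering the object
    q = math.floor(math.log2(1 + rtt / (s / r))) + 1  # stalls for an infinite object
    p = min(q, k - 1)                                 # actual number of stalls
    return 2 * rtt + o / r + p * (rtt + s / r) - (2 ** p - 1) * s / r


def dynamic_latency_by_sum(o, s, r, rtt):
    """Same latency from the sum of positive stall times over windows 1..K-1."""
    k = math.ceil(math.log2(o / s + 1))
    stalls = sum(max(s / r + rtt - 2 ** (i - 1) * s / r, 0.0) for i in range(1, k))
    return 2 * rtt + o / r + stalls


print(dynamic_latency(15, 1, 1, 2.5), dynamic_latency_by_sum(15, 1, 1, 2.5))  # 24.0 24.0
```

Here K = 4, Q = 2 and P = 2; the minimum latency is 2·2.5 + 15 = 20 s, so the ratio 1.2 stays below the bound 1 + 2/(15/2.5 + 2) = 1.25.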
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
TCP Connection Management
Establishing a connection: three-way handshake
Step 1: The client end system sends a TCP SYN control segment to the server (executed by TCP itself).
• Specifies the client's initial sequence number (client_isn)
Step 2: The server end system receives the SYN and replies with a SYNACK control segment.
• ACKs the received SYN
• Allocates buffers
• Specifies the server's initial sequence number (server_isn)
Step 3: The client ACKs the connection with ACK = server_isn + 1.
• Allocates buffers
• Sends SYN = 0
Connection established.
[Figure: Client/Server timeline: Connect (SYN=1, seq=client_isn) from client to server; Accept (SYN=1, seq=server_isn, ack=client_isn+1) from server to client; ACK (SYN=0, seq=client_isn+1, ack=server_isn+1) from client to server]
This is what happens when we create a socket for connection to a server. After the connection is established, the client and server exchange segments carrying application-generated data (SYN=0).
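The sequence and acknowledgment numbers in the handshake follow one rule: each side acknowledges the other's ISN plus one, because the SYN consumes one sequence number. A Python sketch that builds the three segments from the figure, assuming hypothetical ISNs (real stacks choose ISNs unpredictably) and applying the 32-bit wraparound of TCP sequence numbers; the Segment type and function name are illustrative only.

```python
from typing import List, NamedTuple, Optional

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


class Segment(NamedTuple):
    label: str
    syn: int
    seq: int
    ack: Optional[int]  # None: ACK field not used yet


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the three segments of the handshake, following the slide's figure."""
    syn = Segment("Connect", syn=1, seq=client_isn, ack=None)
    synack = Segment("Accept", syn=1, seq=server_isn,
                     ack=(syn.seq + 1) % SEQ_SPACE)
    ack = Segment("ACK", syn=0, seq=(client_isn + 1) % SEQ_SPACE,
                  ack=(synack.seq + 1) % SEQ_SPACE)
    return [syn, synack, ack]


# Hypothetical ISNs
for segment in three_way_handshake(client_isn=42, server_isn=9000):
    print(segment)
```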
TCP Connection Management (cont.)
Closing a connection
The client closes its socket: closesocket(s); in Java: clientSocket.close();
Step 1: The client end system sends a TCP FIN control segment to the server.
Step 2: The server receives the FIN and replies with an ACK. It then closes the connection and sends its own FIN.
[Figure: client/server timeline: the client closes and sends FIN; the server replies with ACK, then closes and sends FIN; the client replies with ACK, waits in "timed wait", then is closed]
How a TCP connection is established and torn down
TCP Connection Management (cont.)
Step 3: The client receives the FIN and replies with an ACK.
• Enters "timed wait": it will respond with an ACK to any received FINs
Step 4: The server receives the ACK. Connection closed.
Note: with a small modification, this can handle simultaneous FINs.
[Figure: client/server timeline: FIN, ACK, FIN, ACK exchange; both sides closing, the client in timed wait, then both closed]
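The four teardown steps can be written as a small transition table. This Python sketch borrows the state names from the TCP specification (FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT, CLOSE_WAIT, LAST_ACK); the slides themselves only say "timed wait". The event names are hypothetical, and the simultaneous-FIN case mentioned in the note is left out.

```python
# (state, event) -> (next state, segment sent or None)
TRANSITIONS = {
    # client side (active close)
    ("ESTABLISHED", "app_close"): ("FIN_WAIT_1", "FIN"),   # Step 1
    ("FIN_WAIT_1", "recv_ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_2", "recv_FIN"): ("TIME_WAIT", "ACK"),      # Step 3
    ("TIME_WAIT", "recv_FIN"): ("TIME_WAIT", "ACK"),       # re-ACK a repeated FIN
    ("TIME_WAIT", "timer_expires"): ("CLOSED", None),
    # server side (passive close)
    ("ESTABLISHED", "recv_FIN"): ("CLOSE_WAIT", "ACK"),    # Step 2: ACK the FIN
    ("CLOSE_WAIT", "app_close"): ("LAST_ACK", "FIN"),      # Step 2: send own FIN
    ("LAST_ACK", "recv_ACK"): ("CLOSED", None),            # Step 4
}


def run(events, state="ESTABLISHED"):
    """Apply events in order; return the final state and the segments sent."""
    sent = []
    for event in events:
        state, segment = TRANSITIONS[(state, event)]
        if segment:
            sent.append(segment)
    return state, sent


print(run(["app_close", "recv_ACK", "recv_FIN", "timer_expires"]))  # ('CLOSED', ['FIN', 'ACK'])
print(run(["recv_FIN", "app_close", "recv_ACK"]))                   # ('CLOSED', ['ACK', 'FIN'])
```

The TIME_WAIT self-loop is what lets the client re-ACK a FIN whose ACK was lost, which is why timed wait exists.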
TCP Connection Management (cont.)
[Figures: TCP client lifecycle and TCP server lifecycle]
• Timed wait: used in case the final ACK gets lost. Its duration is implementation-dependent (e.g., 30 seconds, 1 minute, 2 minutes).
• Closed: the connection formally closes, and all resources (e.g., port numbers) are released.
End of Flow Control and Error Control
Flow Control vs Congestion Control
Similar actions are taken, but for very different reasons.
Flow Control
• Point-to-point traffic between sender and receiver
• Speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
• Prevents the receiver buffer from overflowing
Congestion: happens when there are too many sources attempting to send data at too high a rate for the routers along the path
Congestion Control
• Service that makes sure that the routers between end systems are able to carry the offered traffic
• Prevents routers from overflowing
Same course of action: throttling of the sender
Principles of Congestion Control
Congestion:
• Informally: "too many sources sending too much data too fast for the network to handle"
• Different from flow control
• Manifestations: lost packets (buffer overflow at routers) and long delays (queuing in router buffers)
• A top-10 problem
Approaches towards congestion control
Two broad approaches towards congestion control:
1. End-to-end congestion control
• No explicit feedback from the network
• Congestion inferred by end systems from observed packet loss & delay
• Approach taken by TCP
2. Network-assisted congestion control
• Routers provide feedback to end systems, either as a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR) or as an explicit transmission rate the sender should send at
TCP Congestion Control
How the TCP sender limits the rate at which it sends traffic into its connection, using a new variable: the congestion window (CongWin).
(Amount of unACKed data)_SENDER = LastByteSent - LastByteACKed ≤ min(CongWin, RcvWindow)
This indirectly limits the sender's send rate.
Assumptions:
• The TCP receive buffer is very large, so there is no RcvWindow constraint: the amount of unACKed data at the sender is limited solely by CongWin
• Packet loss delay & packet transmission delay are negligible
Sending rate ≈ CongWin/RTT
By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
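The window rule and the rate approximation translate directly into code. A Python sketch with hypothetical byte counts and RTT (function names are illustrative): the sender may put more data on the wire only while the unACKed amount stays within min(CongWin, RcvWindow).

```python
def sendable_bytes(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still send: unACKed data must stay <= min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)


def approx_rate(cong_win_bytes, rtt_s):
    """Sending rate ~ CongWin / RTT in bytes/sec (ignores loss and transmission delay)."""
    return cong_win_bytes / rtt_s


# Hypothetical values: 10,000 bytes in flight, CongWin = 14,600, RcvWindow = 65,535
print(sendable_bytes(50_000, 40_000, cong_win=14_600, rcv_window=65_535))  # 4600
print(approx_rate(14_600, 0.1))  # roughly 146,000 bytes/sec
```

With a small receive window (say 8,000 bytes) the same sender would be blocked entirely, which is flow control taking over from congestion control.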
TCP Congestion Control
TCP uses ACKs to trigger ("clock") its increase in congestion window size: "self-clocking".
The arrival of ACKs is an indication to the sender that all is well.
1. Slow rate: if ACKs arrive at a slow rate, the congestion window is increased at a relatively slow rate.
2. High rate: if ACKs arrive at a high rate, the congestion window is increased more quickly.
TCP Congestion Control
How TCP perceives that there is congestion on the path:
"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender. A loss event is either:
1. Timeout
• No ACK is received after segment loss
2. Receipt of three duplicate ACKs
• Segment loss is followed by three duplicate ACKs received at the sender
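The second kind of loss event can be detected purely from the stream of cumulative ACK numbers. A Python sketch, assuming the sender sees one ACK number per arriving ACK (timeouts need a timer and are not modeled here; the function name is illustrative):

```python
def detect_triple_dup_acks(acks):
    """Return the ACK numbers that triggered a 'three duplicate ACKs' loss event.

    A duplicate ACK repeats the previous ACK number; the third duplicate
    (the fourth identical ACK in a row) signals a loss event.
    """
    events = []
    last, dup_count = None, 0
    for ack in acks:
        if ack == last:
            dup_count += 1
            if dup_count == 3:
                events.append(ack)
        else:
            last, dup_count = ack, 0
    return events


# The segment starting at byte 300 is lost; later segments keep producing ACK 300
print(detect_triple_dup_acks([100, 200, 300, 300, 300, 300, 300, 700]))  # [300]
```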
TCP Congestion Control: details
• The sender limits transmission: LastByteSent - LastByteAcked ≤ cwnd
• Roughly, rate = cwnd/RTT bytes/sec
• cwnd is a dynamic function of perceived network congestion
How does the sender perceive congestion?
• Loss event = timeout or 3 duplicate ACKs
• The TCP sender reduces its rate (cwnd) after a loss event
Three mechanisms:
1. AIMD
2. Slow start
3. Conservative after timeout events
TCP congestion avoidance: additive increase, multiplicative decrease (AIMD)
Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
• Additive increase: increase cwnd by 1 MSS every RTT until loss is detected
• Multiplicative decrease: cut cwnd in half after loss
[Figure: congestion window size (8, 16, 24 Kbytes) vs. time, showing the saw-tooth behavior of probing for bandwidth]
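The saw tooth can be generated directly from the two AIMD rules. A Python sketch that tracks cwnd in MSS units once per RTT; the starting window and the RTTs at which losses occur are hypothetical inputs, and cwnd is kept at 1 MSS or more, matching the earlier note that CongWin will not drop below 1 MSS.

```python
def aimd_trace(rtts, loss_at, start_mss=10):
    """cwnd (in MSS) at the start of each RTT under AIMD.

    Additive increase: +1 MSS per RTT. Multiplicative decrease: halve cwnd
    in every RTT listed in `loss_at`, but never below 1 MSS.
    """
    cwnd, trace = start_mss, []
    for t in range(rtts):
        trace.append(cwnd)
        if t in loss_at:
            cwnd = max(cwnd // 2, 1)
        else:
            cwnd += 1
    return trace


print(aimd_trace(10, loss_at={4, 8}))  # [10, 11, 12, 13, 14, 7, 8, 9, 10, 5]
```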
TCP Slow Start
When a connection begins, increase the rate exponentially until the first loss event:
• Initially cwnd = 1 MSS
• Double cwnd every RTT
• Done by incrementing cwnd by 1 MSS for every ACK received
Summary: the initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT).
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart]
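The per-ACK rule is what produces the per-RTT doubling: a window of n segments yields n ACKs, each adding 1 MSS. A Python sketch assuming one ACK per segment (no delayed ACKs) and no loss:

```python
def slow_start_rounds(rounds, init_cwnd=1):
    """cwnd (in MSS) at the start of each RTT in slow start."""
    cwnd, per_rtt = init_cwnd, []
    for _ in range(rounds):
        per_rtt.append(cwnd)
        acks_this_rtt = cwnd  # one ACK per segment sent in this RTT
        for _ in range(acks_this_rtt):
            cwnd += 1  # +1 MSS per ACK received
    return per_rtt


print(slow_start_rounds(5))  # [1, 2, 4, 8, 16]
```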
Refinement: inferring loss
• After 3 dup ACKs:
• cwnd is cut in half
• the window then grows linearly
• But after a timeout event:
• cwnd is set to 1 MSS
• the window then grows exponentially up to a threshold, then grows linearly
Philosophy:
• 3 dup ACKs indicate that the network is capable of delivering some segments
• A timeout indicates a "more alarming" congestion scenario
Refinement
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before timeout.
Implementation:
• Variable ssthresh (slow-start threshold)
• On a loss event, ssthresh is set to 1/2 of cwnd just before the loss event
Transport Layer ndash TCP
46
BB
TCP Sender Congestion ControlSTATE EVENT TCP SENDER Congestion-
Control ActionCommentary
SLOW START (SS)
ACK receipt for previously unACKed data
CongWin = CongWin + MSSIf(CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo
Resulting in a doubling of CongWin every RTT
Congestion Avoidance (CA)
ACK receipt for previously unACKed data
CongWin = CongWin + MSS (MSSCongWin)
Additive increase resulting in increasing of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate ACK
Threshold = CongWin 2CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
SS or CA Timeout Threshold = CongWin 2CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter Slow Start
SS or CA Duplicate ACK Increment duplicate ACK count for segment being ACKed
CongWin and Threshold not changed
Transport Layer ndash TCP
47
BB
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++
duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3 MSSretransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++
duplicate ACK
cwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
Transport Layer ndash TCP
48
BB
Congestion control
TCPrsquos Congestion Control Service
CLIENTSERVER
Problem Gridlock sets-in when there is packet loss due to router congestion
forces the End Systems to decrease the rate at which packets are sentdecrease the rate at which packets are sent during periods of congestion
The sending systemrsquos packet is lost due to congestion and is alerted when it stops receiving ACKs of packets sent
Transport Layer ndash TCP
49
BBTransport Layer 3-49
Macroscopic Description of TCP throughput
whatrsquos the average throughout of TCP as a function of window size and RTT ignore slow start (typically very short phases)
let W be the window size when loss occurs when window is W throughput is WRTT just after loss window drops to W2 throughput to
W2RTT Throughput increases linearly (by MSSRTT every
RTT) Average Throughput 75 WRTT
(Based on Idealised model for the steady-state dynamics of TCP)
Transport Layer 3-50
TCP FuturesTCP Futures TCP over ldquolong fat pipesrdquo Example GRID computing application 1500-byte segments 100ms RTT desired
throughput of 10 Gbps requires window size W = 83333 in-flight
segments Throughput in terms of loss rate
L = 210-10 ndash a very small loss rate (1 loss event every 5 billion segments)
new versions of TCP is needed for high-speed environments
LRTT
MSS221
Transport Layer ndash TCP
51
BB
TCP FairnessTCP Fairness
Fairness goal if NN TCP sessions share same bottleneck link each should get an average transmission rate of RNRN an equal share of the linkrsquos bandwidth
TCP connection 1
bottleneckrouter
capacity RR
TCP connection 2
Go to Summary of TCP Congestion Control
Transport Layer ndash TCP
52
BB
Analysis of 2 connections sharing a linkAnalysis of 2 connections sharing a link
Link with transmission rate of RR
Each connection have the same MSSMSS RTTRTT
No other TCP connections or UDP datagrams traverse the shared link
Ignore slow start phase of TCP
Operating in congestion-avoidance modecongestion-avoidance mode (linear increase phase)
Goal adjust sending rate of the two connections to allow for equal equal bandwidth sharingbandwidth sharing
AssumptionsAssumptions
Transport Layer ndash TCP
53
BB
Why is TCP fairWhy is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput
proportionally
RR
equal bandwidth shareequal bandwidth share
Connection 1 throughput
Connect
ion 2
thro
ughput
congestion avoidance additive increaseloss decrease window by factor of 2
RR
A point on the graph depicts the amount of link bandwidth
jointly jointly consumedconsumed by
the connections
Full bandwidth utilisation line
We can viewWe can view a simulation a simulation
on thison this
View SimulationView Simulation
Transport Layer ndash TCP
54
BB
The End
The next succeeding slides are just for additional reading
Transport Layer ndash TCP
55
BB
TCP Latency ModelingTCP Latency Modeling
In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs
Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth
Multiple End Systems sharing a link
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
Loop holes in TCPLoop holes in TCP
1 TCP connection
1 TCP connection
1 TCP connection
3 TCP connections
Multithreading implementationMultithreading implementation
Transport Layer ndash TCP
56
BB
TCP latency modelingTCP latency modeling
TCP connection establishment time data transfer delay Actual data transmission time
Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent
QQ How long How long does it take to does it take to
receive an object receive an object from a Web from a Web
serverserver
No data transfer delayNo data transfer delay
Therersquos data transfer delayTherersquos data transfer delay
the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety
Transport Layer ndash TCP
57
BB
TCP Latency ModelingTCP Latency Modeling
Network is uncongested with one link between end systems of rate RR
CongWinCongWin (fixed) determines the amount of data that can be sent
No packet loss no packet corruption no retransmissions required
Header overheads are negligible
File to sendFile to send = integer number of segments of size MSS
Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times
CLIENT SERVER
FILEFILE
Initial ThresholdThreshold of TCP congestion mechanism is very big
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits
SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)
Transport Layer ndash TCP
58
BB
TCP latency ModelingTCP latency Modeling
Case 1 latency = 2RTT + OR
K = Number of Windows of data that cover the object
K = OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 1 Case 1 WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
Number of segmentsRounded up to the nearest integer
Assume W=4 segmentsAssume W=4 segments
eg O=256bits S=32bits W=4
Transport Layer ndash TCP
59
BB
TCP latency ModelingTCP latency Modeling
Case 2 latency = 2RTT + OR + (K-1)[SR + RTT - WSR]
Number of Windows of data that cover the objectK= OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 2 Case 2 WSR lt RTT + SRWSR lt RTT + SR
Sender has to wait for an ACK after a windowrsquos worth of data sent
STALLED STALLED PERIODPERIOD
If there are k windows sender willbe stalled (k-1) times
Transport Layer ndash TCP
60
BB
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
STALLED STALLED PERIODPERIOD
4 windows4 windows
OS=15
Transport Layer ndash TCP
61
BB
bull LetLet K K be the be the number of windows that cover the objectnumber of windows that cover the objectbull We can express We can express KK in terms of the number of segments in the in terms of the number of segments in the
object as followsobject as follows
S
OkK k 110 222min
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1log
1logmin
12min
2
2
S
OK
S
OkkK
S
OkK k
Note
Transport Layer ndash TCP
62
BB
bull From the time the server begins to transmit the From the time the server begins to transmit the kkth window window until the time the server receives an until the time the server receives an ACKACK for the for the first segment in the windowin the window
bull Transmission of Transmission of kkth window window ==
bull Stall Time Stall Time ==
bull Latency Latency ==
12
k
R
S
12
k
R
SRTT
R
S
RTTR
S
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1K
1
122k
k
R
SRTT
R
S
R
ORTT
Transport Layer ndash TCP
63
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
Transport Layer ndash TCP
64
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
R
S
R
SRTTP
R
ORTT P )12(2Latency
bull Closed-form expression for the latency
Transport Layer ndash TCP
65
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
2
1
RTTRO
P
encyMinimumLat
Latency
Slow start will not significantly increase latency if RTT ltlt OR
Transport Layer ndash TCP
66
BB
httpwww1csewustledu~jaincis788-97ftptcp_over_atmindexhtmatm-features
Transport Layer ndash TCP
31
BB
TCP Connection Management (cont)
Closing a connection
client closes socket
closesocket(s)closesocket(s)
Java clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
How TCP connection is established and torn down
Transport Layer ndash TCP
32
BB
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
Enters timed wait - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer ndash TCP
33
BB
TCP Connection Management (cont)
TCP client lifecycle
TCP server lifecycle
Used in case ACK gets lost It is implementation-dependent (eg 30
seconds 1 minute 2 minutes
Connection formally closes ndash all resources (eg port numbers) are
released
1
2
3
4
5
6
7
8
9
10
11
12
Transport Layer ndash TCP
34
BB
End of Flow Control and Error Control
Transport Layer ndash TCP
35
BB
Flow Control vs Congestion ControlFlow Control vs Congestion ControlSimilar actions are taken but for very different reasons
Flow ControlFlow Controlbull point-to-point traffic between sender and receiverbull speed matching service matching the rate at which the sender is sending against the rate at which the receiving application is readingbull prevents Receiver Buffer from overflowing
CongestionCongestion ndash happens when there are too many sources attempting to send data at too high a rate for the routers along the path
Congestion ControlCongestion Controlbull service that makes sure that the routers between End Systems are able to carry the offered trafficbull prevents routers from overflowing
Same course of action Throttling of the sender
Transport Layer ndash TCP
36
BB
CongestionCongestion Informally too many sources sending too
much data too fast for network to handle different from flow control Manifestations
lost packets (buffer overflow at routers) long delays (queuing in router buffers)
a top-10 problem
Principles of Congestion ControlPrinciples of Congestion Control
Transport Layer ndash TCP
37
BB
Approaches towards congestion control
End-to-end congestion control
no explicit feedback from network
congestion inferred by end-systems from observed packet loss amp delay
approach taken by TCP
Network-assisted congestion control
routers provide feedback to End Systems in the form of single bit indicating
link congestion (SNA DECbit TCPIP ECN ATM ABR)
explicit transmission rate the sender should send at
1 2
Two broad approaches towards congestion control
Transport Layer ndash TCP
38
BB
TCP Congestion ControlTCP Congestion ControlHow TCP sender limits the rate at which it sends traffic into its connection
SENDER
(Amount of unACKed data)SENDER lt min(CongWin RcvWindow)
LastByteSent - LastByteACKed
Indirectly limits the senderrsquos send rateAssumptions
bull TCP receive buffer is very large ndash no RcvWindow constraint Amt of unACKed data at sender is solely limited by CongWin
bull Packet loss delay amp packet transmission delay are negligible
Sending rate (approx) CongWinRTT
By adjusting CongWin sender can therefore adjust the
rate at which it sends data into its connection
New variable ndash Congestion
Window
Transport Layer ndash TCP
39
BB
TCP Congestion ControlTCP Congestion ControlTCP uses ACKs to trigger (ldquoclockrdquo) its increase in congestion window size ndash ldquoself-clockingrdquo
Arrival of ACKs ndash indication to the sender that all is well
1 Slow Rate
bull Congestion window will be increased at a relatively slow rate
2 High rate
bull Congestion window will be increased more quickly
Transport Layer ndash TCP
40
BB
TCP Congestion ControlTCP Congestion ControlHow TCP perceives that there is congestion on the path
ldquoLoss Eventrdquo ndash when there is excessive congestion router buffers along the path overflows causing datagrams to be dropped which in turn results in a ldquoloss eventrdquo at the sender
1 Timeout
bull no ACK is received after segment loss
2 Receipt of three duplicate ACKs
bull segment loss is followed by three ACKs received at the sender
Transport Layer ndash TCP
41
BB
TCP Congestion Control details
sender limits transmission LastByteSent-LastByteAcked
cwnd
roughly
cwnd is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or 3 duplicate acks
TCP sender reduces rate (cwnd) after loss event
Three mechanisms1 AIMD
2 slow start
3 conservative after timeout events
rate = cwnd
RTT Bytessec
Transport Layer ndash TCP
42
BB
TCP congestion avoidance congestion avoidance additive increase multiplicative decreaseadditive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1 MSS every RTT until
loss is detected multiplicative decrease cut cwnd in half after loss
timecwnd
co
nge
stio
n w
indo
w s
ize
saw toothbehavior probingfor bandwidth
TCP Slow Start
When a connection begins, increase the rate exponentially until the first loss event:
- initially cwnd = 1 MSS
- double cwnd every RTT
- done by incrementing cwnd by 1 MSS for every ACK received
Summary: the initial rate is slow but ramps up exponentially fast (the sending rate doubles every RTT).
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart]
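The per-ACK rule above doubles cwnd every RTT because each segment in the window returns one ACK. A small Python sketch, assuming no losses and one ACK per segment (receivers that delay ACKs would make the growth slower):

```python
def slow_start_windows(rtts):
    """cwnd (in MSS) at the start of each RTT during slow start, with no loss."""
    cwnd = 1
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        acks_this_rtt = cwnd      # every segment in the window is ACKed individually
        for _ in range(acks_this_rtt):
            cwnd += 1             # +1 MSS per ACK received
    return history
```

`slow_start_windows(5)` gives `[1, 2, 4, 8, 16]`: one, two, four segments, as in the figure.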
Refinement: inferring loss
- after 3 dup ACKs: cwnd is cut in half; the window then grows linearly
- but after a timeout event: cwnd is set to 1 MSS; the window then grows exponentially up to a threshold, then grows linearly
Philosophy:
- 3 dup ACKs indicate that the network is capable of delivering some segments
- a timeout indicates a "more alarming" congestion scenario
Refinement
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before timeout.
Implementation:
- variable ssthresh (slow-start threshold)
- on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event
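Putting the two refinements together, a per-RTT sketch in Python. Assumptions: cwnd is in MSS units, losses are supplied through a hypothetical `events` dict, exponential growth is capped at ssthresh (a simplification), and a triple-duplicate-ACK loss sets cwnd to the new ssthresh, i.e. cuts it in half (fast recovery's temporary extra 3 MSS is ignored).

```python
def cwnd_timeline(rtts, events, ssthresh=8):
    """Per-RTT cwnd (in MSS): exponential below ssthresh, linear at or above it.

    events: dict mapping RTT index -> "3dup" or "timeout" (hypothetical input).
    """
    cwnd = 1
    timeline = []
    for t in range(rtts):
        timeline.append(cwnd)
        event = events.get(t)
        if event is not None:
            ssthresh = max(cwnd // 2, 1)        # half of cwnd just before the loss event
            cwnd = ssthresh if event == "3dup" else 1
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)      # slow start: doubling per RTT (capped)
        else:
            cwnd += 1                           # congestion avoidance: +1 MSS per RTT
    return timeline
```

With a loss at cwnd = 12, `cwnd_timeline(10, {7: "3dup"})` continues linearly from 6, while `cwnd_timeline(12, {7: "timeout"})` restarts at 1 and grows exponentially back to the new threshold of 6 before growing linearly.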
TCP Sender Congestion Control
State | Event | TCP sender congestion-control action | Commentary
Slow Start (SS) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Results in a doubling of CongWin every RTT
Congestion Avoidance (CA) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed
Summary: TCP Congestion Control
[FSM with three states: slow start, congestion avoidance, fast recovery]
Initially (enter slow start): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0
Slow start
- new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- cwnd > ssthresh: go to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Congestion avoidance
- new ACK: cwnd = cwnd + MSS · (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Fast recovery
- duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
- new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
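The FSM above can be transcribed almost line for line into a Python class. This is a sketch that tracks only the cwnd/ssthresh bookkeeping (no timers, no actual retransmission); the class name `RenoSender`, its method names and the default MSS of 1460 bytes are illustrative assumptions. The fast-recovery state is the behavior commonly associated with TCP Reno.

```python
class RenoSender:
    """cwnd/ssthresh bookkeeping for the three-state FSM (sizes in bytes)."""

    def __init__(self, mss=1460):
        self.mss = mss
        self.cwnd = mss                 # cwnd = 1 MSS
        self.ssthresh = 64 * 1024       # ssthresh = 64 KB
        self.dup_ack_count = 0
        self.state = "slow_start"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh                     # deflate window, leave fast recovery
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss                         # +1 MSS per ACK: doubles per RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += self.mss * self.mss / self.cwnd  # about +1 MSS per RTT
        self.dup_ack_count = 0
        # (transmit new segment(s) as allowed)

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += self.mss                         # inflate window per extra dup ACK
            return
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast_recovery"
            # (retransmit missing segment)

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.mss
        self.dup_ack_count = 0
        self.state = "slow_start"
        # (retransmit missing segment)
```

For example, with `mss=1000`, seven new ACKs take cwnd from 1000 to 8000 bytes; three duplicate ACKs then give ssthresh = 4000 and cwnd = 7000 (4000 + 3 MSS), matching the `dupACKcount == 3` transition.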
Congestion control
TCP's Congestion Control Service
[Figure: CLIENT and SERVER end systems]
Problem: gridlock sets in when there is packet loss due to router congestion.
TCP's congestion control forces the end systems to decrease the rate at which packets are sent during periods of congestion.
The sending system is alerted that a packet has been lost due to congestion when it stops receiving ACKs for the packets it has sent.
Macroscopic Description of TCP throughput
What's the average throughput of TCP as a function of window size and RTT?
- ignore slow start (typically very short phases)
Let W be the window size when loss occurs.
- when the window is W, throughput is $W/\text{RTT}$
- just after loss, the window drops to $W/2$ and throughput to $W/(2\,\text{RTT})$
- throughput increases linearly (by $\text{MSS}/\text{RTT}$ every RTT)
Average throughput: $0.75\,W/\text{RTT}$
(Based on an idealised model for the steady-state dynamics of TCP)
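The 0.75 factor can be checked numerically: in the idealised saw-tooth the window takes every value from W/2 to W (one step of 1 MSS per RTT) before halving again. A sketch, assuming W is given in segments and is even:

```python
def sawtooth_average_throughput(w_segments, mss_bytes, rtt_s):
    """Average throughput (bytes/s) of the idealised saw-tooth.

    Over one cycle the window is W/2, W/2 + 1, ..., W segments, one value per RTT,
    then halves after the loss. Assumes w_segments is even.
    """
    windows = range(w_segments // 2, w_segments + 1)
    average_window = sum(windows) / len(windows)      # in segments
    return average_window * mss_bytes / rtt_s
```

With W = 100 segments the average window is exactly 75 segments, so with MSS = 1500 bytes and RTT = 100 ms the average throughput is 75 × 1500 / 0.1 = 1.125 MB/s, i.e. 0.75 · W · MSS/RTT.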
TCP Futures: TCP over "long, fat pipes"
- example: GRID computing application with 1500-byte segments, 100 ms RTT, and a desired throughput of 10 Gbps
- requires a window size of W = 83,333 in-flight segments
- throughput in terms of the loss rate L:
$$\text{throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$$
- reaching 10 Gbps therefore requires $L = 2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments)
- new versions of TCP are needed for high-speed environments
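The two numbers on the slide follow from simple arithmetic. A sketch, assuming the slide's throughput formula with MSS expressed in bits so that throughput comes out in bits per second; the function names are illustrative:

```python
def required_window(throughput_bps, rtt_s, mss_bytes):
    """Segments that must be in flight so that W * MSS / RTT equals the target."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L (MSS taken in bits)."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_s * throughput_bps)) ** 2
```

`required_window(10e9, 0.1, 1500)` is about 83,333 segments, and `required_loss_rate(10e9, 0.1, 1500)` is about 2.1 · 10⁻¹⁰, which the slide rounds to 2 · 10⁻¹⁰.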
TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link of capacity R, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.
[Figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R]
Analysis of 2 connections sharing a link
Assumptions:
- a link with transmission rate R
- each connection has the same MSS and RTT
- no other TCP connections or UDP datagrams traverse the shared link
- ignore the slow-start phase of TCP
- both connections operate in congestion-avoidance mode (linear increase phase)
Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.
Why is TCP fair?
Two competing sessions:
- additive increase gives a slope of 1 as throughput increases
- multiplicative decrease decreases throughput proportionally
[Figure: connection 1 throughput (x-axis, 0 to R) vs. connection 2 throughput (y-axis, 0 to R), showing the equal-bandwidth-share line and the full-bandwidth-utilisation line; the operating point alternates between congestion avoidance (additive increase) and loss (decrease window by a factor of 2)]
A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.
Additive increase preserves the difference between the two throughputs, while each halving cuts that difference in half, so the operating point converges toward the equal-bandwidth-share line.
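The convergence argument can be simulated. A minimal sketch, assuming synchronized losses (both connections halve whenever their combined rate exceeds R) and rates measured in abstract units that grow by 1 per RTT; the function name is illustrative:

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Two connections in congestion avoidance sharing a link of rate `capacity`.

    Each RTT both rates grow by 1; if their sum exceeds capacity, both see a
    loss and halve (synchronized losses, equal MSS and RTT assumed).
    """
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2      # multiplicative decrease halves the gap
        else:
            x1, x2 = x1 + 1, x2 + 1      # additive increase keeps the gap unchanged
    return x1, x2
```

Starting from a very unequal split such as (80, 10) with R = 100, the gap of 70 halves at every loss, so after 1000 rounds the two rates are essentially equal.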
The End
The following slides are for additional reading only.
TCP Latency Modeling
Loopholes in TCP
- In practice, client/server applications with a smaller RTT grab available bandwidth more quickly as it becomes free; therefore they get higher throughputs.
- Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (e.g., a multithreaded implementation).
[Figure: multiple end systems sharing a link of transmission rate R bps; three hosts each use 1 TCP connection, while one host uses 3 TCP connections]
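The loophole can be quantified under the idealised assumption that every TCP connection on the bottleneck receives an equal share R/N. A sketch using the figure's scenario; the host labels are hypothetical:

```python
def app_share(link_rate, connections_per_app):
    """Per-application bandwidth when each of the N connections gets R/N."""
    total = sum(connections_per_app.values())
    return {app: link_rate * n / total for app, n in connections_per_app.items()}


shares = app_share(1.0, {"host1": 1, "host2": 1, "host3": 1, "host4": 3})
```

With three hosts using one connection each and one host using three, the multithreaded host receives 3/6 = R/2 of the link, while each of the others gets R/6.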
TCP latency modeling
Q: How long does it take to receive an object from a Web server?
Latency: the time from when the client initiates a TCP connection until the client receives the requested object in its entirety.
Components of latency: TCP connection establishment time, data transfer delay, and actual data transmission time.
Two cases to consider:
1. $WS/R > \text{RTT} + S/R$: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent (no data transfer delay)
2. $WS/R < \text{RTT} + S/R$: the sender has to wait for an ACK after a window's worth of data is sent (there is data transfer delay)
TCP Latency Modeling
Assumptions:
- the network is uncongested, with one link of rate R between the end systems
- CongWin (fixed) determines the amount of data that can be sent
- no packet loss, no packet corruption, no retransmissions required
- header overheads are negligible
- the file to send is an integer number of segments of size MSS
- connection-establishment request messages, ACKs, and TCP connection-establishment segments have negligible transmission times
- the initial Threshold of the TCP congestion mechanism is very big
Notation:
- R: link's transmission rate (bps)
- O: size of the object (bits)
- S: number of bits of the MSS (maximum segment size)
[Figure: CLIENT requesting a FILE from the SERVER]
TCP latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 1: $WS/R > \text{RTT} + S/R$
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.
$$\text{latency} = 2\,\text{RTT} + \frac{O}{R}$$
K = number of windows of data that cover the object, rounded up to the nearest integer:
$$K = \left\lceil \frac{O}{WS} \right\rceil$$
Example (assume W = 4 segments): O = 256 bits, S = 32 bits, W = 4 gives K = 256/128 = 2.
TCP latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 2: $WS/R < \text{RTT} + S/R$
The sender has to wait for an ACK after a window's worth of data is sent (STALLED PERIOD).
$$\text{latency} = 2\,\text{RTT} + \frac{O}{R} + (K-1)\left[\frac{S}{R} + \text{RTT} - \frac{WS}{R}\right]$$
where $K = \lceil O/(WS) \rceil$ is the number of windows of data that cover the object.
If there are K windows, the sender will be stalled (K − 1) times.
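Both static-window cases can be wrapped in one Python function. This is a sketch of the slides' formulas under the model's assumptions (no loss, negligible header and handshake transmission times); the function name and the sample values R = 32 bps and RTT = 2 s or 5 s are illustrative choices, not from the slides.

```python
import math


def static_window_latency(O, S, W, R, RTT):
    """Latency with a fixed window of W segments.

    O: object size (bits), S: MSS (bits), R: link rate (bits/s), RTT: seconds.
    Returns (K, latency).
    """
    K = math.ceil(O / (W * S))            # windows needed, rounded up
    latency = 2 * RTT + O / R             # handshake RTT + request RTT + transmission time
    if W * S / R < RTT + S / R:           # case 2: sender stalls after each full window
        latency += (K - 1) * (S / R + RTT - W * S / R)
    return K, latency
```

With the slide's example (O = 256 bits, S = 32 bits, W = 4) and R = 32 bps, K = 2; RTT = 2 s falls in case 1 (latency 12 s), while RTT = 5 s falls in case 2 (latency 20 s, including one 2 s stall).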
Case Analysis: DYNAMIC CONGESTION WINDOW
[Figure: slow-start transmission of an object of O/S = 15 segments in 4 windows (1, 2, 4 and 8 segments), with a STALLED PERIOD after each of the early windows]
Case Analysis: DYNAMIC CONGESTION WINDOW
• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object as follows (using $2^0 + 2^1 + \dots + 2^{k-1} = 2^k - 1$):
$$\begin{aligned}
K &= \min\{k : 2^0 S + 2^1 S + \dots + 2^{k-1} S \ge O\} \\
  &= \min\{k : 2^k - 1 \ge O/S\} \\
  &= \min\{k : k \ge \log_2(O/S + 1)\} \\
  &= \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil
\end{aligned}$$
Case Analysis: DYNAMIC CONGESTION WINDOW
• Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window: $\dfrac{S}{R} + \text{RTT}$
• Transmission time of the kth window: $2^{k-1}\dfrac{S}{R}$
• Stall time after the kth window: $\left[\dfrac{S}{R} + \text{RTT} - 2^{k-1}\dfrac{S}{R}\right]^{+}$, where $[x]^{+} = \max(x, 0)$
• Latency:
$$\text{Latency} = 2\,\text{RTT} + \frac{O}{R} + \sum_{k=1}^{K-1} \left[\frac{S}{R} + \text{RTT} - 2^{k-1}\frac{S}{R}\right]^{+}$$
Case Analysis: DYNAMIC CONGESTION WINDOW
• Let Q be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\left(1 + \frac{\text{RTT}}{S/R}\right) \right\rfloor + 1$$
• The actual number of times that the server stalls is
$$P = \min\{Q,\; K - 1\}$$
Case Analysis: DYNAMIC CONGESTION WINDOW
• With Q and P as defined above, the stall terms vanish for $k > Q$, so the latency sum runs over $k = 1, \dots, P$; using $\sum_{k=1}^{P} 2^{k-1} = 2^{P} - 1$ gives a closed-form expression for the latency:
$$\text{Latency} = 2\,\text{RTT} + \frac{O}{R} + P\left[\text{RTT} + \frac{S}{R}\right] - \left(2^{P} - 1\right)\frac{S}{R}$$
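The closed form can be cross-checked against the stall-time sum. A sketch in Python; the function names are illustrative, and units are O and S in bits, R in bits/s and RTT in seconds:

```python
import math


def slow_start_latency(O, S, R, RTT):
    """Closed-form latency with windows of 1, 2, 4, ... segments.

    Returns (K, Q, P, latency).
    """
    K = math.ceil(math.log2(O / S + 1))               # windows that cover the object
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1  # stalls for an infinite object
    P = min(Q, K - 1)                                 # stalls that actually happen
    latency = 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R
    return K, Q, P, latency


def slow_start_latency_by_sum(O, S, R, RTT):
    """Same quantity, summing [S/R + RTT - 2^(k-1) S/R]^+ over k = 1..K-1."""
    K = math.ceil(math.log2(O / S + 1))
    stalls = sum(max(0.0, S / R + RTT - 2 ** (k - 1) * S / R) for k in range(1, K))
    return 2 * RTT + O / R + stalls
```

For O/S = 15 segments (the 4-window example) with S/R = 1 s and RTT = 2 s, both functions give 22 s: K = 4, Q = 2, P = 2, with stalls of 2 s and 1 s after the first two windows.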
Case Analysis: DYNAMIC CONGESTION WINDOW
• Compared with the minimum latency $2\,\text{RTT} + O/R$ (the latency with no stalls):
$$\frac{\text{Latency}}{\text{Minimum latency}} \le 1 + \frac{P}{\dfrac{O/R}{\text{RTT}} + 2}$$
• Slow start will not significantly increase latency if $\text{RTT} \ll O/R$.
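The slide's bound follows in two steps from the closed-form latency and the minimum latency $2\,\text{RTT} + O/R$:

```latex
\begin{aligned}
\text{Latency} - \text{Minimum latency}
  &= P\left[\text{RTT} + \frac{S}{R}\right] - \left(2^{P}-1\right)\frac{S}{R}
   \;\le\; P\,\text{RTT}
   \qquad \text{since } 2^{P} - 1 \ge P \text{ for every integer } P \ge 0, \\
\frac{\text{Latency}}{\text{Minimum latency}}
  &\le 1 + \frac{P\,\text{RTT}}{2\,\text{RTT} + O/R}
   \;=\; 1 + \frac{P}{\dfrac{O/R}{\text{RTT}} + 2}.
\end{aligned}
```

For example, with $O/R = 10\,\text{RTT}$ and $P = 2$, the latency is at most $1 + 2/12 \approx 1.17$ times the minimum.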
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
TCP Connection Management (cont.)
Step 3: the client receives FIN and replies with ACK.
- It enters "timed wait": it will respond with ACK to received FINs.
Step 4: the server receives ACK. Connection closed.
Note: with a small modification, this can handle simultaneous FINs.
[Figure: client/server timeline: the client sends FIN (closing); the server ACKs and sends its own FIN (closing); the client ACKs and enters timed wait; both ends then reach closed]
TCP Connection Management (cont.)
[Figures: TCP client lifecycle and TCP server lifecycle state diagrams]
- Timed wait: used in case the final ACK gets lost; its duration is implementation-dependent (e.g., 30 seconds, 1 minute, 2 minutes).
- Afterwards the connection formally closes: all resources (e.g., port numbers) are released.
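The normal closing sequence can be sketched as a lookup table of state transitions. The state names follow the TCP specification (RFC 793), written with underscores, rather than the slides' informal labels ("closing", "timed wait"); the event strings and the `run` helper are hypothetical, and simultaneous close is not modelled.

```python
# Transitions for the normal (non-simultaneous) close, keyed by (state, event).
CLIENT_CLOSE = {
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN, send ACK"): "TIME_WAIT",
    ("TIME_WAIT", "recv FIN, send ACK"): "TIME_WAIT",  # re-ACK a retransmitted FIN
    ("TIME_WAIT", "timer expires"): "CLOSED",          # resources (e.g. port) released
}

SERVER_CLOSE = {
    ("ESTABLISHED", "recv FIN, send ACK"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "send FIN"): "LAST_ACK",
    ("LAST_ACK", "recv ACK"): "CLOSED",
}


def run(transitions, start, events):
    """Apply events in order and return the visited states.

    Raises KeyError if an event is not valid in the current state.
    """
    state = start
    visited = [state]
    for event in events:
        state = transitions[(state, event)]
        visited.append(state)
    return visited
```

Walking the client through its four events ends in CLOSED only after the timed-wait timer expires; a FIN that arrives during TIME_WAIT is simply ACKed again.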
End of Flow Control and Error Control
Flow Control vs Congestion Control
Similar actions are taken, but for very different reasons.
Flow Control
• point-to-point traffic between sender and receiver
• speed-matching service: matching the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the receiver buffer from overflowing
Congestion: happens when there are too many sources attempting to send data at too high a rate for the routers along the path
Congestion Control
• service that makes sure that the routers between end systems are able to carry the offered traffic
• prevents routers from overflowing
Same course of action: throttling of the sender
Principles of Congestion Control
Congestion:
- informally: "too many sources sending too much data too fast for the network to handle"
- different from flow control!
- manifestations: lost packets (buffer overflow at routers) and long delays (queueing in router buffers)
- a top-10 problem!
Approaches towards congestion control
Two broad approaches towards congestion control:
1. End-to-end congestion control
- no explicit feedback from the network
- congestion inferred by end systems from observed packet loss & delay
- approach taken by TCP
2. Network-assisted congestion control
- routers provide feedback to end systems, either as a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR) or as an explicit transmission rate the sender should send at
TCP Congestion Control
How the TCP sender limits the rate at which it sends traffic into its connection
New variable: Congestion Window (CongWin)
$$\text{LastByteSent} - \text{LastByteACKed} \le \min(\text{CongWin},\ \text{RcvWindow})$$
The left-hand side is the amount of unACKed data at the sender; bounding it indirectly limits the sender's send rate.
Assumptions:
• the TCP receive buffer is very large, so there is no RcvWindow constraint: the amount of unACKed data at the sender is limited solely by CongWin
• packet loss delay & packet transmission delay are negligible
Sending rate (approx.): $\text{CongWin}/\text{RTT}$ bytes/sec
By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
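A minimal sketch of the sender-side limit above. The function names are hypothetical, byte counts are plain integers, and the numbers in the usage note are illustrative:

```python
def allowed_to_send(last_byte_sent, last_byte_acked, cwnd, rwnd):
    """Bytes the sender may still put in flight.

    UnACKed data (LastByteSent - LastByteACKed) must stay within min(cwnd, rwnd).
    """
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cwnd, rwnd) - in_flight)


def approx_send_rate(cwnd_bytes, rtt_s):
    """Sending rate (bytes/s) ~ CongWin / RTT, assuming RcvWindow is not the limit."""
    return cwnd_bytes / rtt_s
```

With 10,000 bytes unACKed, CongWin = 16,000 and RcvWindow = 64,000, the sender may add 6,000 more bytes; if RcvWindow shrinks to 12,000, flow control becomes the binding limit and only 2,000 more bytes may be sent. A 16,000-byte window over a 100 ms RTT gives roughly 160,000 bytes/s.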
Transport Layer ndash TCP
39
BB
TCP Congestion ControlTCP Congestion ControlTCP uses ACKs to trigger (ldquoclockrdquo) its increase in congestion window size ndash ldquoself-clockingrdquo
Arrival of ACKs ndash indication to the sender that all is well
1 Slow Rate
bull Congestion window will be increased at a relatively slow rate
2 High rate
bull Congestion window will be increased more quickly
Transport Layer ndash TCP
40
BB
TCP Congestion ControlTCP Congestion ControlHow TCP perceives that there is congestion on the path
ldquoLoss Eventrdquo ndash when there is excessive congestion router buffers along the path overflows causing datagrams to be dropped which in turn results in a ldquoloss eventrdquo at the sender
1 Timeout
bull no ACK is received after segment loss
2 Receipt of three duplicate ACKs
bull segment loss is followed by three ACKs received at the sender
Transport Layer ndash TCP
41
BB
TCP Congestion Control details
sender limits transmission LastByteSent-LastByteAcked
cwnd
roughly
cwnd is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or 3 duplicate acks
TCP sender reduces rate (cwnd) after loss event
Three mechanisms1 AIMD
2 slow start
3 conservative after timeout events
rate = cwnd
RTT Bytessec
Transport Layer ndash TCP
42
BB
TCP congestion avoidance congestion avoidance additive increase multiplicative decreaseadditive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1 MSS every RTT until
loss is detected multiplicative decrease cut cwnd in half after loss
timecwnd
co
nge
stio
n w
indo
w s
ize
saw toothbehavior probingfor bandwidth
Transport Layer ndash TCP
43
BB
TCP Slow Start Slow Start when connection begins
increase rate exponentially until first loss event initially cwndcwnd = 1 MSS double cwndcwnd every RTT done by incrementing cwndcwnd by 1 MSS by 1 MSS for every ACK received
summary initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transport Layer ndash TCP
44
BB
Refinement inferring lossRefinement inferring loss
after 3 dup ACKsafter 3 dup ACKs cwndcwnd is cut in halfhalf window then grows
linearlylinearly butbut after timeout after timeout
eventevent cwndcwnd is is set to 1 MSS1 MSS window then grows
exponentiallyexponentially Up to a thresholdUp to a threshold
then grows linearlylinearly
3 dup ACKs 3 dup ACKs indicates network capable of delivering some segments
timeouttimeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer ndash TCP
45
BB
Refinement
Q when should the exponential increase switch to linear
A when cwndcwnd gets to 1212 of its value before timeout
Implementation variable ssthresh (slow-start threshold)ssthresh (slow-start threshold) on loss eventon loss event ssthreshssthresh is set to 1212 of cwndcwnd just
before loss event
Transport Layer ndash TCP
46
BB
TCP Sender Congestion ControlSTATE EVENT TCP SENDER Congestion-
Control ActionCommentary
SLOW START (SS)
ACK receipt for previously unACKed data
CongWin = CongWin + MSSIf(CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo
Resulting in a doubling of CongWin every RTT
Congestion Avoidance (CA)
ACK receipt for previously unACKed data
CongWin = CongWin + MSS (MSSCongWin)
Additive increase resulting in increasing of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate ACK
Threshold = CongWin 2CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
SS or CA Timeout Threshold = CongWin 2CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter Slow Start
SS or CA Duplicate ACK Increment duplicate ACK count for segment being ACKed
CongWin and Threshold not changed
Transport Layer ndash TCP
47
BB
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++
duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3 MSSretransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++
duplicate ACK
cwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
Transport Layer ndash TCP
48
BB
Congestion control
TCPrsquos Congestion Control Service
CLIENTSERVER
Problem Gridlock sets-in when there is packet loss due to router congestion
forces the End Systems to decrease the rate at which packets are sentdecrease the rate at which packets are sent during periods of congestion
The sending systemrsquos packet is lost due to congestion and is alerted when it stops receiving ACKs of packets sent
Transport Layer ndash TCP
49
BBTransport Layer 3-49
Macroscopic Description of TCP throughput
whatrsquos the average throughout of TCP as a function of window size and RTT ignore slow start (typically very short phases)
let W be the window size when loss occurs when window is W throughput is WRTT just after loss window drops to W2 throughput to
W2RTT Throughput increases linearly (by MSSRTT every
RTT) Average Throughput 75 WRTT
(Based on Idealised model for the steady-state dynamics of TCP)
Transport Layer 3-50
TCP FuturesTCP Futures TCP over ldquolong fat pipesrdquo Example GRID computing application 1500-byte segments 100ms RTT desired
throughput of 10 Gbps requires window size W = 83333 in-flight
segments Throughput in terms of loss rate
L = 210-10 ndash a very small loss rate (1 loss event every 5 billion segments)
new versions of TCP is needed for high-speed environments
LRTT
MSS221
Transport Layer ndash TCP
51
BB
TCP FairnessTCP Fairness
Fairness goal if NN TCP sessions share same bottleneck link each should get an average transmission rate of RNRN an equal share of the linkrsquos bandwidth
TCP connection 1
bottleneckrouter
capacity RR
TCP connection 2
Go to Summary of TCP Congestion Control
Transport Layer ndash TCP
52
BB
Analysis of 2 connections sharing a linkAnalysis of 2 connections sharing a link
Link with transmission rate of RR
Each connection have the same MSSMSS RTTRTT
No other TCP connections or UDP datagrams traverse the shared link
Ignore slow start phase of TCP
Operating in congestion-avoidance modecongestion-avoidance mode (linear increase phase)
Goal adjust sending rate of the two connections to allow for equal equal bandwidth sharingbandwidth sharing
AssumptionsAssumptions
Transport Layer ndash TCP
53
BB
Why is TCP fairWhy is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput
proportionally
RR
equal bandwidth shareequal bandwidth share
Connection 1 throughput
Connect
ion 2
thro
ughput
congestion avoidance additive increaseloss decrease window by factor of 2
RR
A point on the graph depicts the amount of link bandwidth
jointly jointly consumedconsumed by
the connections
Full bandwidth utilisation line
We can viewWe can view a simulation a simulation
on thison this
View SimulationView Simulation
Transport Layer ndash TCP
54
BB
The End
The next succeeding slides are just for additional reading
Transport Layer ndash TCP
55
BB
TCP Latency ModelingTCP Latency Modeling
In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs
Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth
Multiple End Systems sharing a link
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
Loop holes in TCPLoop holes in TCP
1 TCP connection
1 TCP connection
1 TCP connection
3 TCP connections
Multithreading implementationMultithreading implementation
Transport Layer ndash TCP
56
BB
TCP latency modelingTCP latency modeling
TCP connection establishment time data transfer delay Actual data transmission time
Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent
QQ How long How long does it take to does it take to
receive an object receive an object from a Web from a Web
serverserver
No data transfer delayNo data transfer delay
Therersquos data transfer delayTherersquos data transfer delay
the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety
Transport Layer ndash TCP
57
BB
TCP Latency ModelingTCP Latency Modeling
Network is uncongested with one link between end systems of rate RR
CongWinCongWin (fixed) determines the amount of data that can be sent
No packet loss no packet corruption no retransmissions required
Header overheads are negligible
File to sendFile to send = integer number of segments of size MSS
Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times
CLIENT SERVER
FILEFILE
Initial ThresholdThreshold of TCP congestion mechanism is very big
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits
SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)
Transport Layer ndash TCP
58
BB
TCP latency ModelingTCP latency Modeling
Case 1 latency = 2RTT + OR
K = Number of Windows of data that cover the object
K = OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 1 Case 1 WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
Number of segmentsRounded up to the nearest integer
Assume W=4 segmentsAssume W=4 segments
eg O=256bits S=32bits W=4
Transport Layer ndash TCP
59
BB
TCP latency ModelingTCP latency Modeling
Case 2 latency = 2RTT + OR + (K-1)[SR + RTT - WSR]
Number of Windows of data that cover the objectK= OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 2 Case 2 WSR lt RTT + SRWSR lt RTT + SR
Sender has to wait for an ACK after a windowrsquos worth of data sent
STALLED STALLED PERIODPERIOD
If there are k windows sender willbe stalled (k-1) times
Transport Layer ndash TCP
60
BB
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
STALLED STALLED PERIODPERIOD
4 windows4 windows
OS=15
Transport Layer ndash TCP
61
BB
bull LetLet K K be the be the number of windows that cover the objectnumber of windows that cover the objectbull We can express We can express KK in terms of the number of segments in the in terms of the number of segments in the
object as followsobject as follows
S
OkK k 110 222min
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1log
1logmin
12min
2
2
S
OK
S
OkkK
S
OkK k
Note
Transport Layer ndash TCP
62
BB
bull From the time the server begins to transmit the From the time the server begins to transmit the kkth window window until the time the server receives an until the time the server receives an ACKACK for the for the first segment in the windowin the window
bull Transmission of Transmission of kkth window window ==
bull Stall Time Stall Time ==
bull Latency Latency ==
12
k
R
S
12
k
R
SRTT
R
S
RTTR
S
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1K
1
122k
k
R
SRTT
R
S
R
ORTT
Transport Layer ndash TCP
63
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
Transport Layer ndash TCP
64
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
R
S
R
SRTTP
R
ORTT P )12(2Latency
bull Closed-form expression for the latency
Transport Layer ndash TCP
65
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
2
1
RTTRO
P
encyMinimumLat
Latency
Slow start will not significantly increase latency if RTT ltlt OR
Transport Layer ndash TCP
66
BB
httpwww1csewustledu~jaincis788-97ftptcp_over_atmindexhtmatm-features
Transport Layer ndash TCP
33
BB
TCP Connection Management (cont)
TCP client lifecycle
TCP server lifecycle
Used in case ACK gets lost It is implementation-dependent (eg 30
seconds 1 minute 2 minutes
Connection formally closes ndash all resources (eg port numbers) are
released
1
2
3
4
5
6
7
8
9
10
11
12
Transport Layer ndash TCP
34
BB
End of Flow Control and Error Control
Transport Layer ndash TCP
35
BB
Flow Control vs Congestion ControlFlow Control vs Congestion ControlSimilar actions are taken but for very different reasons
Flow ControlFlow Controlbull point-to-point traffic between sender and receiverbull speed matching service matching the rate at which the sender is sending against the rate at which the receiving application is readingbull prevents Receiver Buffer from overflowing
CongestionCongestion ndash happens when there are too many sources attempting to send data at too high a rate for the routers along the path
Congestion ControlCongestion Controlbull service that makes sure that the routers between End Systems are able to carry the offered trafficbull prevents routers from overflowing
Same course of action Throttling of the sender
Transport Layer ndash TCP
36
BB
CongestionCongestion Informally too many sources sending too
much data too fast for network to handle different from flow control Manifestations
lost packets (buffer overflow at routers) long delays (queuing in router buffers)
a top-10 problem
Principles of Congestion ControlPrinciples of Congestion Control
Transport Layer ndash TCP
37
BB
Approaches towards congestion control
End-to-end congestion control
no explicit feedback from network
congestion inferred by end-systems from observed packet loss amp delay
approach taken by TCP
Network-assisted congestion control
routers provide feedback to End Systems in the form of single bit indicating
link congestion (SNA DECbit TCPIP ECN ATM ABR)
explicit transmission rate the sender should send at
1 2
Two broad approaches towards congestion control
Transport Layer ndash TCP
38
BB
TCP Congestion ControlTCP Congestion ControlHow TCP sender limits the rate at which it sends traffic into its connection
SENDER
(Amount of unACKed data)SENDER lt min(CongWin RcvWindow)
LastByteSent - LastByteACKed
Indirectly limits the senderrsquos send rateAssumptions
bull TCP receive buffer is very large ndash no RcvWindow constraint Amt of unACKed data at sender is solely limited by CongWin
bull Packet loss delay amp packet transmission delay are negligible
Sending rate (approx) CongWinRTT
By adjusting CongWin sender can therefore adjust the
rate at which it sends data into its connection
New variable ndash Congestion
Window
Transport Layer ndash TCP
39
BB
TCP Congestion ControlTCP Congestion ControlTCP uses ACKs to trigger (ldquoclockrdquo) its increase in congestion window size ndash ldquoself-clockingrdquo
Arrival of ACKs ndash indication to the sender that all is well
1 Slow Rate
bull Congestion window will be increased at a relatively slow rate
2 High rate
bull Congestion window will be increased more quickly
Transport Layer ndash TCP
40
BB
TCP Congestion ControlTCP Congestion ControlHow TCP perceives that there is congestion on the path
ldquoLoss Eventrdquo ndash when there is excessive congestion router buffers along the path overflows causing datagrams to be dropped which in turn results in a ldquoloss eventrdquo at the sender
1 Timeout
bull no ACK is received after segment loss
2 Receipt of three duplicate ACKs
bull segment loss is followed by three ACKs received at the sender
TCP Congestion Control: details
• sender limits transmission: LastByteSent - LastByteAcked ≤ cwnd
• roughly, rate = cwnd/RTT bytes/sec
• cwnd is a dynamic function of perceived network congestion
How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• the TCP sender reduces its rate (cwnd) after a loss event
Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events
TCP congestion avoidance: additive increase, multiplicative decrease (AIMD)
• approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs
- additive increase: increase cwnd by 1 MSS every RTT until loss is detected
- multiplicative decrease: cut cwnd in half after loss
[Figure: congestion window size (8, 16, 24 Kbytes) vs. time, showing the saw-tooth behavior of probing for bandwidth]
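The saw-tooth pattern can be reproduced with a toy model. It assumes, purely for illustration, that a loss is detected whenever cwnd exceeds a fixed `capacity_mss`. Real paths have no such crisp limit, and the name `aimd_trace` is hypothetical.

```python
from typing import List


def aimd_trace(capacity_mss: int, start_mss: int, rtts: int) -> List[int]:
    """cwnd (in MSS) at the start of each RTT under pure AIMD.

    Idealised assumption: a loss occurs in any RTT where cwnd exceeds
    capacity_mss; otherwise every segment is ACKed.
    """
    cwnd = start_mss
    trace: List[int] = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd > capacity_mss:
            cwnd = max(1, cwnd // 2)   # multiplicative decrease: halve
        else:
            cwnd += 1                  # additive increase: +1 MSS per RTT
    return trace


print(aimd_trace(8, 4, 10))  # [4, 5, 6, 7, 8, 9, 4, 5, 6, 7]
```

The linear climb followed by a halving is the "saw tooth" in the figure.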
TCP Slow Start
• when a connection begins, increase the rate exponentially until the first loss event:
- initially cwnd = 1 MSS
- double cwnd every RTT
- done by incrementing cwnd by 1 MSS for every ACK received
• summary: the initial rate is slow, but it ramps up exponentially fast (doubling of the sending rate every RTT)
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart, along a time axis]
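The following sketch shows why "+1 MSS per ACK" amounts to "double every RTT". It assumes no loss and one ACK per segment (no delayed ACKs), and `slow_start_rounds` is a hypothetical name.

```python
from typing import List


def slow_start_rounds(rounds: int, mss: int = 1) -> List[int]:
    """cwnd (in bytes if mss is in bytes) at the start of each RTT in slow start.

    Assumes no loss and that every segment in a round is ACKed
    individually, so each of the cwnd/mss ACKs adds one MSS.
    """
    cwnd = mss                          # initially cwnd = 1 MSS
    history: List[int] = []
    for _ in range(rounds):
        history.append(cwnd)
        acks_this_rtt = cwnd // mss     # one ACK per segment sent
        for _ in range(acks_this_rtt):
            cwnd += mss                 # +1 MSS for every ACK received
    return history


print(slow_start_rounds(4))            # [1, 2, 4, 8]
print(slow_start_rounds(4, mss=1460))  # [1460, 2920, 5840, 11680]
```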
Refinement: inferring loss
• after 3 dup ACKs:
- cwnd is cut in half
- the window then grows linearly
• but after a timeout event:
- cwnd is instead set to 1 MSS
- the window then grows exponentially up to a threshold, then grows linearly
Philosophy:
• 3 dup ACKs indicate a network capable of delivering some segments
• a timeout indicates a "more alarming" congestion scenario
Refinement
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before timeout.
Implementation:
• variable ssthresh (slow-start threshold)
• on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event
TCP Sender Congestion Control
State | Event | TCP sender congestion-control action | Commentary
Slow Start (SS) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance" | Results in a doubling of CongWin every RTT
Congestion Avoidance (CA) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS·(MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed
Summary: TCP Congestion Control (sender state machine)
Initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
Slow start
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Congestion avoidance
• new ACK: cwnd = cwnd + MSS·(MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Fast recovery
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
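The state machine summarised above can be sketched as a small Python class. This is a teaching sketch that follows the slide's transitions, not a real TCP implementation. The class and method names are hypothetical, window sizes are in bytes, and the actual (re)transmissions are only noted in comments.

```python
from dataclasses import dataclass

SLOW_START = "slow start"
CONGESTION_AVOIDANCE = "congestion avoidance"
FAST_RECOVERY = "fast recovery"


@dataclass
class RenoSender:
    """Hypothetical model of the sender FSM; only updates state variables."""
    mss: int = 1460
    cwnd: float = 1460.0          # initially 1 MSS
    ssthresh: float = 64 * 1024   # initially 64 KB
    dup_ack_count: int = 0
    state: str = SLOW_START

    def on_new_ack(self) -> None:
        if self.state == SLOW_START:
            self.cwnd += self.mss                       # doubles cwnd per RTT
            self.dup_ack_count = 0
            if self.cwnd > self.ssthresh:
                self.state = CONGESTION_AVOIDANCE
        elif self.state == CONGESTION_AVOIDANCE:
            self.cwnd += self.mss * (self.mss / self.cwnd)  # about +1 MSS per RTT
            self.dup_ack_count = 0
        else:                                           # fast recovery ends
            self.cwnd = self.ssthresh
            self.dup_ack_count = 0
            self.state = CONGESTION_AVOIDANCE

    def on_duplicate_ack(self) -> None:
        if self.state == FAST_RECOVERY:
            self.cwnd += self.mss                       # inflate per extra dup ACK
            return
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:                     # triple duplicate ACK
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = FAST_RECOVERY                  # (retransmit missing segment)

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = float(self.mss)
        self.dup_ack_count = 0
        self.state = SLOW_START                         # (retransmit missing segment)


s = RenoSender(mss=1000, cwnd=1000.0, ssthresh=4000.0)
for _ in range(4):
    s.on_new_ack()
print(s.state, s.cwnd)  # congestion avoidance 5000.0
```

Keeping each event in its own method mirrors the rows of the table and the arcs of the diagram, so every transition can be checked against the slide one by one.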
Congestion control
TCP's Congestion Control Service
Problem: gridlock sets in when there is packet loss due to router congestion.
TCP's congestion control forces the End Systems to decrease the rate at which packets are sent during periods of congestion.
The sending system learns that its packet has been lost to congestion when it stops receiving ACKs for the packets it sent.
Macroscopic Description of TCP throughput
• What is the average throughput of TCP as a function of window size and RTT?
- ignore slow start (typically very short phases)
• let W be the window size when loss occurs
- when the window is W, throughput is W/RTT
- just after loss, the window drops to W/2, and throughput to W/(2RTT)
- throughput then increases linearly (by MSS/RTT every RTT)
• Average throughput: 0.75 W/RTT
(Based on an idealised model of the steady-state dynamics of TCP)
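The 0.75 factor comes from averaging a window that climbs linearly from W/2 to W. The sketch below assumes the same idealised model, with W measured in MSS and an even W. The function names are hypothetical.

```python
def mean_window(w: int) -> float:
    """Mean window (in MSS) over one saw-tooth cycle that climbs by 1 MSS
    per RTT from W/2 up to W; this is exactly 0.75 * W when W is even."""
    windows = list(range(w // 2, w + 1))
    return sum(windows) / len(windows)


def average_throughput(w: int, mss_bytes: int, rtt_s: float) -> float:
    """Average throughput in bytes/sec for the idealised saw-tooth."""
    return mean_window(w) * mss_bytes / rtt_s


print(mean_window(100))  # 75.0
```

Because the windows W/2, W/2 + 1, ..., W form an arithmetic sequence, their mean is the midpoint (W/2 + W)/2 = 0.75 W.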
TCP Futures: TCP over "long, fat pipes"
• Example: a GRID computing application with 1500-byte segments and a 100 ms RTT; a desired throughput of 10 Gbps requires a window size of W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
$$\text{Throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$$
• Sustaining 10 Gbps therefore requires L = 2·10^-10, a very small loss rate (1 loss event every 5 billion segments)
• New versions of TCP are needed for high-speed environments
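The numbers in the example can be re-derived from W = throughput · RTT / MSS and the loss-rate formula above. This is a sketch with hypothetical helper names. MSS is converted to bits so that it matches a throughput given in bits/sec.

```python
import math


def window_for_throughput(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Segments in flight needed: W = throughput * RTT / (MSS in bits)."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def throughput_from_loss(loss_rate: float, rtt_s: float, mss_bytes: int) -> float:
    """Throughput (bits/sec) = 1.22 * MSS / (RTT * sqrt(L))."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss_rate))


def loss_for_throughput(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Invert the formula above: L = (1.22 * MSS / (RTT * throughput)) ** 2."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


print(f"{window_for_throughput(10e9, 0.1, 1500):,.0f}")  # 83,333
print(f"{loss_for_throughput(10e9, 0.1, 1500):.1e}")     # 2.1e-10
```

The computed loss rate is about 2.1·10^-10, which the slide rounds to 2·10^-10.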
TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link of capacity R, each should get an average transmission rate of R/N, an equal share of the link's bandwidth
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Analysis of 2 connections sharing a link
Assumptions
• Link with a transmission rate of R
• Each connection has the same MSS and RTT
• No other TCP connections or UDP datagrams traverse the shared link
• Ignore the slow start phase of TCP
• Both connections operate in congestion-avoidance mode (linear increase phase)
Goal: adjust the sending rate of the two connections to allow for equal bandwidth sharing
Why is TCP fair?
Two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput vs. Connection 1 throughput, each axis from 0 to R, with the "equal bandwidth share" line and the "full bandwidth utilisation" line; the trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by factor of 2). A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.]
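The convergence argument can be checked numerically under the slide's assumptions, plus one simplification: both connections detect loss in the same RTT whenever their combined rate exceeds R. The function name and starting values are hypothetical. Additive increase leaves the gap between the two rates unchanged, while each halving cuts the gap in half, so the gap shrinks toward zero.

```python
from typing import Tuple


def two_flow_aimd(x1: float, x2: float, capacity: float, steps: int) -> Tuple[float, float]:
    """Rates of two AIMD connections after `steps` RTTs.

    Assumption: both connections see loss in the same RTT whenever
    x1 + x2 exceeds capacity; otherwise both add 1 unit per RTT.
    """
    for _ in range(steps):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2    # multiplicative decrease halves the gap
        else:
            x1, x2 = x1 + 1, x2 + 1    # additive increase keeps the gap constant
    return x1, x2


x1, x2 = two_flow_aimd(80.0, 10.0, 100.0, 1000)
print(abs(x1 - x2) < 1e-6)  # True
```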
The End
The following slides are for additional reading only.
TCP Latency Modeling
Loopholes in TCP
• In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free; therefore they have higher throughputs
• Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (multithreading implementation)
[Figure: multiple End Systems sharing a link of R bps (the link's transmission rate); three systems open 1 TCP connection each, while one system opens 3 TCP connections]
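The parallel-connection loophole follows directly from the per-connection R/N fairness goal. In the sketch below, every connection on the bottleneck is assumed to get an equal share, the helper name is hypothetical, and the 60 Mbps link rate is made up. With the figure's setup (three hosts with 1 connection each, one host with 3), the multithreaded host gets half the link.

```python
def app_share(own_connections: int, other_connections: int, link_rate_bps: float) -> float:
    """Bandwidth an application receives if each TCP connection on the
    bottleneck gets an equal R/N share (idealised fairness)."""
    total = own_connections + other_connections
    return link_rate_bps * own_connections / total


print(app_share(3, 3, 60e6))  # 30000000.0: 3 of 6 connections -> R/2
print(app_share(1, 5, 60e6))  # 10000000.0: 1 of 6 connections -> R/6
```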
TCP latency modeling
Q: How long does it take to receive an object from a Web server?
Latency: the time from when the client initiates a TCP connection until the client receives the requested object in its entirety
Latency components: TCP connection establishment time, data transfer delay, actual data transmission time
Two cases to consider:
• WS/R > RTT + S/R: an ACK for the first segment in the window returns to the Sender before a window's worth of data is sent (no data transfer delay)
• WS/R < RTT + S/R: the Sender has to wait for an ACK after a window's worth of data is sent (there is data transfer delay)
TCP Latency Modeling
Assumptions
• The network is uncongested, with one link of rate R between the end systems
• CongWin (fixed) determines the amount of data that can be sent
• No packet loss, no packet corruption, no retransmissions required
• Header overheads are negligible
• File to send = an integer number of segments of size MSS
• Connection-establishment segments, request messages and ACKs have negligible transmission times
• The initial Threshold of the TCP congestion mechanism is very big
Notation
• R bps: the link's transmission rate
• O: size of the object, in bits
• S: number of bits in an MSS (maximum segment size)
[Figure: CLIENT, SERVER and the FILE being transferred]
TCP latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the Sender before a window's worth of data is sent.
Case 1 latency = 2RTT + O/R
K = number of windows of data that cover the object = O/(WS), rounded up to the nearest integer (O/S is the number of segments)
Example, assuming W = 4 segments: O = 256 bits, S = 32 bits, W = 4, so O/S = 8 segments and K = 2 windows
TCP latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 2: WS/R < RTT + S/R
The Sender has to wait for an ACK after a window's worth of data is sent (STALLED PERIOD).
Case 2 latency = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]
where K = O/(WS) is the number of windows of data that cover the object.
If there are K windows, the sender will be stalled (K-1) times.
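Both static-window cases can be folded into one function. This sketch uses the slides' notation (O and S in bits, R in bits/sec, W in segments). The function name and the toy values (R = 32 bps, RTT = 1 s or 5 s) are hypothetical and were chosen only so that the arithmetic is easy to follow.

```python
import math


def static_window_latency(o_bits: int, s_bits: int, w_segments: int,
                          r_bps: float, rtt_s: float) -> float:
    """Latency for a static congestion window of W segments."""
    k = math.ceil(o_bits / (w_segments * s_bits))  # windows covering the object
    base = 2 * rtt_s + o_bits / r_bps               # 2RTT + O/R
    stall = s_bits / r_bps + rtt_s - w_segments * s_bits / r_bps
    if stall <= 0:                 # Case 1: WS/R >= RTT + S/R, no stalls
        return base
    return base + (k - 1) * stall  # Case 2: the sender stalls K-1 times


# Slide example O = 256, S = 32, W = 4 (so K = 2), with a toy link R = 32 bps
print(static_window_latency(256, 32, 4, 32.0, 1.0))  # 10.0 (Case 1)
print(static_window_latency(256, 32, 4, 32.0, 5.0))  # 20.0 (Case 2)
```

At the boundary WS/R = RTT + S/R the stall term is zero, so both formulas agree there.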
Case Analysis: DYNAMIC CONGESTION WINDOW
[Figure: with a dynamic congestion window the sender experiences STALLED PERIODs; an object of O/S = 15 segments is sent in 4 windows (of 1, 2, 4 and 8 segments)]
Case Analysis: DYNAMIC CONGESTION WINDOW
• Let K be the number of windows that cover the object
• We can express K in terms of the number of segments in the object as follows:
$$K = \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} = \min\{k : 2^k - 1 \ge O/S\} = \min\{k : k \ge \log_2(O/S + 1)\} = \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil$$
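The two ends of this derivation can be checked against each other. The sketch below finds K by directly summing window sizes 1, 2, 4, ... until they cover O/S segments, then compares the result with the closed form. Both helper names are hypothetical.

```python
import math


def windows_to_cover(num_segments: int) -> int:
    """K = min{k : 2^0 + ... + 2^(k-1) >= O/S}, found by direct search."""
    k, sent = 0, 0
    while sent < num_segments:
        sent += 2 ** k   # window k+1 carries 2^k segments
        k += 1
    return k


def windows_closed_form(num_segments: int) -> int:
    """K = ceil(log2(O/S + 1))."""
    return math.ceil(math.log2(num_segments + 1))


print(windows_to_cover(15), windows_closed_form(15))  # 4 4
```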
Case Analysis: DYNAMIC CONGESTION WINDOW
• Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window: $\frac{S}{R} + RTT$
• Transmission time of the kth window: $2^{k-1}\frac{S}{R}$
• Stall time: $\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$, where $[x]^+ = \max(x, 0)$
• Latency:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$$
Case Analysis: DYNAMIC CONGESTION WINDOW
• Let Q be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$
• The actual number of times that the server stalls is P = min{Q, K-1}
• Closed-form expression for the latency:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$
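As a sanity check, the sketch below evaluates both the summation form and the closed form of the dynamic-window latency. It assumes O is an integer multiple of S (as the assumptions slide requires). The function names and the toy values (S/R = 1 s, RTT = 2 s, O/S = 15) are hypothetical.

```python
import math


def dynamic_latency_sum(o_bits: int, s_bits: int, r_bps: float, rtt_s: float) -> float:
    """2RTT + O/R + sum over k = 1..K-1 of [S/R + RTT - 2^(k-1) S/R]^+."""
    k_windows = math.ceil(math.log2(o_bits // s_bits + 1))
    total = 2 * rtt_s + o_bits / r_bps
    for k in range(1, k_windows):
        total += max(0.0, s_bits / r_bps + rtt_s - 2 ** (k - 1) * s_bits / r_bps)
    return total


def dynamic_latency_closed_form(o_bits: int, s_bits: int, r_bps: float, rtt_s: float) -> float:
    """2RTT + O/R + P[RTT + S/R] - (2^P - 1) S/R, with P = min(Q, K-1)."""
    k_windows = math.ceil(math.log2(o_bits // s_bits + 1))
    q = math.floor(math.log2(1 + rtt_s / (s_bits / r_bps))) + 1
    p = min(q, k_windows - 1)
    return (2 * rtt_s + o_bits / r_bps + p * (rtt_s + s_bits / r_bps)
            - (2 ** p - 1) * s_bits / r_bps)


# O/S = 15 segments (K = 4), S/R = 1 s, RTT = 2 s: Q = 2, so P = 2
print(dynamic_latency_sum(15_000, 1_000, 1_000.0, 2.0),
      dynamic_latency_closed_form(15_000, 1_000, 1_000.0, 2.0))  # 22.0 22.0
```

The closed form works because the stall term is positive exactly for k ≤ Q (it is zero at the boundary), so the bracketed terms can be summed directly up to P.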
• Comparing the latency with the minimum latency (2RTT + O/R, i.e. no stalls):
$$\frac{\text{Latency}}{\text{Minimum latency}} \le 1 + \frac{P}{\frac{O/R}{RTT} + 2}$$
• Slow start will not significantly increase latency if RTT << O/R
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
Network-assisted congestion control
routers provide feedback to End Systems in the form of single bit indicating
link congestion (SNA DECbit TCPIP ECN ATM ABR)
explicit transmission rate the sender should send at
1 2
Two broad approaches towards congestion control
Transport Layer ndash TCP
38
BB
TCP Congestion ControlTCP Congestion ControlHow TCP sender limits the rate at which it sends traffic into its connection
SENDER
(Amount of unACKed data)SENDER lt min(CongWin RcvWindow)
LastByteSent - LastByteACKed
Indirectly limits the senderrsquos send rateAssumptions
bull TCP receive buffer is very large ndash no RcvWindow constraint Amt of unACKed data at sender is solely limited by CongWin
bull Packet loss delay amp packet transmission delay are negligible
Sending rate (approx) CongWinRTT
By adjusting CongWin sender can therefore adjust the
rate at which it sends data into its connection
New variable ndash Congestion
Window
Transport Layer ndash TCP
39
BB
TCP Congestion ControlTCP Congestion ControlTCP uses ACKs to trigger (ldquoclockrdquo) its increase in congestion window size ndash ldquoself-clockingrdquo
Arrival of ACKs ndash indication to the sender that all is well
1 Slow Rate
bull Congestion window will be increased at a relatively slow rate
2 High rate
bull Congestion window will be increased more quickly
Transport Layer ndash TCP
40
BB
TCP Congestion ControlTCP Congestion ControlHow TCP perceives that there is congestion on the path
ldquoLoss Eventrdquo ndash when there is excessive congestion router buffers along the path overflows causing datagrams to be dropped which in turn results in a ldquoloss eventrdquo at the sender
1 Timeout
bull no ACK is received after segment loss
2 Receipt of three duplicate ACKs
bull segment loss is followed by three ACKs received at the sender
Transport Layer ndash TCP
41
BB
TCP Congestion Control details
sender limits transmission LastByteSent-LastByteAcked
cwnd
roughly
cwnd is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or 3 duplicate acks
TCP sender reduces rate (cwnd) after loss event
Three mechanisms1 AIMD
2 slow start
3 conservative after timeout events
rate = cwnd
RTT Bytessec
Transport Layer ndash TCP
42
BB
TCP congestion avoidance congestion avoidance additive increase multiplicative decreaseadditive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1 MSS every RTT until
loss is detected multiplicative decrease cut cwnd in half after loss
timecwnd
co
nge
stio
n w
indo
w s
ize
saw toothbehavior probingfor bandwidth
Transport Layer ndash TCP
43
BB
TCP Slow Start Slow Start when connection begins
increase rate exponentially until first loss event initially cwndcwnd = 1 MSS double cwndcwnd every RTT done by incrementing cwndcwnd by 1 MSS by 1 MSS for every ACK received
summary initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transport Layer ndash TCP
44
BB
Refinement inferring lossRefinement inferring loss
after 3 dup ACKsafter 3 dup ACKs cwndcwnd is cut in halfhalf window then grows
linearlylinearly butbut after timeout after timeout
eventevent cwndcwnd is is set to 1 MSS1 MSS window then grows
exponentiallyexponentially Up to a thresholdUp to a threshold
then grows linearlylinearly
3 dup ACKs 3 dup ACKs indicates network capable of delivering some segments
timeouttimeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer ndash TCP
45
BB
Refinement
Q when should the exponential increase switch to linear
A when cwndcwnd gets to 1212 of its value before timeout
Implementation variable ssthresh (slow-start threshold)ssthresh (slow-start threshold) on loss eventon loss event ssthreshssthresh is set to 1212 of cwndcwnd just
before loss event
Transport Layer ndash TCP
46
BB
TCP Sender Congestion ControlSTATE EVENT TCP SENDER Congestion-
Control ActionCommentary
SLOW START (SS)
ACK receipt for previously unACKed data
CongWin = CongWin + MSSIf(CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo
Resulting in a doubling of CongWin every RTT
Congestion Avoidance (CA)
ACK receipt for previously unACKed data
CongWin = CongWin + MSS (MSSCongWin)
Additive increase resulting in increasing of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate ACK
Threshold = CongWin 2CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
SS or CA Timeout Threshold = CongWin 2CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter Slow Start
SS or CA Duplicate ACK Increment duplicate ACK count for segment being ACKed
CongWin and Threshold not changed
Transport Layer ndash TCP
47
BB
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++
duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3 MSSretransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++
duplicate ACK
cwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
Transport Layer ndash TCP
48
BB
Congestion control
TCPrsquos Congestion Control Service
CLIENTSERVER
Problem Gridlock sets-in when there is packet loss due to router congestion
forces the End Systems to decrease the rate at which packets are sentdecrease the rate at which packets are sent during periods of congestion
The sending systemrsquos packet is lost due to congestion and is alerted when it stops receiving ACKs of packets sent
Transport Layer ndash TCP
49
BBTransport Layer 3-49
Macroscopic Description of TCP throughput
whatrsquos the average throughout of TCP as a function of window size and RTT ignore slow start (typically very short phases)
let W be the window size when loss occurs when window is W throughput is WRTT just after loss window drops to W2 throughput to
W2RTT Throughput increases linearly (by MSSRTT every
RTT) Average Throughput 75 WRTT
(Based on Idealised model for the steady-state dynamics of TCP)
Transport Layer 3-50
TCP FuturesTCP Futures TCP over ldquolong fat pipesrdquo Example GRID computing application 1500-byte segments 100ms RTT desired
throughput of 10 Gbps requires window size W = 83333 in-flight
segments Throughput in terms of loss rate
L = 210-10 ndash a very small loss rate (1 loss event every 5 billion segments)
new versions of TCP is needed for high-speed environments
LRTT
MSS221
Transport Layer ndash TCP
51
BB
TCP FairnessTCP Fairness
Fairness goal if NN TCP sessions share same bottleneck link each should get an average transmission rate of RNRN an equal share of the linkrsquos bandwidth
TCP connection 1
bottleneckrouter
capacity RR
TCP connection 2
Go to Summary of TCP Congestion Control
Transport Layer ndash TCP
52
BB
Analysis of 2 connections sharing a linkAnalysis of 2 connections sharing a link
Link with transmission rate of RR
Each connection have the same MSSMSS RTTRTT
No other TCP connections or UDP datagrams traverse the shared link
Ignore slow start phase of TCP
Operating in congestion-avoidance modecongestion-avoidance mode (linear increase phase)
Goal adjust sending rate of the two connections to allow for equal equal bandwidth sharingbandwidth sharing
AssumptionsAssumptions
Transport Layer ndash TCP
53
BB
Why is TCP fairWhy is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput
proportionally
RR
equal bandwidth shareequal bandwidth share
Connection 1 throughput
Connect
ion 2
thro
ughput
congestion avoidance additive increaseloss decrease window by factor of 2
RR
A point on the graph depicts the amount of link bandwidth
jointly jointly consumedconsumed by
the connections
Full bandwidth utilisation line
We can viewWe can view a simulation a simulation
on thison this
View SimulationView Simulation
Transport Layer ndash TCP
54
BB
The End
The next succeeding slides are just for additional reading
Transport Layer ndash TCP
55
BB
TCP Latency ModelingTCP Latency Modeling
In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs
Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth
Multiple End Systems sharing a link
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
Loop holes in TCPLoop holes in TCP
1 TCP connection
1 TCP connection
1 TCP connection
3 TCP connections
Multithreading implementationMultithreading implementation
Transport Layer ndash TCP
56
BB
TCP latency modelingTCP latency modeling
TCP connection establishment time data transfer delay Actual data transmission time
Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent
QQ How long How long does it take to does it take to
receive an object receive an object from a Web from a Web
serverserver
No data transfer delayNo data transfer delay
Therersquos data transfer delayTherersquos data transfer delay
the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety
Transport Layer ndash TCP
57
BB
TCP Latency ModelingTCP Latency Modeling
Network is uncongested with one link between end systems of rate RR
CongWinCongWin (fixed) determines the amount of data that can be sent
No packet loss no packet corruption no retransmissions required
Header overheads are negligible
File to sendFile to send = integer number of segments of size MSS
Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times
CLIENT SERVER
FILEFILE
Initial ThresholdThreshold of TCP congestion mechanism is very big
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits
SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)
Transport Layer ndash TCP
58
BB
TCP latency ModelingTCP latency Modeling
Case 1 latency = 2RTT + OR
K = Number of Windows of data that cover the object
K = OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 1 Case 1 WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
Number of segmentsRounded up to the nearest integer
Assume W=4 segmentsAssume W=4 segments
eg O=256bits S=32bits W=4
Transport Layer ndash TCP
59
BB
TCP latency ModelingTCP latency Modeling
Case 2 latency = 2RTT + OR + (K-1)[SR + RTT - WSR]
Number of Windows of data that cover the objectK= OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 2 Case 2 WSR lt RTT + SRWSR lt RTT + SR
Sender has to wait for an ACK after a windowrsquos worth of data sent
STALLED STALLED PERIODPERIOD
If there are k windows sender willbe stalled (k-1) times
Transport Layer ndash TCP
60
BB
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
STALLED STALLED PERIODPERIOD
4 windows4 windows
OS=15
Transport Layer ndash TCP
61
BB
bull LetLet K K be the be the number of windows that cover the objectnumber of windows that cover the objectbull We can express We can express KK in terms of the number of segments in the in terms of the number of segments in the
object as followsobject as follows
S
OkK k 110 222min
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1log
1logmin
12min
2
2
S
OK
S
OkkK
S
OkK k
Note
Transport Layer ndash TCP
62
BB
bull From the time the server begins to transmit the From the time the server begins to transmit the kkth window window until the time the server receives an until the time the server receives an ACKACK for the for the first segment in the windowin the window
bull Transmission of Transmission of kkth window window ==
bull Stall Time Stall Time ==
bull Latency Latency ==
12
k
R
S
12
k
R
SRTT
R
S
RTTR
S
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1K
1
122k
k
R
SRTT
R
S
R
ORTT
Transport Layer ndash TCP
63
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
Transport Layer ndash TCP
64
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
R
S
R
SRTTP
R
ORTT P )12(2Latency
bull Closed-form expression for the latency
Transport Layer ndash TCP
65
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
2
1
RTTRO
P
encyMinimumLat
Latency
Slow start will not significantly increase latency if RTT ltlt OR
Transport Layer ndash TCP
66
BB
httpwww1csewustledu~jaincis788-97ftptcp_over_atmindexhtmatm-features
Transport Layer ndash TCP
35
BB
Flow Control vs Congestion ControlFlow Control vs Congestion ControlSimilar actions are taken but for very different reasons
Flow ControlFlow Controlbull point-to-point traffic between sender and receiverbull speed matching service matching the rate at which the sender is sending against the rate at which the receiving application is readingbull prevents Receiver Buffer from overflowing
CongestionCongestion ndash happens when there are too many sources attempting to send data at too high a rate for the routers along the path
Congestion ControlCongestion Controlbull service that makes sure that the routers between End Systems are able to carry the offered trafficbull prevents routers from overflowing
Same course of action Throttling of the sender
Transport Layer ndash TCP
36
BB
CongestionCongestion Informally too many sources sending too
much data too fast for network to handle different from flow control Manifestations
lost packets (buffer overflow at routers) long delays (queuing in router buffers)
a top-10 problem
Principles of Congestion ControlPrinciples of Congestion Control
Transport Layer ndash TCP
37
BB
Approaches towards congestion control
End-to-end congestion control
no explicit feedback from network
congestion inferred by end-systems from observed packet loss amp delay
approach taken by TCP
Network-assisted congestion control
routers provide feedback to End Systems in the form of single bit indicating
link congestion (SNA DECbit TCPIP ECN ATM ABR)
explicit transmission rate the sender should send at
1 2
Two broad approaches towards congestion control
Transport Layer ndash TCP
38
BB
TCP Congestion ControlTCP Congestion ControlHow TCP sender limits the rate at which it sends traffic into its connection
SENDER
(Amount of unACKed data)SENDER lt min(CongWin RcvWindow)
LastByteSent - LastByteACKed
Indirectly limits the senderrsquos send rateAssumptions
bull TCP receive buffer is very large ndash no RcvWindow constraint Amt of unACKed data at sender is solely limited by CongWin
bull Packet loss delay amp packet transmission delay are negligible
Sending rate (approx) CongWinRTT
By adjusting CongWin sender can therefore adjust the
rate at which it sends data into its connection
New variable ndash Congestion
Window
Transport Layer ndash TCP
39
BB
TCP Congestion ControlTCP Congestion ControlTCP uses ACKs to trigger (ldquoclockrdquo) its increase in congestion window size ndash ldquoself-clockingrdquo
Arrival of ACKs ndash indication to the sender that all is well
1 Slow Rate
bull Congestion window will be increased at a relatively slow rate
2 High rate
bull Congestion window will be increased more quickly
Transport Layer ndash TCP
40
BB
TCP Congestion ControlTCP Congestion ControlHow TCP perceives that there is congestion on the path
ldquoLoss Eventrdquo ndash when there is excessive congestion router buffers along the path overflows causing datagrams to be dropped which in turn results in a ldquoloss eventrdquo at the sender
1 Timeout
bull no ACK is received after segment loss
2 Receipt of three duplicate ACKs
bull segment loss is followed by three ACKs received at the sender
Transport Layer ndash TCP
41
BB
TCP Congestion Control: details
• The sender limits transmission: $\text{LastByteSent} - \text{LastByteAcked} \le \text{cwnd}$
• Roughly, $\text{rate} \approx \dfrac{\text{cwnd}}{\text{RTT}}$ bytes/sec
• cwnd is a dynamic function of perceived network congestion
How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• the TCP sender reduces its rate (cwnd) after a loss event
Three mechanisms:
1. AIMD
2. slow start
3. conservative behavior after timeout events
TCP congestion avoidance: additive increase, multiplicative decrease (AIMD)
Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
• additive increase: increase cwnd by 1 MSS every RTT until loss is detected
• multiplicative decrease: cut cwnd in half after loss
[Figure: congestion window size (8, 16, 24 Kbytes) versus time, showing AIMD's saw-tooth behavior as it probes for bandwidth]
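The saw-tooth can be reproduced with a toy per-RTT model. This sketch counts cwnd in MSS. Purely for illustration, it assumes a loss is detected in any RTT in which cwnd exceeds a fixed `capacity`; real loss depends on router buffers and competing traffic. The function name is hypothetical.

```python
from typing import List


def aimd_sawtooth(capacity: float, rounds: int, cwnd: float = 1.0) -> List[float]:
    """cwnd (in MSS) at the start of each RTT under AIMD.

    Hypothetical loss model: loss is detected in any RTT in which
    cwnd exceeds `capacity`.
    """
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd > capacity:
            cwnd = max(cwnd / 2, 1.0)   # multiplicative decrease, floor of 1 MSS
        else:
            cwnd += 1.0                 # additive increase: +1 MSS per RTT
    return trace
```

For example, `aimd_sawtooth(10, 20, cwnd=6.0)` climbs 6, 7, ..., 11 and then drops to 5.5. It climbs again to 10.5 and drops to 5.25, tracing the saw-tooth.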
TCP Slow Start
When a connection begins, increase the rate exponentially until the first loss event:
• initially cwnd = 1 MSS
• double cwnd every RTT
• done by incrementing cwnd by 1 MSS for every ACK received
Summary: the initial rate is slow, but it ramps up exponentially fast (the sending rate doubles every RTT).
[Figure: Host A sends one segment to Host B, then two segments one RTT later, then four segments one RTT after that]
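The sketch below shows why adding 1 MSS per ACK doubles cwnd every RTT. It assumes every segment sent in a round is ACKed before the next round and that no loss occurs. `slow_start_rounds` and the 1460-byte default MSS are illustrative assumptions.

```python
from typing import List


def slow_start_rounds(rounds: int, mss: int = 1460) -> List[int]:
    """cwnd (in bytes) at the start of each RTT during slow start.

    A round of cwnd/MSS segments returns cwnd/MSS ACKs, and each ACK
    adds 1 MSS, so cwnd doubles per RTT.
    """
    cwnd = mss                       # initially 1 MSS
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        acks_this_round = cwnd // mss    # one ACK per segment sent
        for _ in range(acks_this_round):
            cwnd += mss                  # +1 MSS for every ACK received
    return history
```

With `mss=1`, the window per RTT is 1, 2, 4, 8, matching the one, two and four segments in the timing diagram.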
Refinement: inferring loss
• After 3 duplicate ACKs: cwnd is cut in half, and the window then grows linearly.
• But after a timeout event: cwnd is set to 1 MSS. The window then grows exponentially up to a threshold, then grows linearly.
Philosophy:
• 3 duplicate ACKs indicate that the network is capable of delivering some segments.
• A timeout indicates a "more alarming" congestion scenario.
Refinement
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.
Implementation:
• variable ssthresh (slow-start threshold)
• on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event
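The refinement can be traced per RTT. This sketch counts the window in MSS and uses a hypothetical function name. At a timeout it sets ssthresh to half the window and restarts cwnd at 1 MSS, then doubles cwnd each RTT. As a simplifying assumption of the per-RTT view, doubling is capped at ssthresh before growth switches to linear.

```python
from typing import List


def cwnd_after_timeout(cwnd_at_loss: int, rounds: int) -> List[int]:
    """Per-RTT cwnd (in MSS) after a timeout."""
    ssthresh = cwnd_at_loss // 2     # 1/2 of cwnd just before the loss
    cwnd = 1                         # restart at 1 MSS
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd < ssthresh:
            cwnd = min(2 * cwnd, ssthresh)   # exponential growth (capped)
        else:
            cwnd += 1                        # linear growth above threshold
    return trace
```

A timeout at cwnd = 16 MSS gives 1, 2, 4, 8 (exponential up to ssthresh = 8) followed by 9, 10, 11, 12 (linear).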
TCP Sender Congestion Control
(Each row: state; event: TCP sender congestion-control action. Commentary.)
• Slow Start (SS); ACK receipt for previously unACKed data: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance". Commentary: results in a doubling of CongWin every RTT.
• Congestion Avoidance (CA); ACK receipt for previously unACKed data: CongWin = CongWin + MSS · (MSS/CongWin). Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT.
• SS or CA; loss event detected by triple duplicate ACK: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance". Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS.
• SS or CA; timeout: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start". Commentary: enter slow start.
• SS or CA; duplicate ACK: increment the duplicate ACK count for the segment being ACKed. Commentary: CongWin and Threshold are not changed.
Summary: TCP Congestion Control (finite-state machine with three states: slow start, congestion avoidance, fast recovery)
Initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start.
Slow start:
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: move to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3·MSS; retransmit missing segment; move to fast recovery
Congestion avoidance:
• new ACK: cwnd = cwnd + MSS · (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3·MSS; retransmit missing segment; move to fast recovery
Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; move to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start
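The FSM above can be written as a small Python class. This is a sketch of the Reno-style bookkeeping shown on the slide, with window sizes in bytes and hypothetical class and method names. It only updates cwnd, ssthresh and the duplicate-ACK count, and leaves actual (re)transmission to the caller. Real stacks (see RFC 5681) handle more corner cases.

```python
SLOW_START = "slow start"
CONGESTION_AVOIDANCE = "congestion avoidance"
FAST_RECOVERY = "fast recovery"


class RenoSender:
    """cwnd/ssthresh bookkeeping for the three-state FSM (sizes in bytes)."""

    def __init__(self, mss: int = 1460, ssthresh: float = 64 * 1024) -> None:
        self.mss = mss
        self.cwnd = float(mss)            # initially cwnd = 1 MSS
        self.ssthresh = float(ssthresh)   # initially ssthresh = 64 KB
        self.dup_ack_count = 0
        self.state = SLOW_START

    def on_new_ack(self) -> None:
        if self.state == SLOW_START:
            self.cwnd += self.mss                          # +1 MSS per ACK
            if self.cwnd > self.ssthresh:
                self.state = CONGESTION_AVOIDANCE
        elif self.state == CONGESTION_AVOIDANCE:
            self.cwnd += self.mss * self.mss / self.cwnd   # about +1 MSS per RTT
        else:                                              # new ACK ends fast recovery
            self.cwnd = self.ssthresh
            self.state = CONGESTION_AVOIDANCE
        self.dup_ack_count = 0

    def on_duplicate_ack(self) -> bool:
        """Record a duplicate ACK; return True if the missing segment
        should be retransmitted now (on the third duplicate)."""
        if self.state == FAST_RECOVERY:
            self.cwnd += self.mss                          # inflate window
            return False
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = FAST_RECOVERY
            return True
        return False

    def on_timeout(self) -> None:
        """Timeout in any state: back to slow start (caller retransmits)."""
        self.ssthresh = self.cwnd / 2
        self.cwnd = float(self.mss)
        self.dup_ack_count = 0
        self.state = SLOW_START
```

Take `mss=1000` and `ssthresh=4000` as an example. Four new ACKs in slow start take cwnd to 5000 bytes and switch to congestion avoidance. A third duplicate ACK then sets ssthresh to 2500 and cwnd to 5500 for fast recovery. The next new ACK deflates cwnd back to 2500.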
Congestion control: TCP's Congestion Control Service
Problem: gridlock sets in when there is packet loss due to router congestion.
TCP's congestion control service forces the end systems to decrease the rate at which packets are sent during periods of congestion.
When a sending system's packet is lost due to congestion, the sender is alerted when it stops receiving ACKs for the packets it sent.
Macroscopic Description of TCP Throughput
What is the average throughput of TCP as a function of window size and RTT? Ignore slow start (typically very short phases).
• Let W be the window size when loss occurs.
• When the window is W, throughput is W/RTT.
• Just after loss, the window drops to W/2 and throughput to W/(2·RTT).
• Throughput then increases linearly (by MSS/RTT every RTT).
• Average throughput: $0.75\,W/\text{RTT}$
(Based on an idealised model for the steady-state dynamics of TCP.)
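The 0.75 factor can be checked numerically by averaging the idealized sawtooth. The check assumes W (in segments) is even and that the window rises by exactly one segment per RTT from W/2 back to W. The function names are hypothetical.

```python
def average_cycle_window(w: int) -> float:
    """Mean window (in segments) over one idealized AIMD cycle that
    rises from W/2 to W by one segment per RTT; W is assumed even."""
    windows = list(range(w // 2, w + 1))   # W/2, W/2 + 1, ..., W
    return sum(windows) / len(windows)


def average_throughput(w: int, mss_bytes: int, rtt_s: float) -> float:
    """Average throughput in bytes/sec over the cycle."""
    return average_cycle_window(w) * mss_bytes / rtt_s
```

For W = 100 segments, the windows 50, 51, ..., 100 average exactly 75, i.e. 0.75 W. Multiplying by MSS/RTT gives 0.75 W/RTT.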
TCP Futures: TCP over "long, fat pipes"
• Example: a GRID computing application with 1500-byte segments and a 100 ms RTT. A desired throughput of 10 Gbps requires a window size of W = 83,333 in-flight segments.
• Throughput in terms of the loss rate L:
$$\text{throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT}\sqrt{L}}$$
• Achieving 10 Gbps therefore requires $L = 2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments).
• New versions of TCP are needed for high-speed environments.
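The numbers in this example follow from two formulas. The first is throughput = W · (segment size)/RTT. The second is the loss-rate formula above, solved for L. The sketch below plugs in the slide's figures (1500-byte segments, 100 ms RTT, 10 Gbps). The function names are illustrative.

```python
def required_window(throughput_bps: float, rtt_s: float, segment_bytes: int) -> float:
    """In-flight segments W such that W * segment_size / RTT = throughput."""
    return throughput_bps * rtt_s / (segment_bytes * 8)


def required_loss_rate(throughput_bps: float, rtt_s: float, segment_bytes: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    mss_bits = segment_bytes * 8
    return (1.22 * mss_bits / (rtt_s * throughput_bps)) ** 2
```

With these inputs, the window comes out at about 83,333 segments. L comes out at roughly 2.1 · 10^-10, which the slide rounds to 2 · 10^-10.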
TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link of capacity R, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.
[Figure: TCP connections 1 and 2 share a bottleneck router of capacity R]
Analysis of 2 connections sharing a link
Assumptions:
• the link has a transmission rate of R
• each connection has the same MSS and RTT
• no other TCP connections or UDP datagrams traverse the shared link
• the slow-start phase of TCP is ignored
• both connections operate in congestion-avoidance mode (linear increase phase)
Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.
Why is TCP fair?
Two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: connection 2 throughput versus connection 1 throughput, each axis running to R. A point on the graph depicts the amount of link bandwidth jointly consumed by the connections. The plot shows the equal-bandwidth-share line and the full-bandwidth-utilisation line. The operating point alternates between congestion avoidance (additive increase) and loss (decrease window by a factor of 2).]
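A toy simulation shows why AIMD moves two flows toward the equal share. It assumes both connections have the same RTT and increase by one unit per RTT. It also assumes both detect loss and halve in the same RTT whenever their combined rate exceeds the link capacity; these synchronized losses are a simplification. Additive increase leaves the gap between the flows unchanged, while each halving cuts it in half. The function name is hypothetical.

```python
from typing import Tuple


def aimd_two_flows(x1: float, x2: float, capacity: float,
                   rounds: int) -> Tuple[float, float]:
    """Rates of two AIMD flows after `rounds` RTTs on a shared link."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2     # loss: multiplicative decrease for both
        else:
            x1, x2 = x1 + 1, x2 + 1     # additive increase: slope-1 move
    return x1, x2
```

Starting from a very unequal split (80 and 10 on a link of capacity 100), the gap of 70 is halved at every loss. After 1000 RTTs the two rates are essentially equal.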
The End
The following slides are for additional reading only.
TCP Latency Modeling
Loopholes in TCP:
• In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free, so they have higher throughputs.
• Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth.
[Figure: multiple end systems share a link of R bps (the link's transmission rate). Three hosts use 1 TCP connection each, while a host with a multithreading implementation uses 3 TCP connections.]
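The effect of parallel connections can be quantified under the idealized fairness model of the previous slides, in which every connection on the bottleneck gets R/N. `app_share` is a hypothetical helper. Real shares also depend on RTT, as the first bullet notes.

```python
def app_share(app_connections: int, other_connections: int,
              link_rate: float) -> float:
    """Bandwidth an application gets if every connection on the
    bottleneck receives an equal R/N share."""
    total = app_connections + other_connections
    return link_rate * app_connections / total
```

For example, on a link already carrying 9 connections, an application that opens 1 connection gets R/10. One that opens 11 connections gets 11R/20, more than half the link.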
TCP latency modeling
Q: How long does it take to receive an object from a Web server?
Latency is the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. It comprises the TCP connection-establishment time, the data transfer delay, and the actual data transmission time.
Two cases to consider:
• $WS/R > RTT + S/R$: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent, so there is no data transfer delay.
• $WS/R < RTT + S/R$: the sender has to wait for an ACK after a window's worth of data is sent, so there is data transfer delay.
TCP Latency Modeling
Assumptions:
• The network is uncongested, with one link of rate R between the end systems.
• CongWin (fixed) determines the amount of data that can be sent.
• No packet loss, no packet corruption, no retransmissions required.
• Header overheads are negligible.
• The file to send is an integer number of segments of size MSS.
• Connection-establishment request messages, ACKs, and TCP connection-establishment segments have negligible transmission times.
• The initial Threshold of the TCP congestion mechanism is very big.
Notation:
• O: size of the object, in bits
• S: number of bits of the MSS (maximum segment size)
• R: the link's transmission rate, in bps
[Figure: a client requesting a file from a server]
TCP latency Modeling
Case analysis: STATIC congestion window
Case 1: $WS/R > RTT + S/R$
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent, so
$$\text{latency} = 2\,RTT + O/R$$
Let K be the number of windows of data that cover the object: $K = \lceil O/(WS) \rceil$ (rounded up to the nearest integer).
[Figure: timing diagram assuming W = 4 segments]
e.g., O = 256 bits, S = 32 bits, W = 4: the object is O/S = 8 segments, so K = 256/(4 · 32) = 2 windows.
TCP latency Modeling
Case analysis: STATIC congestion window
Case 2: $WS/R < RTT + S/R$
The sender has to wait for an ACK after a window's worth of data is sent (the stalled period).
Number of windows of data that cover the object: $K = \lceil O/(WS) \rceil$.
If there are K windows, the sender is stalled (K - 1) times, each time for $S/R + RTT - WS/R$, so
$$\text{latency} = 2\,RTT + O/R + (K-1)\left[S/R + RTT - WS/R\right]$$
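Both static-window cases can be folded into one function. The stall per window, S/R + RTT - WS/R, is non-positive exactly in case 1. The sketch assumes consistent units (bits, bps, seconds), and the name `static_window_latency` is hypothetical.

```python
import math


def static_window_latency(O: float, S: float, R: float, W: int, rtt: float) -> float:
    """Latency (seconds) to fetch an O-bit object with a fixed window of
    W segments of S bits over a link of R bps."""
    K = math.ceil(O / (W * S))          # windows that cover the object
    stall = S / R + rtt - W * S / R     # idle time after each window
    if stall <= 0:                      # case 1: WS/R >= RTT + S/R
        return 2 * rtt + O / R
    return 2 * rtt + O / R + (K - 1) * stall   # case 2: K - 1 stalls
```

Take the slide's example (O = 256 bits, S = 32 bits, W = 4) with an assumed R = 32 bps. An RTT of 1 s falls in case 1, with a latency of 10 s. An RTT of 5 s falls in case 2, with a latency of 20 s including one stall of 2 s.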
Case analysis: DYNAMIC congestion window
[Figure: timing diagram for an object of O/S = 15 segments sent in 4 windows, with stalled periods between windows]
Case analysis: DYNAMIC congestion window
• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object as follows:
$$\begin{aligned} K &= \min\{k : 2^0 S + 2^1 S + \cdots + 2^{k-1} S \ge O\} \\ &= \min\{k : 2^k - 1 \ge O/S\} \\ &= \min\{k : k \ge \log_2(O/S + 1)\} \\ &= \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil \end{aligned}$$
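The definition of K can be checked directly against the closed form. The loop below adds windows of 1, 2, 4, ... segments until the object is covered. The function names are hypothetical.

```python
import math


def windows_to_cover(O: int, S: int) -> int:
    """Count slow-start windows (1, 2, 4, ... segments) until O bits are sent."""
    k, sent = 0, 0
    while sent < O:
        sent += (2 ** k) * S   # the (k+1)th window carries 2**k segments
        k += 1
    return k


def windows_closed_form(O: int, S: int) -> int:
    """K = ceil(log2(O/S + 1))."""
    return math.ceil(math.log2(O / S + 1))
```

For the 15-segment object of the previous slide, both give K = 4 (windows of 1, 2, 4 and 8 segments).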
Case analysis: DYNAMIC congestion window
• Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window: $S/R + RTT$
• Transmission time of the kth window: $2^{k-1} S/R$
• Stall time: $\left[S/R + RTT - 2^{k-1} S/R\right]^{+}$, where $[x]^{+} = \max(x, 0)$
• Latency:
$$\text{latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^{+}$$
Case analysis: DYNAMIC congestion window
• Let Q be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$
• The actual number of times that the server stalls is $P = \min\{Q, K-1\}$.
• Closed-form expression for the latency:
$$\text{latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$
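As a sanity check on the closed form, the sketch below computes the latency two ways. The first uses Q and P; the second sums the positive stall times window by window. It assumes consistent units (bits, bps, seconds), and the function names are hypothetical.

```python
import math


def slow_start_latency(O: float, S: float, R: float, rtt: float) -> float:
    """Closed-form latency with a dynamic (slow-start) window."""
    K = math.ceil(math.log2(O / S + 1))               # windows covering object
    Q = math.floor(math.log2(1 + rtt / (S / R))) + 1  # stalls for infinite object
    P = min(Q, K - 1)                                 # actual number of stalls
    return 2 * rtt + O / R + P * (rtt + S / R) - (2 ** P - 1) * S / R


def slow_start_latency_sum(O: float, S: float, R: float, rtt: float) -> float:
    """Same latency, summing [S/R + RTT - 2^(k-1) S/R]^+ for k = 1..K-1."""
    K = math.ceil(math.log2(O / S + 1))
    stalls = sum(max(S / R + rtt - 2 ** (k - 1) * S / R, 0.0) for k in range(1, K))
    return 2 * rtt + O / R + stalls
```

For example, take O/S = 15, S/R = 1 s and RTT = 2.5 s. Then K = 4, Q = 2 and P = 2, and both functions give 24 s. The minimum latency is 2 RTT + O/R = 20 s.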
Case analysis: DYNAMIC congestion window
$$\frac{\text{latency}}{\text{minimum latency}} \le 1 + \frac{P}{\frac{O/R}{RTT} + 2}$$
where the minimum latency is $2\,RTT + O/R$.
Slow start will not significantly increase latency if $RTT \ll O/R$.
httpwww1csewustledu~jaincis788-97ftptcp_over_atmindexhtmatm-features
Transport Layer ndash TCP
38
BB
TCP Congestion ControlTCP Congestion ControlHow TCP sender limits the rate at which it sends traffic into its connection
SENDER
(Amount of unACKed data)SENDER lt min(CongWin RcvWindow)
LastByteSent - LastByteACKed
Indirectly limits the senderrsquos send rateAssumptions
bull TCP receive buffer is very large ndash no RcvWindow constraint Amt of unACKed data at sender is solely limited by CongWin
bull Packet loss delay amp packet transmission delay are negligible
Sending rate (approx) CongWinRTT
By adjusting CongWin sender can therefore adjust the
rate at which it sends data into its connection
New variable ndash Congestion
Window
Transport Layer ndash TCP
39
BB
TCP Congestion ControlTCP Congestion ControlTCP uses ACKs to trigger (ldquoclockrdquo) its increase in congestion window size ndash ldquoself-clockingrdquo
Arrival of ACKs ndash indication to the sender that all is well
1 Slow Rate
bull Congestion window will be increased at a relatively slow rate
2 High rate
bull Congestion window will be increased more quickly
Transport Layer ndash TCP
40
BB
TCP Congestion ControlTCP Congestion ControlHow TCP perceives that there is congestion on the path
ldquoLoss Eventrdquo ndash when there is excessive congestion router buffers along the path overflows causing datagrams to be dropped which in turn results in a ldquoloss eventrdquo at the sender
1 Timeout
bull no ACK is received after segment loss
2 Receipt of three duplicate ACKs
bull segment loss is followed by three ACKs received at the sender
Transport Layer ndash TCP
41
BB
TCP Congestion Control details
sender limits transmission LastByteSent-LastByteAcked
cwnd
roughly
cwnd is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or 3 duplicate acks
TCP sender reduces rate (cwnd) after loss event
Three mechanisms1 AIMD
2 slow start
3 conservative after timeout events
rate = cwnd
RTT Bytessec
Transport Layer ndash TCP
42
BB
TCP congestion avoidance congestion avoidance additive increase multiplicative decreaseadditive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1 MSS every RTT until
loss is detected multiplicative decrease cut cwnd in half after loss
timecwnd
co
nge
stio
n w
indo
w s
ize
saw toothbehavior probingfor bandwidth
Transport Layer ndash TCP
43
BB
TCP Slow Start Slow Start when connection begins
increase rate exponentially until first loss event initially cwndcwnd = 1 MSS double cwndcwnd every RTT done by incrementing cwndcwnd by 1 MSS by 1 MSS for every ACK received
summary initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transport Layer ndash TCP
44
BB
Refinement inferring lossRefinement inferring loss
after 3 dup ACKsafter 3 dup ACKs cwndcwnd is cut in halfhalf window then grows
linearlylinearly butbut after timeout after timeout
eventevent cwndcwnd is is set to 1 MSS1 MSS window then grows
exponentiallyexponentially Up to a thresholdUp to a threshold
then grows linearlylinearly
3 dup ACKs 3 dup ACKs indicates network capable of delivering some segments
timeouttimeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer ndash TCP
45
BB
Refinement
Q when should the exponential increase switch to linear
A when cwndcwnd gets to 1212 of its value before timeout
Implementation variable ssthresh (slow-start threshold)ssthresh (slow-start threshold) on loss eventon loss event ssthreshssthresh is set to 1212 of cwndcwnd just
before loss event
Transport Layer ndash TCP
46
BB
TCP Sender Congestion ControlSTATE EVENT TCP SENDER Congestion-
Control ActionCommentary
SLOW START (SS)
ACK receipt for previously unACKed data
CongWin = CongWin + MSSIf(CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo
Resulting in a doubling of CongWin every RTT
Congestion Avoidance (CA)
ACK receipt for previously unACKed data
CongWin = CongWin + MSS (MSSCongWin)
Additive increase resulting in increasing of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate ACK
Threshold = CongWin 2CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
SS or CA Timeout Threshold = CongWin 2CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter Slow Start
SS or CA Duplicate ACK Increment duplicate ACK count for segment being ACKed
CongWin and Threshold not changed
Transport Layer ndash TCP
47
BB
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++
duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3 MSSretransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++
duplicate ACK
cwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
Transport Layer ndash TCP
48
BB
Congestion control
TCPrsquos Congestion Control Service
CLIENTSERVER
Problem Gridlock sets-in when there is packet loss due to router congestion
forces the End Systems to decrease the rate at which packets are sentdecrease the rate at which packets are sent during periods of congestion
The sending systemrsquos packet is lost due to congestion and is alerted when it stops receiving ACKs of packets sent
Transport Layer ndash TCP
49
BBTransport Layer 3-49
Macroscopic Description of TCP throughput
whatrsquos the average throughout of TCP as a function of window size and RTT ignore slow start (typically very short phases)
let W be the window size when loss occurs when window is W throughput is WRTT just after loss window drops to W2 throughput to
W2RTT Throughput increases linearly (by MSSRTT every
RTT) Average Throughput 75 WRTT
(Based on Idealised model for the steady-state dynamics of TCP)
Transport Layer 3-50
TCP FuturesTCP Futures TCP over ldquolong fat pipesrdquo Example GRID computing application 1500-byte segments 100ms RTT desired
throughput of 10 Gbps requires window size W = 83333 in-flight
segments Throughput in terms of loss rate
L = 210-10 ndash a very small loss rate (1 loss event every 5 billion segments)
new versions of TCP is needed for high-speed environments
LRTT
MSS221
Transport Layer ndash TCP
51
BB
TCP FairnessTCP Fairness
Fairness goal if NN TCP sessions share same bottleneck link each should get an average transmission rate of RNRN an equal share of the linkrsquos bandwidth
TCP connection 1
bottleneckrouter
capacity RR
TCP connection 2
Go to Summary of TCP Congestion Control
Transport Layer ndash TCP
52
BB
Analysis of 2 connections sharing a linkAnalysis of 2 connections sharing a link
Link with transmission rate of RR
Each connection have the same MSSMSS RTTRTT
No other TCP connections or UDP datagrams traverse the shared link
Ignore slow start phase of TCP
Operating in congestion-avoidance modecongestion-avoidance mode (linear increase phase)
Goal adjust sending rate of the two connections to allow for equal equal bandwidth sharingbandwidth sharing
AssumptionsAssumptions
Transport Layer ndash TCP
53
BB
Why is TCP fairWhy is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput
proportionally
RR
equal bandwidth shareequal bandwidth share
Connection 1 throughput
Connect
ion 2
thro
ughput
congestion avoidance additive increaseloss decrease window by factor of 2
RR
A point on the graph depicts the amount of link bandwidth
jointly jointly consumedconsumed by
the connections
Full bandwidth utilisation line
We can viewWe can view a simulation a simulation
on thison this
View SimulationView Simulation
Transport Layer ndash TCP
54
BB
The End
The next succeeding slides are just for additional reading
Transport Layer ndash TCP
55
BB
TCP Latency ModelingTCP Latency Modeling
In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs
Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth
Multiple End Systems sharing a link
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
Loop holes in TCPLoop holes in TCP
1 TCP connection
1 TCP connection
1 TCP connection
3 TCP connections
Multithreading implementationMultithreading implementation
Transport Layer ndash TCP
56
BB
TCP latency modelingTCP latency modeling
TCP connection establishment time data transfer delay Actual data transmission time
Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent
QQ How long How long does it take to does it take to
receive an object receive an object from a Web from a Web
serverserver
No data transfer delayNo data transfer delay
Therersquos data transfer delayTherersquos data transfer delay
the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety
Transport Layer ndash TCP
57
BB
TCP Latency ModelingTCP Latency Modeling
Network is uncongested with one link between end systems of rate RR
CongWinCongWin (fixed) determines the amount of data that can be sent
No packet loss no packet corruption no retransmissions required
Header overheads are negligible
File to sendFile to send = integer number of segments of size MSS
Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times
CLIENT SERVER
FILEFILE
Initial ThresholdThreshold of TCP congestion mechanism is very big
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits
SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)
Transport Layer ndash TCP
58
BB
TCP latency ModelingTCP latency Modeling
Case 1 latency = 2RTT + OR
K = Number of Windows of data that cover the object
K = OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 1 Case 1 WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
Number of segmentsRounded up to the nearest integer
Assume W=4 segmentsAssume W=4 segments
eg O=256bits S=32bits W=4
Transport Layer ndash TCP
59
BB
TCP latency ModelingTCP latency Modeling
Case 2 latency = 2RTT + OR + (K-1)[SR + RTT - WSR]
Number of Windows of data that cover the objectK= OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 2 Case 2 WSR lt RTT + SRWSR lt RTT + SR
Sender has to wait for an ACK after a windowrsquos worth of data sent
STALLED STALLED PERIODPERIOD
If there are k windows sender willbe stalled (k-1) times
Transport Layer ndash TCP
60
BB
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
STALLED STALLED PERIODPERIOD
4 windows4 windows
OS=15
Transport Layer ndash TCP
61
BB
bull LetLet K K be the be the number of windows that cover the objectnumber of windows that cover the objectbull We can express We can express KK in terms of the number of segments in the in terms of the number of segments in the
object as followsobject as follows
S
OkK k 110 222min
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1log
1logmin
12min
2
2
S
OK
S
OkkK
S
OkK k
Note
Transport Layer ndash TCP
62
BB
bull From the time the server begins to transmit the From the time the server begins to transmit the kkth window window until the time the server receives an until the time the server receives an ACKACK for the for the first segment in the windowin the window
bull Transmission of Transmission of kkth window window ==
bull Stall Time Stall Time ==
bull Latency Latency ==
12
k
R
S
12
k
R
SRTT
R
S
RTTR
S
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1K
1
122k
k
R
SRTT
R
S
R
ORTT
Transport Layer ndash TCP
63
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
Transport Layer ndash TCP
64
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
R
S
R
SRTTP
R
ORTT P )12(2Latency
bull Closed-form expression for the latency
Transport Layer ndash TCP
65
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
2
1
RTTRO
P
encyMinimumLat
Latency
Slow start will not significantly increase latency if RTT ltlt OR
Transport Layer ndash TCP
66
BB
httpwww1csewustledu~jaincis788-97ftptcp_over_atmindexhtmatm-features
Transport Layer ndash TCP
37
BB
Approaches towards congestion control
End-to-end congestion control
no explicit feedback from network
congestion inferred by end-systems from observed packet loss amp delay
approach taken by TCP
Network-assisted congestion control
routers provide feedback to End Systems in the form of single bit indicating
link congestion (SNA DECbit TCPIP ECN ATM ABR)
explicit transmission rate the sender should send at
1 2
Two broad approaches towards congestion control
Transport Layer ndash TCP
38
BB
TCP Congestion ControlTCP Congestion ControlHow TCP sender limits the rate at which it sends traffic into its connection
SENDER
(Amount of unACKed data)SENDER lt min(CongWin RcvWindow)
LastByteSent - LastByteACKed
Indirectly limits the senderrsquos send rateAssumptions
bull TCP receive buffer is very large ndash no RcvWindow constraint Amt of unACKed data at sender is solely limited by CongWin
bull Packet loss delay amp packet transmission delay are negligible
Sending rate (approx) CongWinRTT
By adjusting CongWin sender can therefore adjust the
rate at which it sends data into its connection
New variable ndash Congestion
Window
Transport Layer ndash TCP
39
BB
TCP Congestion ControlTCP Congestion ControlTCP uses ACKs to trigger (ldquoclockrdquo) its increase in congestion window size ndash ldquoself-clockingrdquo
Arrival of ACKs ndash indication to the sender that all is well
1 Slow Rate
bull Congestion window will be increased at a relatively slow rate
2 High rate
bull Congestion window will be increased more quickly
Transport Layer ndash TCP
40
BB
TCP Congestion ControlTCP Congestion ControlHow TCP perceives that there is congestion on the path
ldquoLoss Eventrdquo ndash when there is excessive congestion router buffers along the path overflows causing datagrams to be dropped which in turn results in a ldquoloss eventrdquo at the sender
1 Timeout
bull no ACK is received after segment loss
2 Receipt of three duplicate ACKs
bull segment loss is followed by three ACKs received at the sender
Transport Layer ndash TCP
41
BB
TCP Congestion Control details
sender limits transmission LastByteSent-LastByteAcked
cwnd
roughly
cwnd is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or 3 duplicate acks
TCP sender reduces rate (cwnd) after loss event
Three mechanisms1 AIMD
2 slow start
3 conservative after timeout events
rate = cwnd
RTT Bytessec
Transport Layer ndash TCP
42
BB
TCP congestion avoidance congestion avoidance additive increase multiplicative decreaseadditive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1 MSS every RTT until
loss is detected multiplicative decrease cut cwnd in half after loss
timecwnd
co
nge
stio
n w
indo
w s
ize
saw toothbehavior probingfor bandwidth
Transport Layer ndash TCP
43
BB
TCP Slow Start Slow Start when connection begins
increase rate exponentially until first loss event initially cwndcwnd = 1 MSS double cwndcwnd every RTT done by incrementing cwndcwnd by 1 MSS by 1 MSS for every ACK received
summary initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transport Layer ndash TCP
44
BB
Refinement inferring lossRefinement inferring loss
after 3 dup ACKsafter 3 dup ACKs cwndcwnd is cut in halfhalf window then grows
linearlylinearly butbut after timeout after timeout
eventevent cwndcwnd is is set to 1 MSS1 MSS window then grows
exponentiallyexponentially Up to a thresholdUp to a threshold
then grows linearlylinearly
3 dup ACKs 3 dup ACKs indicates network capable of delivering some segments
timeouttimeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer ndash TCP
45
BB
Refinement
Q when should the exponential increase switch to linear
A when cwndcwnd gets to 1212 of its value before timeout
Implementation variable ssthresh (slow-start threshold)ssthresh (slow-start threshold) on loss eventon loss event ssthreshssthresh is set to 1212 of cwndcwnd just
before loss event
Transport Layer ndash TCP
46
BB
TCP Sender Congestion Control

| State | Event | TCP sender congestion-control action | Commentary |
|---|---|---|---|
| Slow Start (SS) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Results in a doubling of CongWin every RTT |
| Congestion Avoidance (CA) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT |
| SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS |
| SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start |
| SS or CA | Duplicate ACK | Increment duplicate ACK count for the segment being ACKed | CongWin and Threshold not changed |
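The sender table above can be sketched as a small state machine. This is a minimal, hypothetical Python sketch, not a real TCP stack. Class and method names are invented. The starting values (CongWin = 1 MSS, Threshold = 64 KB) and resetting the duplicate-ACK count on each new ACK are assumptions. Only window bookkeeping is modeled, not transmission.

```python
from dataclasses import dataclass

SLOW_START = "Slow Start"
CONGESTION_AVOIDANCE = "Congestion Avoidance"


@dataclass
class TableSender:
    """Sender actions from the table above; CongWin and Threshold are in bytes."""

    mss: int = 1500
    cong_win: float = 1500.0           # assumed start: 1 MSS
    threshold: float = 64 * 1024.0     # assumed initial Threshold: 64 KB
    state: str = SLOW_START
    dup_acks: int = 0

    def on_new_ack(self) -> None:
        """ACK receipt for previously unACKed data."""
        self.dup_acks = 0              # assumption: count restarts for the new segment
        if self.state == SLOW_START:
            self.cong_win += self.mss  # doubles CongWin every RTT
            if self.cong_win > self.threshold:
                self.state = CONGESTION_AVOIDANCE
        else:
            # Additive increase: about +1 MSS per RTT in total.
            self.cong_win += self.mss * (self.mss / self.cong_win)

    def on_dup_ack(self) -> None:
        """Duplicate ACK: count it; CongWin and Threshold change only on the third."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self._on_triple_dup_ack()

    def _on_triple_dup_ack(self) -> None:
        self.threshold = self.cong_win / 2
        self.cong_win = max(self.threshold, float(self.mss))  # never below 1 MSS
        self.state = CONGESTION_AVOIDANCE

    def on_timeout(self) -> None:
        self.threshold = self.cong_win / 2
        self.cong_win = float(self.mss)
        self.state = SLOW_START
        self.dup_acks = 0


s = TableSender()
for _ in range(3):
    s.on_new_ack()
print(s.cong_win)                        # 6000.0 (4 MSS after 3 ACKs)
s.on_timeout()
print(s.cong_win, s.threshold, s.state)  # 1500.0 3000.0 Slow Start
```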
Summary: TCP Congestion Control (FSM)
Initial state: slow start, with cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0.

Slow start
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed.
• duplicate ACK: dupACKcount++.
• cwnd > ssthresh: go to congestion avoidance.
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment (remain in slow start).
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery.

Congestion avoidance
• new ACK: cwnd = cwnd + MSS · (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed.
• duplicate ACK: dupACKcount++.
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start.
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery.

Fast recovery
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed.
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance.
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start.
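A minimal Python sketch of the three-state summary FSM (slow start, congestion avoidance, fast recovery) follows. It is illustrative only. The class and method names are hypothetical. Windows are in bytes with an assumed MSS of 1500. "Retransmit missing segment" is reduced to a counter, and "transmit new segment(s) as allowed" is not modeled.

```python
SS, CA, FR = "slow start", "congestion avoidance", "fast recovery"


class RenoSender:
    """Three-state congestion-control FSM from the summary above (bytes)."""

    def __init__(self, mss: int = 1500) -> None:
        self.mss = mss
        self.cwnd = float(mss)          # cwnd = 1 MSS
        self.ssthresh = 64.0 * 1024     # ssthresh = 64 KB
        self.dup_ack_count = 0
        self.state = SS
        self.retransmissions = 0        # stands in for "retransmit missing segment"

    def _retransmit_missing_segment(self) -> None:
        self.retransmissions += 1       # hypothetical hook; no real I/O here

    def on_new_ack(self) -> None:
        if self.state == FR:
            self.cwnd = self.ssthresh   # deflate the window
            self.state = CA
        elif self.state == SS:
            self.cwnd += self.mss
            if self.cwnd > self.ssthresh:
                self.state = CA
        else:
            self.cwnd += self.mss * self.mss / self.cwnd
        self.dup_ack_count = 0          # every new ACK resets the count

    def on_dup_ack(self) -> None:
        if self.state == FR:
            self.cwnd += self.mss       # each extra dup ACK inflates cwnd
            return
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self._retransmit_missing_segment()
            self.state = FR

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = float(self.mss)
        self.dup_ack_count = 0
        self._retransmit_missing_segment()
        self.state = SS


r = RenoSender()
for _ in range(7):
    r.on_new_ack()                    # slow start: cwnd grows to 8 MSS = 12000 bytes
for _ in range(3):
    r.on_dup_ack()                    # triple duplicate ACK: enter fast recovery
print(r.state, r.ssthresh, r.cwnd)    # fast recovery 6000.0 10500.0
r.on_new_ack()                        # new ACK deflates cwnd to ssthresh
print(r.state, r.cwnd)                # congestion avoidance 6000.0
```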
Congestion control
TCP's Congestion Control Service
[Figure: client and server communicating across the network]
• Problem: gridlock sets in when there is packet loss due to router congestion.
• TCP's congestion control forces the end systems to decrease the rate at which packets are sent during periods of congestion.
• When the sending system's packet is lost due to congestion, the sender is alerted when it stops receiving ACKs for the packets it sent.
Macroscopic Description of TCP Throughput
• What is the average throughput of TCP as a function of window size and RTT? Ignore slow start (typically a very short phase).
• Let W be the window size when loss occurs.
  – When the window is W, throughput is $W/\mathrm{RTT}$.
  – Just after loss, the window drops to $W/2$ and throughput to $W/(2\,\mathrm{RTT})$.
  – Throughput then increases linearly (by $\mathrm{MSS}/\mathrm{RTT}$ every RTT).
• Average throughput: $0.75\,W/\mathrm{RTT}$.
(Based on an idealised model of the steady-state dynamics of TCP.)
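The 0.75 factor comes from averaging a linear ramp from W/2 to W. The sketch below checks this numerically under the same idealised model. It assumes W is an even number of segments, and the example values (W = 20, MSS = 1500 bytes, RTT = 100 ms) are illustrative only.

```python
def average_throughput(w_segments: int, mss_bytes: int, rtt_s: float) -> float:
    """Average throughput (bytes/sec) over one idealised saw-tooth cycle.

    The window climbs from W/2 to W by 1 MSS per RTT; we average the
    per-RTT rates (window / RTT) over that cycle. Assumes W is even.
    """
    windows = range(w_segments // 2, w_segments + 1)   # W/2, W/2 + 1, ..., W
    rates = [w * mss_bytes / rtt_s for w in windows]
    return sum(rates) / len(rates)


W, MSS, RTT = 20, 1500, 0.1
print(average_throughput(W, MSS, RTT))   # simulated average over the cycle
print(0.75 * W * MSS / RTT)              # closed form 0.75 W/RTT
```

Because the ramp is an arithmetic sequence, its mean is (W/2 + W)/2 = 0.75 W exactly, so the two printed values agree up to floating-point rounding.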
TCP Futures: TCP over "long, fat pipes"
• Example: a GRID computing application with 1500-byte segments, a 100 ms RTT and a desired throughput of 10 Gbps.
• This requires a window size of W = 83,333 in-flight segments.
• Throughput in terms of the loss rate L:
$$\text{throughput} = \frac{1.22 \cdot \mathrm{MSS}}{\mathrm{RTT}\,\sqrt{L}}$$
• Achieving 10 Gbps therefore needs $L = 2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments).
• New versions of TCP are needed for high-speed environments.
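Both numbers on this slide can be reproduced from its own inputs. The sketch below computes the in-flight window as bits per RTT divided by bits per segment. It then inverts the throughput formula for L. The variable names are illustrative.

```python
MSS_BITS = 1500 * 8        # 1500-byte segments
RTT = 0.100                # 100 ms
TARGET = 10e9              # desired throughput: 10 Gbps

# Window needed: bits that must be in flight per RTT, expressed in segments.
window_segments = TARGET * RTT / MSS_BITS

# Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) to get the loss rate L.
loss_rate = (1.22 * MSS_BITS / (RTT * TARGET)) ** 2

print(f"W = {window_segments:,.0f} segments")                        # W = 83,333 segments
print(f"L = {loss_rate:.2e}")                                          # L = 2.14e-10
print(f"about 1 loss per {1 / loss_rate / 1e9:.1f} billion segments")  # about 4.7 billion
```

The computed L of about 2.1·10^-10 (one loss per roughly 4.7 billion segments) is what the slide rounds to 2·10^-10 and "5 billion".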
TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link of rate R, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.
[Figure: TCP connections 1 and 2 share a bottleneck router of capacity R]
Analysis of 2 connections sharing a link
Assumptions:
• The link has a transmission rate of R.
• Each connection has the same MSS and RTT.
• No other TCP connections or UDP datagrams traverse the shared link.
• Ignore the slow-start phase of TCP.
• Both connections operate in congestion-avoidance mode (linear-increase phase).
Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.
Why is TCP fair?
Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases.
• Multiplicative decrease decreases throughput proportionally.
[Figure: connection 2 throughput vs. connection 1 throughput, both axes from 0 to R, showing the equal-bandwidth-share line, the full-bandwidth-utilisation line, and a trajectory that alternates between congestion-avoidance additive increase and, on loss, decreasing the window by a factor of 2]
A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.
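The convergence shown in the two-connection figure can be simulated directly. This is a minimal sketch under the slide's assumptions (same MSS and RTT, congestion avoidance only) plus two added ones. Both connections detect every loss, and a loss happens whenever their combined rate exceeds R. Rates are in MSS per RTT. The function name and starting values are hypothetical.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, rounds: int) -> tuple[float, float]:
    """Idealized two-connection AIMD; returns the final rates (MSS per RTT)."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2      # multiplicative decrease for both
        else:
            x1, x2 = x1 + 1, x2 + 1      # additive increase: move along a slope-1 line
    return x1, x2


x1, x2 = aimd_two_flows(x1=10, x2=70, capacity=100, rounds=1000)
print(round(x1, 2), round(x2, 2))   # the two rates end up (nearly) equal
```

Additive increase leaves the gap x2 - x1 unchanged, while each halving cuts it in half. Repeated loss events therefore drive the operating point toward the equal-bandwidth-share line.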
The End
The following slides are for additional reading only.
TCP Latency Modeling
Loopholes in TCP:
• In practice, client/server applications with a smaller RTT grab available bandwidth more quickly as it becomes free; therefore they achieve higher throughputs.
• Multiple parallel TCP connections (e.g., a multithreaded implementation) allow one application to get a bigger share of the bandwidth.
[Figure: multiple end systems sharing a link of R bps (the link's transmission rate); three hosts use 1 TCP connection each, while a fourth uses 3 TCP connections]
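The parallel-connection loophole becomes concrete if the link is idealised as being split equally per TCP connection, as in the fairness goal. The sketch below makes that assumption. The 60 Mbps link rate and the function name are hypothetical.

```python
def per_host_share(connections_per_host: list[int], link_rate: float) -> list[float]:
    """Each host's share of R if the link is divided equally per TCP connection."""
    total = sum(connections_per_host)
    return [link_rate * n / total for n in connections_per_host]


# Scenario from the figure: three hosts with 1 connection each and one
# multithreaded host with 3 parallel connections; R = 60 Mbps (assumed).
print(per_host_share([1, 1, 1, 3], link_rate=60.0))  # [10.0, 10.0, 10.0, 30.0]
```

Under per-connection fairness the multithreaded host gets half of R, three times what each single-connection host receives.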
TCP Latency Modeling
Q: How long does it take to receive an object from a Web server?
Latency: the time from when the client initiates a TCP connection until the client receives the requested object in its entirety.
Latency = TCP connection-establishment time + data-transfer delay + actual data-transmission time
Two cases to consider:
• $WS/R > \mathrm{RTT} + S/R$: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent (no data-transfer delay).
• $WS/R < \mathrm{RTT} + S/R$: the sender has to wait for an ACK after a window's worth of data is sent (there is data-transfer delay).
TCP Latency Modeling
[Figure: a server sends a file to a client]
Assumptions:
• The network is uncongested, with one link of rate R between the end systems.
• CongWin (fixed) determines the amount of data that can be sent.
• No packet loss, no packet corruption, no retransmissions required.
• Header overheads are negligible.
• The file to send is an integer number of segments of size MSS.
• Connection-establishment segments, request messages and ACKs have negligible transmission times.
• The initial Threshold of the TCP congestion mechanism is very big.
Notation:
• R: the link's transmission rate, in bps
• O: size of the object, in bits
• S: number of bits in the MSS (maximum segment size)
TCP Latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 1: $WS/R > \mathrm{RTT} + S/R$
• An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.
• $\text{latency} = 2\,\mathrm{RTT} + O/R$
• K = number of windows of data that cover the object: $K = \lceil O/(WS) \rceil$, rounded up to the nearest integer ($O/S$ is the number of segments).
• Example (figure): assume W = 4 segments, e.g., O = 256 bits, S = 32 bits, W = 4.
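Working the slide's example numbers through the definition of K gives the counts below. R and RTT are not given, so only the segment and window counts can be computed.

$$
\frac{O}{S} = \frac{256}{32} = 8 \text{ segments}, \qquad
K = \left\lceil \frac{O}{WS} \right\rceil = \left\lceil \frac{256}{4 \cdot 32} \right\rceil = 2 \text{ windows}.
$$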
TCP Latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 2: $WS/R < \mathrm{RTT} + S/R$
• The sender has to wait for an ACK after a window's worth of data is sent (the stalled period).
• With K windows covering the object ($K = \lceil O/(WS) \rceil$), the sender is stalled $K-1$ times.
• $\text{latency} = 2\,\mathrm{RTT} + O/R + (K-1)\left[S/R + \mathrm{RTT} - WS/R\right]$
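Both static-window cases fit in one function if the stall term is clamped at zero. In case 1, S/R + RTT - WS/R is negative, so no stall time is added. This is a minimal sketch. The slide's example (O = 256 bits, S = 32 bits, W = 4) is combined with an assumed R = 32 bps (so S/R = 1 s) and two assumed RTTs, one for each case.

```python
import math


def static_window_latency(O: float, S: float, W: int, R: float, RTT: float) -> float:
    """Latency (seconds) with a fixed congestion window of W segments.

    O and S in bits, R in bits/sec, RTT in seconds. The stall term is zero
    whenever WS/R >= RTT + S/R (case 1), so this covers both cases.
    """
    K = math.ceil(O / (W * S))                 # windows that cover the object
    stall = max(0.0, S / R + RTT - W * S / R)  # idle time after each full window
    return 2 * RTT + O / R + (K - 1) * stall


print(static_window_latency(256, 32, 4, R=32, RTT=1.0))  # case 1: 10.0
print(static_window_latency(256, 32, 4, R=32, RTT=5.0))  # case 2: 20.0
```

In the second call, K = 2 and each stall lasts 1 + 5 - 4 = 2 s, so the latency is 2·5 + 8 + 2 = 20 s.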
Case Analysis: DYNAMIC CONGESTION WINDOW
[Figure: with slow start, an object of O/S = 15 segments is sent in 4 windows (1, 2, 4 and 8 segments), with stalled periods between windows]
Case Analysis: DYNAMIC CONGESTION WINDOW
• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object as follows (using $2^0 + 2^1 + \cdots + 2^{k-1} = 2^k - 1$):
$$
\begin{aligned}
K &= \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} \\
  &= \min\{k : 2^k - 1 \ge O/S\} \\
  &= \min\{k : k \ge \log_2(O/S + 1)\} \\
  &= \left\lceil \log_2(O/S + 1) \right\rceil
\end{aligned}
$$
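The two forms of K can be checked against each other in code. The sketch below assumes the object is an integer number of segments, as the model does. It uses the fact that, for an integer n ≥ 1, ⌈log₂(n + 1)⌉ equals the bit length of n. This avoids floating-point logarithms. The function names are hypothetical.

```python
def windows_to_cover(num_segments: int) -> int:
    """Smallest k with 2^0 + 2^1 + ... + 2^(k-1) >= num_segments (window doubles each round)."""
    k, sent = 0, 0
    while sent < num_segments:
        sent += 2 ** k      # the (k+1)-th window carries 2^k segments
        k += 1
    return k


def windows_closed_form(num_segments: int) -> int:
    """ceil(log2(O/S + 1)), computed exactly via the integer bit length."""
    return num_segments.bit_length()


print(windows_to_cover(15), windows_closed_form(15))  # 4 4  (1 + 2 + 4 + 8 = 15)
```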
Case Analysis: DYNAMIC CONGESTION WINDOW
• From the time the server begins to transmit the kth window until the time the server receives an ACK for the first segment in the window: $S/R + \mathrm{RTT}$
• Transmission of the kth window: $2^{k-1} S/R$
• Stall time: $\left[S/R + \mathrm{RTT} - 2^{k-1} S/R\right]^+$, where $[x]^+ = \max(x, 0)$
• Latency:
$$\text{Latency} = 2\,\mathrm{RTT} + \frac{O}{R} + \sum_{k=1}^{K-1} \left[\frac{S}{R} + \mathrm{RTT} - 2^{k-1}\frac{S}{R}\right]^+$$
Case Analysis: DYNAMIC CONGESTION WINDOW
• Let Q be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\!\left(1 + \frac{\mathrm{RTT}}{S/R}\right) \right\rfloor + 1$$
• The actual number of times that the server stalls is $P = \min\{Q, K-1\}$.
• Closed-form expression for the latency:
$$\text{Latency} = 2\,\mathrm{RTT} + \frac{O}{R} + P\left[\mathrm{RTT} + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$
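The closed form can be cross-checked against the stall-time sum on the previous slides. This is a minimal sketch under the slides' assumptions: O is an integer number of segments, the first window is 1 segment, and the window doubles each round. The example values (O/S = 15 segments, S/R = 1 s, RTT = 2 s) and the function name are hypothetical.

```python
import math


def dynamic_window_latency(O: float, S: float, R: float, RTT: float):
    """Slow-start latency model: returns (K, Q, P, latency_sum, latency_closed).

    O and S in bits, R in bits/sec, RTT in seconds.
    """
    n = round(O / S)                                  # segments in the object
    K = n.bit_length()                                # ceil(log2(O/S + 1))
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1  # stalls for an infinite object
    P = min(Q, K - 1)                                 # actual number of stalls
    # Sum of the stall times [S/R + RTT - 2^(k-1) S/R]^+ for k = 1 .. K-1.
    stalls = sum(max(0.0, S / R + RTT - 2 ** (k - 1) * S / R) for k in range(1, K))
    latency_sum = 2 * RTT + O / R + stalls
    latency_closed = 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R
    return K, Q, P, latency_sum, latency_closed


print(dynamic_window_latency(O=15 * 1000, S=1000, R=1000, RTT=2.0))
# (4, 2, 2, 22.0, 22.0): stalls of 2 s and 1 s after the first two windows
```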
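Compared with the minimum latency $2\,\mathrm{RTT} + O/R$ (no stalls):
$$\frac{\text{Latency}}{\text{Minimum latency}} \le 1 + \frac{P}{\frac{O/R}{\mathrm{RTT}} + 2}$$
Slow start will not significantly increase latency if $\mathrm{RTT} \ll O/R$.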
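The bound follows from the closed-form latency in one step. The derivation below is a reconstruction, not taken from the slides. It takes the minimum latency to be $2\,\mathrm{RTT} + O/R$, the value reached when the server never stalls.

$$
\begin{aligned}
\text{Latency} &= 2\,\mathrm{RTT} + \frac{O}{R} + P\left[\mathrm{RTT} + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R} \\
&\le 2\,\mathrm{RTT} + \frac{O}{R} + P\,\mathrm{RTT} \qquad \text{since } 2^P - 1 \ge P \text{ for integer } P \ge 0, \\
\frac{\text{Latency}}{\text{Minimum latency}} &\le \frac{2\,\mathrm{RTT} + O/R + P\,\mathrm{RTT}}{2\,\mathrm{RTT} + O/R} = 1 + \frac{P}{\frac{O/R}{\mathrm{RTT}} + 2}.
\end{aligned}
$$

When $\mathrm{RTT} \ll O/R$, the denominator is large and the ratio is close to 1.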
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
Transport Layer ndash TCP
38
BB
TCP Congestion ControlTCP Congestion ControlHow TCP sender limits the rate at which it sends traffic into its connection
SENDER
(Amount of unACKed data)SENDER lt min(CongWin RcvWindow)
LastByteSent - LastByteACKed
Indirectly limits the senderrsquos send rateAssumptions
bull TCP receive buffer is very large ndash no RcvWindow constraint Amt of unACKed data at sender is solely limited by CongWin
bull Packet loss delay amp packet transmission delay are negligible
Sending rate (approx) CongWinRTT
By adjusting CongWin sender can therefore adjust the
rate at which it sends data into its connection
New variable ndash Congestion
Window
Transport Layer ndash TCP
39
BB
TCP Congestion ControlTCP Congestion ControlTCP uses ACKs to trigger (ldquoclockrdquo) its increase in congestion window size ndash ldquoself-clockingrdquo
Arrival of ACKs ndash indication to the sender that all is well
1 Slow Rate
bull Congestion window will be increased at a relatively slow rate
2 High rate
bull Congestion window will be increased more quickly
Transport Layer ndash TCP
40
BB
TCP Congestion ControlTCP Congestion ControlHow TCP perceives that there is congestion on the path
ldquoLoss Eventrdquo ndash when there is excessive congestion router buffers along the path overflows causing datagrams to be dropped which in turn results in a ldquoloss eventrdquo at the sender
1 Timeout
bull no ACK is received after segment loss
2 Receipt of three duplicate ACKs
bull segment loss is followed by three ACKs received at the sender
Transport Layer ndash TCP
41
BB
TCP Congestion Control details
sender limits transmission LastByteSent-LastByteAcked
cwnd
roughly
cwnd is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or 3 duplicate acks
TCP sender reduces rate (cwnd) after loss event
Three mechanisms1 AIMD
2 slow start
3 conservative after timeout events
rate = cwnd
RTT Bytessec
Transport Layer ndash TCP
42
BB
TCP congestion avoidance congestion avoidance additive increase multiplicative decreaseadditive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1 MSS every RTT until
loss is detected multiplicative decrease cut cwnd in half after loss
timecwnd
co
nge
stio
n w
indo
w s
ize
saw toothbehavior probingfor bandwidth
Transport Layer ndash TCP
43
BB
TCP Slow Start Slow Start when connection begins
increase rate exponentially until first loss event initially cwndcwnd = 1 MSS double cwndcwnd every RTT done by incrementing cwndcwnd by 1 MSS by 1 MSS for every ACK received
summary initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transport Layer ndash TCP
44
BB
Refinement inferring lossRefinement inferring loss
after 3 dup ACKsafter 3 dup ACKs cwndcwnd is cut in halfhalf window then grows
linearlylinearly butbut after timeout after timeout
eventevent cwndcwnd is is set to 1 MSS1 MSS window then grows
exponentiallyexponentially Up to a thresholdUp to a threshold
then grows linearlylinearly
3 dup ACKs 3 dup ACKs indicates network capable of delivering some segments
timeouttimeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer ndash TCP
45
BB
Refinement
Q when should the exponential increase switch to linear
A when cwndcwnd gets to 1212 of its value before timeout
Implementation variable ssthresh (slow-start threshold)ssthresh (slow-start threshold) on loss eventon loss event ssthreshssthresh is set to 1212 of cwndcwnd just
before loss event
Transport Layer ndash TCP
46
BB
TCP Sender Congestion ControlSTATE EVENT TCP SENDER Congestion-
Control ActionCommentary
SLOW START (SS)
ACK receipt for previously unACKed data
CongWin = CongWin + MSSIf(CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo
Resulting in a doubling of CongWin every RTT
Congestion Avoidance (CA)
ACK receipt for previously unACKed data
CongWin = CongWin + MSS (MSSCongWin)
Additive increase resulting in increasing of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate ACK
Threshold = CongWin 2CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
SS or CA Timeout Threshold = CongWin 2CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter Slow Start
SS or CA Duplicate ACK Increment duplicate ACK count for segment being ACKed
CongWin and Threshold not changed
Transport Layer ndash TCP
47
BB
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++
duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3 MSSretransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++
duplicate ACK
cwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
Transport Layer ndash TCP
48
BB
Congestion control
TCPrsquos Congestion Control Service
CLIENTSERVER
Problem Gridlock sets-in when there is packet loss due to router congestion
forces the End Systems to decrease the rate at which packets are sentdecrease the rate at which packets are sent during periods of congestion
The sending systemrsquos packet is lost due to congestion and is alerted when it stops receiving ACKs of packets sent
Transport Layer ndash TCP
49
BBTransport Layer 3-49
Macroscopic Description of TCP throughput
whatrsquos the average throughout of TCP as a function of window size and RTT ignore slow start (typically very short phases)
let W be the window size when loss occurs when window is W throughput is WRTT just after loss window drops to W2 throughput to
W2RTT Throughput increases linearly (by MSSRTT every
RTT) Average Throughput 75 WRTT
(Based on Idealised model for the steady-state dynamics of TCP)
Transport Layer 3-50
TCP FuturesTCP Futures TCP over ldquolong fat pipesrdquo Example GRID computing application 1500-byte segments 100ms RTT desired
throughput of 10 Gbps requires window size W = 83333 in-flight
segments Throughput in terms of loss rate
L = 210-10 ndash a very small loss rate (1 loss event every 5 billion segments)
new versions of TCP is needed for high-speed environments
LRTT
MSS221
Transport Layer ndash TCP
51
BB
TCP FairnessTCP Fairness
Fairness goal if NN TCP sessions share same bottleneck link each should get an average transmission rate of RNRN an equal share of the linkrsquos bandwidth
TCP connection 1
bottleneckrouter
capacity RR
TCP connection 2
Go to Summary of TCP Congestion Control
Transport Layer ndash TCP
52
BB
Analysis of 2 connections sharing a linkAnalysis of 2 connections sharing a link
Link with transmission rate of RR
Each connection have the same MSSMSS RTTRTT
No other TCP connections or UDP datagrams traverse the shared link
Ignore slow start phase of TCP
Operating in congestion-avoidance modecongestion-avoidance mode (linear increase phase)
Goal adjust sending rate of the two connections to allow for equal equal bandwidth sharingbandwidth sharing
AssumptionsAssumptions
Transport Layer ndash TCP
53
BB
Why is TCP fairWhy is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput
proportionally
RR
equal bandwidth shareequal bandwidth share
Connection 1 throughput
Connect
ion 2
thro
ughput
congestion avoidance additive increaseloss decrease window by factor of 2
RR
A point on the graph depicts the amount of link bandwidth
jointly jointly consumedconsumed by
the connections
Full bandwidth utilisation line
We can viewWe can view a simulation a simulation
on thison this
View SimulationView Simulation
Transport Layer ndash TCP
54
BB
The End
The next succeeding slides are just for additional reading
Transport Layer ndash TCP
55
BB
TCP Latency ModelingTCP Latency Modeling
In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs
Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth
Multiple End Systems sharing a link
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
Loop holes in TCPLoop holes in TCP
1 TCP connection
1 TCP connection
1 TCP connection
3 TCP connections
Multithreading implementationMultithreading implementation
Transport Layer ndash TCP
56
BB
TCP latency modelingTCP latency modeling
TCP connection establishment time data transfer delay Actual data transmission time
Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent
QQ How long How long does it take to does it take to
receive an object receive an object from a Web from a Web
serverserver
No data transfer delayNo data transfer delay
Therersquos data transfer delayTherersquos data transfer delay
the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety
Transport Layer ndash TCP
57
BB
TCP Latency ModelingTCP Latency Modeling
Network is uncongested with one link between end systems of rate RR
CongWinCongWin (fixed) determines the amount of data that can be sent
No packet loss no packet corruption no retransmissions required
Header overheads are negligible
File to sendFile to send = integer number of segments of size MSS
Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times
CLIENT SERVER
FILEFILE
Initial ThresholdThreshold of TCP congestion mechanism is very big
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits
SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)
Transport Layer ndash TCP
58
BB
TCP latency ModelingTCP latency Modeling
Case 1 latency = 2RTT + OR
K = Number of Windows of data that cover the object
K = OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 1 Case 1 WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
Number of segmentsRounded up to the nearest integer
Assume W=4 segmentsAssume W=4 segments
eg O=256bits S=32bits W=4
Transport Layer ndash TCP
59
BB
TCP latency ModelingTCP latency Modeling
Case 2 latency = 2RTT + OR + (K-1)[SR + RTT - WSR]
Number of Windows of data that cover the objectK= OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 2 Case 2 WSR lt RTT + SRWSR lt RTT + SR
Sender has to wait for an ACK after a windowrsquos worth of data sent
STALLED STALLED PERIODPERIOD
If there are k windows sender willbe stalled (k-1) times
Transport Layer ndash TCP
60
BB
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
STALLED STALLED PERIODPERIOD
4 windows4 windows
OS=15
Transport Layer ndash TCP
61
BB
bull LetLet K K be the be the number of windows that cover the objectnumber of windows that cover the objectbull We can express We can express KK in terms of the number of segments in the in terms of the number of segments in the
object as followsobject as follows
S
OkK k 110 222min
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1log
1logmin
12min
2
2
S
OK
S
OkkK
S
OkK k
Note
Transport Layer ndash TCP
62
BB
bull From the time the server begins to transmit the From the time the server begins to transmit the kkth window window until the time the server receives an until the time the server receives an ACKACK for the for the first segment in the windowin the window
bull Transmission of Transmission of kkth window window ==
bull Stall Time Stall Time ==
bull Latency Latency ==
12
k
R
S
12
k
R
SRTT
R
S
RTTR
S
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1K
1
122k
k
R
SRTT
R
S
R
ORTT
Transport Layer ndash TCP
63
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
Transport Layer ndash TCP
64
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
R
S
R
SRTTP
R
ORTT P )12(2Latency
bull Closed-form expression for the latency
Transport Layer ndash TCP
65
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
2
1
RTTRO
P
encyMinimumLat
Latency
Slow start will not significantly increase latency if RTT ltlt OR
Transport Layer ndash TCP
66
BB
httpwww1csewustledu~jaincis788-97ftptcp_over_atmindexhtmatm-features
Transport Layer ndash TCP
39
BB
TCP Congestion ControlTCP Congestion ControlTCP uses ACKs to trigger (ldquoclockrdquo) its increase in congestion window size ndash ldquoself-clockingrdquo
Arrival of ACKs ndash indication to the sender that all is well
1 Slow Rate
bull Congestion window will be increased at a relatively slow rate
2 High rate
bull Congestion window will be increased more quickly
Transport Layer ndash TCP
40
BB
TCP Congestion ControlTCP Congestion ControlHow TCP perceives that there is congestion on the path
ldquoLoss Eventrdquo ndash when there is excessive congestion router buffers along the path overflows causing datagrams to be dropped which in turn results in a ldquoloss eventrdquo at the sender
1 Timeout
bull no ACK is received after segment loss
2 Receipt of three duplicate ACKs
bull segment loss is followed by three ACKs received at the sender
Transport Layer ndash TCP
41
BB
TCP Congestion Control details
sender limits transmission LastByteSent-LastByteAcked
cwnd
roughly
cwnd is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or 3 duplicate acks
TCP sender reduces rate (cwnd) after loss event
Three mechanisms1 AIMD
2 slow start
3 conservative after timeout events
rate = cwnd
RTT Bytessec
Transport Layer ndash TCP
42
BB
TCP congestion avoidance congestion avoidance additive increase multiplicative decreaseadditive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1 MSS every RTT until
loss is detected multiplicative decrease cut cwnd in half after loss
timecwnd
co
nge
stio
n w
indo
w s
ize
saw toothbehavior probingfor bandwidth
Transport Layer ndash TCP
43
BB
TCP Slow Start Slow Start when connection begins
increase rate exponentially until first loss event initially cwndcwnd = 1 MSS double cwndcwnd every RTT done by incrementing cwndcwnd by 1 MSS by 1 MSS for every ACK received
summary initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transport Layer ndash TCP
44
BB
Refinement inferring lossRefinement inferring loss
after 3 dup ACKsafter 3 dup ACKs cwndcwnd is cut in halfhalf window then grows
linearlylinearly butbut after timeout after timeout
eventevent cwndcwnd is is set to 1 MSS1 MSS window then grows
exponentiallyexponentially Up to a thresholdUp to a threshold
then grows linearlylinearly
3 dup ACKs 3 dup ACKs indicates network capable of delivering some segments
timeouttimeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer ndash TCP
45
BB
Refinement
Q when should the exponential increase switch to linear
A when cwndcwnd gets to 1212 of its value before timeout
Implementation variable ssthresh (slow-start threshold)ssthresh (slow-start threshold) on loss eventon loss event ssthreshssthresh is set to 1212 of cwndcwnd just
before loss event
Transport Layer ndash TCP
46
BB
TCP Sender Congestion ControlSTATE EVENT TCP SENDER Congestion-
Control ActionCommentary
SLOW START (SS)
ACK receipt for previously unACKed data
CongWin = CongWin + MSSIf(CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo
Resulting in a doubling of CongWin every RTT
Congestion Avoidance (CA)
ACK receipt for previously unACKed data
CongWin = CongWin + MSS (MSSCongWin)
Additive increase resulting in increasing of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate ACK
Threshold = CongWin 2CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
SS or CA Timeout Threshold = CongWin 2CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter Slow Start
SS or CA Duplicate ACK Increment duplicate ACK count for segment being ACKed
CongWin and Threshold not changed
Transport Layer ndash TCP
47
BB
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++
duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3 MSSretransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++
duplicate ACK
cwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
Transport Layer ndash TCP
48
BB
Congestion control
TCPrsquos Congestion Control Service
CLIENTSERVER
Problem Gridlock sets-in when there is packet loss due to router congestion
forces the End Systems to decrease the rate at which packets are sentdecrease the rate at which packets are sent during periods of congestion
The sending systemrsquos packet is lost due to congestion and is alerted when it stops receiving ACKs of packets sent
Transport Layer ndash TCP
49
BBTransport Layer 3-49
Macroscopic Description of TCP throughput
whatrsquos the average throughout of TCP as a function of window size and RTT ignore slow start (typically very short phases)
let W be the window size when loss occurs when window is W throughput is WRTT just after loss window drops to W2 throughput to
W2RTT Throughput increases linearly (by MSSRTT every
RTT) Average Throughput 75 WRTT
(Based on Idealised model for the steady-state dynamics of TCP)
Transport Layer 3-50
TCP FuturesTCP Futures TCP over ldquolong fat pipesrdquo Example GRID computing application 1500-byte segments 100ms RTT desired
throughput of 10 Gbps requires window size W = 83333 in-flight
segments Throughput in terms of loss rate
L = 210-10 ndash a very small loss rate (1 loss event every 5 billion segments)
new versions of TCP is needed for high-speed environments
LRTT
MSS221
Transport Layer ndash TCP
51
BB
TCP FairnessTCP Fairness
Fairness goal if NN TCP sessions share same bottleneck link each should get an average transmission rate of RNRN an equal share of the linkrsquos bandwidth
TCP connection 1
bottleneckrouter
capacity RR
TCP connection 2
Go to Summary of TCP Congestion Control
Transport Layer ndash TCP
52
BB
Analysis of 2 connections sharing a linkAnalysis of 2 connections sharing a link
Link with transmission rate of RR
Each connection have the same MSSMSS RTTRTT
No other TCP connections or UDP datagrams traverse the shared link
Ignore slow start phase of TCP
Operating in congestion-avoidance modecongestion-avoidance mode (linear increase phase)
Goal adjust sending rate of the two connections to allow for equal equal bandwidth sharingbandwidth sharing
AssumptionsAssumptions
Transport Layer ndash TCP
53
BB
Why Is TCP Fair?
Two competing sessions:
- Additive increase gives a slope of 1 as throughput increases.
- Multiplicative decrease reduces throughput proportionally.
[Figure: Connection 1 throughput (x-axis, up to R) plotted against Connection 2 throughput (y-axis, up to R), showing the equal-bandwidth-share line and the full-bandwidth-utilisation line. The operating point alternates between congestion avoidance (additive increase) and loss (window decreased by a factor of 2). A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.]
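The geometric argument can be mimicked with a small simulation. This is a hedged sketch rather than a model of real TCP: rates are in MSS per RTT, both flows are assumed to see a loss in the same RTT whenever their combined rate exceeds R, and the starting rates (80 and 10, with R = 100) are arbitrary.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, rtts: int) -> tuple[float, float]:
    """Idealised AIMD for two flows sharing one bottleneck (units: MSS per RTT).

    Assumption: both flows detect a loss in the same RTT whenever their
    combined rate exceeds the capacity.
    """
    for _ in range(rtts):
        if x1 + x2 > capacity:
            # Loss: multiplicative decrease halves both rates,
            # which also halves the gap between them.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: additive increase moves both along a slope-1 line,
            # leaving the gap unchanged.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


if __name__ == "__main__":
    print(aimd_two_flows(80.0, 10.0, 100.0, 1000))  # the two rates end up (nearly) equal
```

Because additive increase never changes the gap and every loss halves it, the operating point drifts toward the equal-bandwidth-share line, which is the intuition behind the figure.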
The End
The following slides are for additional reading only.
TCP Latency Modeling
Loopholes in TCP:
- In practice, client/server applications with a smaller RTT grab the available bandwidth more quickly as it becomes free; therefore, they achieve higher throughputs.
- Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (for example, a multithreading implementation).
[Figure: multiple end systems sharing a link of R bps (the link's transmission rate); three applications each use 1 TCP connection, while a fourth uses 3 TCP connections.]
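To make the "bigger share" concrete, the sketch below assumes, as a simplification, that the bottleneck is divided equally per TCP connection rather than per application. With the figure's 1, 1, 1 and 3 connections, the multithreaded application receives half of R. The function name is hypothetical.

```python
from fractions import Fraction


def per_app_share(connections_per_app: list[int], link_rate: Fraction) -> list[Fraction]:
    """Share of a bottleneck of rate R received by each application, assuming
    TCP splits the link equally among connections (not applications)."""
    total = sum(connections_per_app)
    return [link_rate * n / total for n in connections_per_app]


if __name__ == "__main__":
    # Three single-connection apps and one app with 3 parallel connections.
    print(per_app_share([1, 1, 1, 3], Fraction(1)))  # shares as fractions of R
```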
TCP Latency Modeling
Q: How long does it take to receive an object from a Web server?
Latency: the time from when the client initiates a TCP connection until the client receives the requested object in its entirety.
Latency = TCP connection establishment time + data transfer delay + actual data transmission time
Two cases to consider:
- WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. There is no data transfer delay.
- WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent. There is data transfer delay.
TCP Latency Modeling
Assumptions:
- The network is uncongested, with one link of rate R between the end systems.
- CongWin (fixed) determines the amount of data that can be sent.
- No packet loss, no packet corruption, and no retransmissions are required.
- Header overheads are negligible.
- The file to send is an integer number of segments of size MSS.
- Connection-establishment request messages, ACKs, and TCP connection-establishment segments have negligible transmission times.
- The initial threshold of the TCP congestion mechanism is very big.
Notation:
- R: the link's transmission rate, in bps
- O: size of the object, in bits
- S: number of bits in the MSS (maximum segment size)
[Figure: CLIENT and SERVER; the FILE is sent from the server to the client.]
TCP Latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.
$$\text{Latency} = 2\,RTT + \frac{O}{R}$$
K = number of windows of data that cover the object (O/S is the number of segments; K is rounded up to the nearest integer):
$$K = \left\lceil \frac{O}{WS} \right\rceil$$
Example: assume W = 4 segments, O = 256 bits and S = 32 bits. Then O/S = 8 segments and K = 256/(4 · 32) = 2 windows.
TCP Latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 2: WS/R < RTT + S/R
The sender has to wait for an ACK after a window's worth of data is sent.
$$\text{Latency} = 2\,RTT + \frac{O}{R} + (K-1)\left[\frac{S}{R} + RTT - \frac{WS}{R}\right]$$
where $K = \lceil O/(WS) \rceil$ is the number of windows of data that cover the object. If there are K windows, the sender will be stalled (K - 1) times.
[Figure: STALLED PERIOD between successive windows.]
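Both static-window cases can be folded into one function. A minimal Python sketch of the two formulas above, under the same idealised assumptions (no loss, negligible header and ACK transmission times); the parameter names mirror the slides' symbols, and the example values below (R = 32 bps, RTT of 2 s or 5 s) are hypothetical, chosen so that S/R = 1 s.

```python
import math


def static_window_latency(O: float, S: float, W: int, R: float, rtt: float) -> float:
    """Latency to fetch an O-bit object with a fixed window of W segments of
    S bits each over a link of R bits/sec (slide model: 2 RTT + O/R + stalls)."""
    K = math.ceil(O / (W * S))  # number of windows that cover the object
    base = 2 * rtt + O / R
    if W * S / R >= rtt + S / R:
        # Case 1: the first ACK returns before the window is used up; no stalls.
        return base
    # Case 2: the sender stalls after each of the first K - 1 windows.
    stall = S / R + rtt - W * S / R
    return base + (K - 1) * stall


if __name__ == "__main__":
    print(static_window_latency(256, 32, 4, 32, 2))  # Case 1
    print(static_window_latency(256, 32, 4, 32, 5))  # Case 2
```

With O = 256, S = 32, W = 4, R = 32 and RTT = 2, WS/R = 4 exceeds RTT + S/R = 3, so Case 1 applies and the latency is 2 · 2 + 8 = 12 s. Raising RTT to 5 gives Case 2 with K = 2 and one stall of 1 + 5 - 4 = 2 s, for 20 s in total.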
Case Analysis: DYNAMIC CONGESTION WINDOW
[Figure: with slow start, an object of O/S = 15 segments is covered by 4 windows (of 1, 2, 4 and 8 segments), with STALLED PERIODs between windows.]
Case Analysis: DYNAMIC CONGESTION WINDOW
- Let K be the number of windows that cover the object.
- We can express K in terms of the number of segments in the object as follows:
$$
\begin{aligned}
K &= \min\left\{k : 2^0 + 2^1 + \dots + 2^{k-1} \ge \frac{O}{S}\right\} \\
  &= \min\left\{k : 2^k - 1 \ge \frac{O}{S}\right\} \\
  &= \min\left\{k : k \ge \log_2\left(\frac{O}{S} + 1\right)\right\} \\
  &= \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil
\end{aligned}
$$
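The minimum in the first line can be computed directly by adding window sizes until the object is covered, which gives a quick check on the closed form. A Python sketch, assuming slow start begins at 1 MSS and doubles every window; the function names are hypothetical.

```python
import math


def windows_to_cover(num_segments: int) -> int:
    """Smallest k with 1 + 2 + ... + 2**(k-1) >= num_segments (= O/S)."""
    k, sent = 0, 0
    while sent < num_segments:
        sent += 2 ** k  # the (k+1)th window carries 2**k segments
        k += 1
    return k


def windows_closed_form(num_segments: int) -> int:
    """K = ceil(log2(O/S + 1))."""
    return math.ceil(math.log2(num_segments + 1))


if __name__ == "__main__":
    print(windows_to_cover(15), windows_closed_form(15))
```

For O/S = 15 both functions return 4, matching the 4 windows in the dynamic-window example.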
Case Analysis: DYNAMIC CONGESTION WINDOW
- From the time the server begins to transmit the kth window until the time the server receives an ACK for the first segment in the window: $\frac{S}{R} + RTT$
- Transmission of the kth window: $2^{k-1}\frac{S}{R}$
- Stall time: $\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$, where $[x]^+ = \max(x, 0)$
- Latency:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$$
Case Analysis: DYNAMIC CONGESTION WINDOW
- Let Q be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$
- The actual number of times that the server stalls is $P = \min\{Q, K-1\}$.
- Closed-form expression for the latency:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - \left(2^P - 1\right)\frac{S}{R}$$
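As a sanity check, the summation form of the latency (sum of positive stall times) and the closed form with P = min{Q, K - 1} should agree. A hedged Python sketch with arguments named after the slides' symbols; the test values (S/R = 1 s, O/S = 15 segments, several RTTs) are hypothetical.

```python
import math


def num_windows(O: float, S: float) -> int:
    """K = ceil(log2(O/S + 1)): windows needed when slow start doubles from 1 MSS."""
    return math.ceil(math.log2(O / S + 1))


def latency_by_sum(O: float, S: float, R: float, rtt: float) -> float:
    """2 RTT + O/R + sum over k = 1..K-1 of [S/R + RTT - 2^(k-1) S/R]^+."""
    K = num_windows(O, S)
    stalls = sum(max(S / R + rtt - 2 ** (k - 1) * S / R, 0.0) for k in range(1, K))
    return 2 * rtt + O / R + stalls


def latency_closed_form(O: float, S: float, R: float, rtt: float) -> float:
    """2 RTT + O/R + P[RTT + S/R] - (2^P - 1) S/R, with P = min(Q, K - 1)."""
    K = num_windows(O, S)
    Q = math.floor(math.log2(1 + rtt / (S / R))) + 1
    P = min(Q, K - 1)
    return 2 * rtt + O / R + P * (rtt + S / R) - (2 ** P - 1) * S / R


if __name__ == "__main__":
    for rtt in (0.5, 2, 10):
        print(rtt, latency_by_sum(15000, 1000, 1000, rtt), latency_closed_form(15000, 1000, 1000, rtt))
```

The closed form works because exactly the first P stall terms are positive, and summing them gives P(S/R + RTT) minus the geometric series (2^P - 1) S/R.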
Case Analysis: DYNAMIC CONGESTION WINDOW
$$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{(O/R)/RTT + 2}$$
where the minimum latency is 2 RTT + O/R. Slow start will not significantly increase latency if RTT << O/R.
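The bound follows from the closed-form latency in one step. A short derivation, using the fact that $2^P - 1 \ge P$ for every integer $P \ge 0$:

```latex
\begin{aligned}
\text{Latency} &= 2\,RTT + \frac{O}{R} + P\,RTT - \left(2^P - 1 - P\right)\frac{S}{R} \\
               &\le 2\,RTT + \frac{O}{R} + P\,RTT, \\
\frac{\text{Latency}}{2\,RTT + O/R} &\le 1 + \frac{P\,RTT}{2\,RTT + O/R}
  = 1 + \frac{P}{(O/R)/RTT + 2}.
\end{aligned}
```

When O/R is much larger than RTT, the denominator is large and the ratio stays close to 1, which is the slide's conclusion.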
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
TCP Congestion Control
How TCP perceives that there is congestion on the path:
"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender. A loss event is either:
1. Timeout: no ACK is received after a segment is lost.
2. Receipt of three duplicate ACKs: a segment loss is followed by three duplicate ACKs received at the sender.
TCP Congestion Control Details
- The sender limits transmission: LastByteSent - LastByteAcked <= cwnd
- Roughly, rate = cwnd/RTT bytes/sec
- cwnd is a dynamic function of perceived network congestion.
How does the sender perceive congestion?
- Loss event = timeout or 3 duplicate ACKs
- The TCP sender reduces its rate (cwnd) after a loss event.
Three mechanisms:
1. AIMD
2. Slow start
3. Conservative behavior after timeout events
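A minimal sketch of the two relations above. It is illustrative only: it ignores the receiver's flow-control window, and the function names and values (cwnd = 14,600 bytes, RTT = 100 ms) are hypothetical.

```python
def sendable_bytes(last_byte_sent: int, last_byte_acked: int, cwnd: int) -> int:
    """Bytes the sender may still put in flight, enforcing
    LastByteSent - LastByteAcked <= cwnd (receiver window ignored)."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, cwnd - in_flight)


def approx_rate(cwnd_bytes: int, rtt_s: float) -> float:
    """Rough sending rate in bytes/sec: about one window per RTT."""
    return cwnd_bytes / rtt_s


if __name__ == "__main__":
    print(sendable_bytes(10_000, 4_000, 8_000))  # 6,000 bytes in flight, 2,000 more allowed
    print(approx_rate(14_600, 0.1))              # roughly 146,000 bytes/sec
```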
TCP Congestion Avoidance: Additive Increase, Multiplicative Decrease (AIMD)
Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
- Additive increase: increase cwnd by 1 MSS every RTT until loss is detected.
- Multiplicative decrease: cut cwnd in half after loss.
[Figure: congestion window size (8, 16, 24 Kbytes) versus time, showing the saw-tooth behavior of probing for bandwidth.]
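The saw-tooth can be reproduced with a few lines of Python. This is a hedged sketch: cwnd is in MSS units, and the loss model (a loss whenever cwnd reaches a fixed, hypothetical `loss_threshold`) is an assumption standing in for real router-buffer overflow.

```python
def aimd_trace(cwnd: int, loss_threshold: int, rtts: int) -> list[int]:
    """cwnd (in MSS) at the start of each RTT under AIMD.

    Hypothetical loss model: a loss is detected whenever cwnd reaches
    loss_threshold; the window is then halved (never below 1 MSS).
    """
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd >= loss_threshold:
            cwnd = max(1, cwnd // 2)  # multiplicative decrease
        else:
            cwnd += 1                 # additive increase: +1 MSS per RTT
    return trace


if __name__ == "__main__":
    print(aimd_trace(6, 8, 8))  # [6, 7, 8, 4, 5, 6, 7, 8]
```

The output shows linear growth, a halving at the loss, then linear growth again: one tooth of the saw.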
TCP Slow Start
When a connection begins, increase the rate exponentially until the first loss event:
- Initially cwnd = 1 MSS.
- Double cwnd every RTT.
- This is done by incrementing cwnd by 1 MSS for every ACK received.
Summary: the initial rate is slow, but it ramps up exponentially fast (the sending rate doubles every RTT).
[Figure: Host A sends one segment to Host B, then two segments one RTT later, then four segments.]
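Why does "1 MSS per ACK" double the window every RTT? A hedged Python sketch, assuming no losses and one ACK per segment (no delayed ACKs); the function name is hypothetical.

```python
def slow_start_windows(rounds: int) -> list[int]:
    """cwnd (in MSS) at the start of each RTT during slow start.

    Every segment sent in a round is ACKed, and each ACK adds 1 MSS, so the
    window doubles every RTT (assumes no loss and one ACK per segment).
    """
    cwnd = 1
    windows = []
    for _ in range(rounds):
        windows.append(cwnd)
        acks = cwnd        # one ACK per segment sent this RTT
        for _ in range(acks):
            cwnd += 1      # +1 MSS per ACK received
    return windows


if __name__ == "__main__":
    print(slow_start_windows(5))  # [1, 2, 4, 8, 16]
```

The first three values are the one, two and four segments in the figure.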
Refinement: Inferring Loss
- After 3 dup ACKs: cwnd is cut in half; the window then grows linearly.
- But after a timeout event: cwnd is set to 1 MSS; the window then grows exponentially up to a threshold, then grows linearly.
Philosophy:
- 3 dup ACKs indicate that the network is capable of delivering some segments.
- A timeout indicates a "more alarming" congestion scenario.
Refinement
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.
Implementation:
- A variable, ssthresh (slow-start threshold).
- On a loss event, ssthresh is set to 1/2 of cwnd just before the loss event.
TCP Sender Congestion Control

| State | Event | TCP Sender Congestion-Control Action | Commentary |
|---|---|---|---|
| Slow Start (SS) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Results in a doubling of CongWin every RTT |
| Congestion Avoidance (CA) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT |
| SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS |
| SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start |
| SS or CA | Duplicate ACK | Increment the duplicate-ACK count for the segment being ACKed | CongWin and Threshold not changed |
Summary: TCP Congestion Control
[Finite-state machine with three states: slow start, congestion avoidance and fast recovery.]
Initially: cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start.
Slow start:
- New ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed.
- Duplicate ACK: dupACKcount++
- cwnd > ssthresh: move to congestion avoidance.
- Timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment (remain in slow start).
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery.
Congestion avoidance:
- New ACK: cwnd = cwnd + MSS · (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed.
- Duplicate ACK: dupACKcount++
- Timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start.
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery.
Fast recovery:
- Duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed.
- New ACK: cwnd = ssthresh; dupACKcount = 0; move to congestion avoidance.
- Timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start.
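A compact Python sketch of the finite-state machine summarised above. It models only the window bookkeeping (cwnd, ssthresh, dupACKcount and the current state) in units of MSS, records retransmissions in a log instead of sending segments, and uses a hypothetical initial ssthresh of 64 MSS in place of the slide's 64 KB. It follows this FSM, with its fast-recovery state, which handles a triple duplicate ACK differently from the preceding table.

```python
from dataclasses import dataclass, field


@dataclass
class CongestionControlFSM:
    """Window bookkeeping for the slow start / congestion avoidance /
    fast recovery FSM. Windows are in MSS units (so "1 MSS" is 1.0)."""

    ssthresh: float = 64.0          # hypothetical initial threshold, in MSS
    cwnd: float = 1.0               # cwnd = 1 MSS at connection start
    dup_ack_count: int = 0
    state: str = "slow_start"
    log: list[str] = field(default_factory=list)  # stands in for retransmissions

    def new_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh            # deflate: cwnd = ssthresh
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += 1.0                     # cwnd = cwnd + MSS
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += 1.0 / self.cwnd         # cwnd = cwnd + MSS * (MSS / cwnd)
        self.dup_ack_count = 0

    def duplicate_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd += 1.0                     # inflate by 1 MSS per duplicate ACK
            return
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:              # triple duplicate ACK
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3.0      # cwnd = ssthresh + 3 MSS
            self.log.append("retransmit missing segment")
            self.state = "fast_recovery"

    def timeout(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0
        self.dup_ack_count = 0
        self.log.append("retransmit missing segment")
        self.state = "slow_start"
```

For example, with ssthresh = 8, eight new ACKs take cwnd from 1 to 9 and into congestion avoidance; three duplicate ACKs then set ssthresh = 4.5 and cwnd = 7.5 in fast recovery, and the next new ACK deflates cwnd to 4.5.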
Congestion Control
TCP's Congestion Control Service
[Figure: CLIENT and SERVER.]
Problem: gridlock sets in when there is packet loss due to router congestion.
TCP's congestion control service forces the end systems to decrease the rate at which packets are sent during periods of congestion.
The sending system's packet is lost due to congestion; the sender is alerted to this when it stops receiving ACKs for the packets it sent.
Transport Layer ndash TCP
49
BBTransport Layer 3-49
Macroscopic Description of TCP throughput
whatrsquos the average throughout of TCP as a function of window size and RTT ignore slow start (typically very short phases)
let W be the window size when loss occurs when window is W throughput is WRTT just after loss window drops to W2 throughput to
W2RTT Throughput increases linearly (by MSSRTT every
RTT) Average Throughput 75 WRTT
(Based on Idealised model for the steady-state dynamics of TCP)
Transport Layer 3-50
TCP FuturesTCP Futures TCP over ldquolong fat pipesrdquo Example GRID computing application 1500-byte segments 100ms RTT desired
throughput of 10 Gbps requires window size W = 83333 in-flight
segments Throughput in terms of loss rate
L = 210-10 ndash a very small loss rate (1 loss event every 5 billion segments)
new versions of TCP is needed for high-speed environments
LRTT
MSS221
Transport Layer ndash TCP
51
BB
TCP FairnessTCP Fairness
Fairness goal if NN TCP sessions share same bottleneck link each should get an average transmission rate of RNRN an equal share of the linkrsquos bandwidth
TCP connection 1
bottleneckrouter
capacity RR
TCP connection 2
Go to Summary of TCP Congestion Control
Transport Layer ndash TCP
52
BB
Analysis of 2 connections sharing a linkAnalysis of 2 connections sharing a link
Link with transmission rate of RR
Each connection have the same MSSMSS RTTRTT
No other TCP connections or UDP datagrams traverse the shared link
Ignore slow start phase of TCP
Operating in congestion-avoidance modecongestion-avoidance mode (linear increase phase)
Goal adjust sending rate of the two connections to allow for equal equal bandwidth sharingbandwidth sharing
AssumptionsAssumptions
Transport Layer ndash TCP
53
BB
Why is TCP fairWhy is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput
proportionally
RR
equal bandwidth shareequal bandwidth share
Connection 1 throughput
Connect
ion 2
thro
ughput
congestion avoidance additive increaseloss decrease window by factor of 2
RR
A point on the graph depicts the amount of link bandwidth
jointly jointly consumedconsumed by
the connections
Full bandwidth utilisation line
We can viewWe can view a simulation a simulation
on thison this
View SimulationView Simulation
Transport Layer ndash TCP
54
BB
The End
The next succeeding slides are just for additional reading
Transport Layer ndash TCP
55
BB
TCP Latency ModelingTCP Latency Modeling
In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs
Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth
Multiple End Systems sharing a link
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
Loop holes in TCPLoop holes in TCP
1 TCP connection
1 TCP connection
1 TCP connection
3 TCP connections
Multithreading implementationMultithreading implementation
Transport Layer ndash TCP
56
BB
TCP latency modelingTCP latency modeling
TCP connection establishment time data transfer delay Actual data transmission time
Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent
QQ How long How long does it take to does it take to
receive an object receive an object from a Web from a Web
serverserver
No data transfer delayNo data transfer delay
Therersquos data transfer delayTherersquos data transfer delay
the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety
Transport Layer ndash TCP
57
BB
TCP Latency ModelingTCP Latency Modeling
Network is uncongested with one link between end systems of rate RR
CongWinCongWin (fixed) determines the amount of data that can be sent
No packet loss no packet corruption no retransmissions required
Header overheads are negligible
File to sendFile to send = integer number of segments of size MSS
Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times
CLIENT SERVER
FILEFILE
Initial ThresholdThreshold of TCP congestion mechanism is very big
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits
SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)
Transport Layer ndash TCP
58
BB
TCP latency ModelingTCP latency Modeling
Case 1 latency = 2RTT + OR
K = Number of Windows of data that cover the object
K = OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 1 Case 1 WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
Number of segmentsRounded up to the nearest integer
Assume W=4 segmentsAssume W=4 segments
eg O=256bits S=32bits W=4
Transport Layer ndash TCP
59
BB
TCP latency ModelingTCP latency Modeling
Case 2 latency = 2RTT + OR + (K-1)[SR + RTT - WSR]
Number of Windows of data that cover the objectK= OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 2 Case 2 WSR lt RTT + SRWSR lt RTT + SR
Sender has to wait for an ACK after a windowrsquos worth of data sent
STALLED STALLED PERIODPERIOD
If there are k windows sender willbe stalled (k-1) times
Transport Layer ndash TCP
60
BB
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
STALLED STALLED PERIODPERIOD
4 windows4 windows
OS=15
Transport Layer ndash TCP
61
BB
bull LetLet K K be the be the number of windows that cover the objectnumber of windows that cover the objectbull We can express We can express KK in terms of the number of segments in the in terms of the number of segments in the
object as followsobject as follows
S
OkK k 110 222min
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1log
1logmin
12min
2
2
S
OK
S
OkkK
S
OkK k
Note
Transport Layer ndash TCP
62
BB
bull From the time the server begins to transmit the From the time the server begins to transmit the kkth window window until the time the server receives an until the time the server receives an ACKACK for the for the first segment in the windowin the window
bull Transmission of Transmission of kkth window window ==
bull Stall Time Stall Time ==
bull Latency Latency ==
12
k
R
S
12
k
R
SRTT
R
S
RTTR
S
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1K
1
122k
k
R
SRTT
R
S
R
ORTT
Transport Layer ndash TCP
63
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
Transport Layer ndash TCP
64
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
R
S
R
SRTTP
R
ORTT P )12(2Latency
bull Closed-form expression for the latency
Transport Layer ndash TCP
65
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
2
1
RTTRO
P
encyMinimumLat
Latency
Slow start will not significantly increase latency if RTT ltlt OR
Transport Layer ndash TCP
66
BB
httpwww1csewustledu~jaincis788-97ftptcp_over_atmindexhtmatm-features
Transport Layer ndash TCP
41
BB
TCP Congestion Control details
sender limits transmission LastByteSent-LastByteAcked
cwnd
roughly
cwnd is dynamic function of perceived network congestion
How does sender perceive congestion
loss event = timeout or 3 duplicate acks
TCP sender reduces rate (cwnd) after loss event
Three mechanisms1 AIMD
2 slow start
3 conservative after timeout events
rate = cwnd
RTT Bytessec
Transport Layer ndash TCP
42
BB
TCP congestion avoidance congestion avoidance additive increase multiplicative decreaseadditive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1 MSS every RTT until
loss is detected multiplicative decrease cut cwnd in half after loss
timecwnd
co
nge
stio
n w
indo
w s
ize
saw toothbehavior probingfor bandwidth
Transport Layer ndash TCP
43
BB
TCP Slow Start Slow Start when connection begins
increase rate exponentially until first loss event initially cwndcwnd = 1 MSS double cwndcwnd every RTT done by incrementing cwndcwnd by 1 MSS by 1 MSS for every ACK received
summary initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transport Layer ndash TCP
44
BB
Refinement inferring lossRefinement inferring loss
after 3 dup ACKsafter 3 dup ACKs cwndcwnd is cut in halfhalf window then grows
linearlylinearly butbut after timeout after timeout
eventevent cwndcwnd is is set to 1 MSS1 MSS window then grows
exponentiallyexponentially Up to a thresholdUp to a threshold
then grows linearlylinearly
3 dup ACKs 3 dup ACKs indicates network capable of delivering some segments
timeouttimeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer ndash TCP
45
BB
Refinement
Q when should the exponential increase switch to linear
A when cwndcwnd gets to 1212 of its value before timeout
Implementation variable ssthresh (slow-start threshold)ssthresh (slow-start threshold) on loss eventon loss event ssthreshssthresh is set to 1212 of cwndcwnd just
before loss event
Transport Layer ndash TCP
46
BB
TCP Sender Congestion ControlSTATE EVENT TCP SENDER Congestion-
Control ActionCommentary
SLOW START (SS)
ACK receipt for previously unACKed data
CongWin = CongWin + MSSIf(CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo
Resulting in a doubling of CongWin every RTT
Congestion Avoidance (CA)
ACK receipt for previously unACKed data
CongWin = CongWin + MSS (MSSCongWin)
Additive increase resulting in increasing of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate ACK
Threshold = CongWin 2CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
SS or CA Timeout Threshold = CongWin 2CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter Slow Start
SS or CA Duplicate ACK Increment duplicate ACK count for segment being ACKed
CongWin and Threshold not changed
Transport Layer ndash TCP
47
BB
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++
duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3 MSSretransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++
duplicate ACK
cwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
Transport Layer ndash TCP
48
BB
Congestion control
TCPrsquos Congestion Control Service
CLIENTSERVER
Problem Gridlock sets-in when there is packet loss due to router congestion
forces the End Systems to decrease the rate at which packets are sentdecrease the rate at which packets are sent during periods of congestion
The sending systemrsquos packet is lost due to congestion and is alerted when it stops receiving ACKs of packets sent
Transport Layer ndash TCP
49
BBTransport Layer 3-49
Macroscopic Description of TCP throughput
whatrsquos the average throughout of TCP as a function of window size and RTT ignore slow start (typically very short phases)
let W be the window size when loss occurs when window is W throughput is WRTT just after loss window drops to W2 throughput to
W2RTT Throughput increases linearly (by MSSRTT every
RTT) Average Throughput 75 WRTT
(Based on Idealised model for the steady-state dynamics of TCP)
Transport Layer 3-50
TCP FuturesTCP Futures TCP over ldquolong fat pipesrdquo Example GRID computing application 1500-byte segments 100ms RTT desired
throughput of 10 Gbps requires window size W = 83333 in-flight
segments Throughput in terms of loss rate
L = 210-10 ndash a very small loss rate (1 loss event every 5 billion segments)
new versions of TCP is needed for high-speed environments
LRTT
MSS221
Transport Layer ndash TCP
51
BB
TCP FairnessTCP Fairness
Fairness goal if NN TCP sessions share same bottleneck link each should get an average transmission rate of RNRN an equal share of the linkrsquos bandwidth
TCP connection 1
bottleneckrouter
capacity RR
TCP connection 2
Go to Summary of TCP Congestion Control
Transport Layer ndash TCP
52
BB
Analysis of 2 connections sharing a linkAnalysis of 2 connections sharing a link
Link with transmission rate of RR
Each connection have the same MSSMSS RTTRTT
No other TCP connections or UDP datagrams traverse the shared link
Ignore slow start phase of TCP
Operating in congestion-avoidance modecongestion-avoidance mode (linear increase phase)
Goal adjust sending rate of the two connections to allow for equal equal bandwidth sharingbandwidth sharing
AssumptionsAssumptions
Transport Layer ndash TCP
53
BB
Why is TCP fairWhy is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput
proportionally
RR
equal bandwidth shareequal bandwidth share
Connection 1 throughput
Connect
ion 2
thro
ughput
congestion avoidance additive increaseloss decrease window by factor of 2
RR
A point on the graph depicts the amount of link bandwidth
jointly jointly consumedconsumed by
the connections
Full bandwidth utilisation line
We can viewWe can view a simulation a simulation
on thison this
View SimulationView Simulation
Transport Layer ndash TCP
54
BB
The End
The next succeeding slides are just for additional reading
Transport Layer ndash TCP
55
BB
TCP Latency ModelingTCP Latency Modeling
In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs
Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth
Multiple End Systems sharing a link
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
Loop holes in TCPLoop holes in TCP
1 TCP connection
1 TCP connection
1 TCP connection
3 TCP connections
Multithreading implementationMultithreading implementation
Transport Layer ndash TCP
56
BB
TCP latency modelingTCP latency modeling
TCP connection establishment time data transfer delay Actual data transmission time
Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent
QQ How long How long does it take to does it take to
receive an object receive an object from a Web from a Web
serverserver
No data transfer delayNo data transfer delay
Therersquos data transfer delayTherersquos data transfer delay
the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety
Transport Layer ndash TCP
57
BB
TCP Latency ModelingTCP Latency Modeling
Network is uncongested with one link between end systems of rate RR
CongWinCongWin (fixed) determines the amount of data that can be sent
No packet loss no packet corruption no retransmissions required
Header overheads are negligible
File to sendFile to send = integer number of segments of size MSS
Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times
CLIENT SERVER
FILEFILE
Initial ThresholdThreshold of TCP congestion mechanism is very big
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits
SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)
Transport Layer ndash TCP
58
BB
TCP latency ModelingTCP latency Modeling
Case 1 latency = 2RTT + OR
K = Number of Windows of data that cover the object
K = OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 1 Case 1 WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
Number of segmentsRounded up to the nearest integer
Assume W=4 segmentsAssume W=4 segments
eg O=256bits S=32bits W=4
Transport Layer ndash TCP
59
BB
TCP latency ModelingTCP latency Modeling
Case 2 latency = 2RTT + OR + (K-1)[SR + RTT - WSR]
Number of Windows of data that cover the objectK= OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 2 Case 2 WSR lt RTT + SRWSR lt RTT + SR
Sender has to wait for an ACK after a windowrsquos worth of data sent
STALLED STALLED PERIODPERIOD
If there are k windows sender willbe stalled (k-1) times
Transport Layer ndash TCP
60
BB
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
STALLED STALLED PERIODPERIOD
4 windows4 windows
OS=15
Transport Layer ndash TCP
61
BB
bull LetLet K K be the be the number of windows that cover the objectnumber of windows that cover the objectbull We can express We can express KK in terms of the number of segments in the in terms of the number of segments in the
object as followsobject as follows
S
OkK k 110 222min
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1log
1logmin
12min
2
2
S
OK
S
OkkK
S
OkK k
Note
Transport Layer ndash TCP
62
BB
bull From the time the server begins to transmit the From the time the server begins to transmit the kkth window window until the time the server receives an until the time the server receives an ACKACK for the for the first segment in the windowin the window
bull Transmission of Transmission of kkth window window ==
bull Stall Time Stall Time ==
bull Latency Latency ==
12
k
R
S
12
k
R
SRTT
R
S
RTTR
S
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1K
1
122k
k
R
SRTT
R
S
R
ORTT
Transport Layer ndash TCP
63
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
Transport Layer ndash TCP
64
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
R
S
R
SRTTP
R
ORTT P )12(2Latency
bull Closed-form expression for the latency
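$$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{(O/R)/\mathrm{RTT} + 2}, \qquad \text{Minimum Latency} = 2\,\mathrm{RTT} + \frac{O}{R}$$
• Slow start will not significantly increase latency if $\mathrm{RTT} \ll O/R$.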
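The bound follows from the closed-form latency in one step. Take Minimum Latency = 2 RTT + O/R, which is the latency with no stalls. The extra delay caused by slow start is then P(RTT + S/R) - (2^P - 1)S/R. Because 2^P - 1 ≥ P for every integer P ≥ 0, this extra delay is at most P·RTT. The following is a sketch of that reasoning:

```latex
\begin{aligned}
\text{Latency} - \text{Minimum Latency}
  &= P\left[\mathrm{RTT} + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}
   \le P\,\mathrm{RTT} \qquad (\text{since } 2^P - 1 \ge P),\\
\frac{\text{Latency}}{\text{Minimum Latency}}
  &\le 1 + \frac{P\,\mathrm{RTT}}{2\,\mathrm{RTT} + O/R}
   = 1 + \frac{P}{(O/R)/\mathrm{RTT} + 2}.
\end{aligned}
```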
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
TCP congestion avoidance: additive increase, multiplicative decrease
• Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
• Additive increase: increase cwnd by 1 MSS every RTT until loss is detected.
• Multiplicative decrease: cut cwnd in half after loss.
[Figure: congestion window size (8, 16, 24 Kbytes) versus time, showing the saw-tooth behavior of probing for bandwidth.]
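A toy per-RTT model reproduces the saw-tooth. This Python sketch is illustrative only. It assumes, hypothetically, that a loss occurs exactly when cwnd exceeds a fixed capacity (in MSS). It also assumes that every loss is detected by duplicate ACKs, so cwnd is halved rather than reset to 1 MSS. aimd_trace is a hypothetical name.

```python
def aimd_trace(capacity_mss: int, rtts: int, start_mss: int = 1) -> list:
    """cwnd (in MSS) at the start of each RTT under additive increase, multiplicative decrease."""
    cwnd = start_mss
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd > capacity_mss:       # toy loss model: loss whenever cwnd exceeds capacity
            cwnd = max(1, cwnd // 2)  # multiplicative decrease: cut cwnd in half
        else:
            cwnd += 1                 # additive increase: +1 MSS per RTT
    return trace


print(aimd_trace(capacity_mss=8, rtts=16, start_mss=4))
# [4, 5, 6, 7, 8, 9, 4, 5, 6, 7, 8, 9, 4, 5, 6, 7]
```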
TCP Slow Start
• When a connection begins, increase the rate exponentially until the first loss event:
  initially cwnd = 1 MSS;
  double cwnd every RTT;
  this is done by incrementing cwnd by 1 MSS for every ACK received.
• Summary: the initial rate is slow but ramps up exponentially fast (the sending rate doubles every RTT).
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart.]
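The "add 1 MSS per ACK" rule is what makes the window double every RTT. Here is a minimal Python sketch. It assumes, hypothetically, that every segment sent in a round is ACKed before the next round starts and that no loss occurs. slow_start_rounds is a hypothetical name.

```python
def slow_start_rounds(rounds: int) -> list:
    """cwnd (in MSS) at the start of each RTT when each ACK adds 1 MSS."""
    cwnd = 1                    # initially cwnd = 1 MSS
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        acks_this_rtt = cwnd    # one ACK per segment in the current window
        for _ in range(acks_this_rtt):
            cwnd += 1           # +1 MSS per ACK received
    return history


print(slow_start_rounds(5))  # [1, 2, 4, 8, 16]
```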
Refinement: inferring loss
• After 3 duplicate ACKs: cwnd is cut in half, and the window then grows linearly.
• But after a timeout event: cwnd is set to 1 MSS; the window then grows exponentially up to a threshold, and then grows linearly.
Philosophy:
• 3 duplicate ACKs indicate that the network is capable of delivering some segments.
• A timeout indicates a "more alarming" congestion scenario.
Refinement
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.
Implementation:
• variable ssthresh (slow-start threshold)
• on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event
TCP Sender Congestion Control

| State | Event | TCP Sender Congestion-Control Action | Commentary |
|---|---|---|---|
| Slow Start (SS) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Resulting in a doubling of CongWin every RTT |
| Congestion Avoidance (CA) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS × (MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT |
| SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS |
| SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter Slow Start |
| SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed |
Summary: TCP Congestion Control
[Figure: finite-state machine with the states slow start, congestion avoidance and fast recovery; its transitions are listed below.]
• Initially: cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start.
• Slow start:
  new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
  duplicate ACK: dupACKcount++
  cwnd > ssthresh: move to congestion avoidance
  timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
  dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
• Congestion avoidance:
  new ACK: cwnd = cwnd + MSS × (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
  duplicate ACK: dupACKcount++
  timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start
  dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
• Fast recovery:
  duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
  new ACK: cwnd = ssthresh; dupACKcount = 0; move to congestion avoidance
  timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start
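The FSM can be written as three event handlers. The following Python class is a minimal sketch of the summary above. It is not a real TCP implementation:
- The class and method names are hypothetical.
- Windows are tracked in bytes with integer arithmetic.
- Segment transmission is left out. on_dup_ack only reports when the missing segment would be retransmitted.

```python
class RenoSender:
    """Congestion-control state of a TCP sender, following the FSM summary (sizes in bytes)."""

    def __init__(self, mss: int = 1460, ssthresh: int = 64 * 1024):
        self.mss = mss
        self.cwnd = mss              # cwnd = 1 MSS
        self.ssthresh = ssthresh     # ssthresh = 64 KB
        self.dup_acks = 0            # dupACKcount = 0
        self.state = "slow_start"

    def on_new_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh                      # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss                          # +1 MSS per ACK
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += self.mss * self.mss // self.cwnd  # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self) -> bool:
        """Handle a duplicate ACK; return True if the missing segment should be retransmitted."""
        if self.state == "fast_recovery":
            self.cwnd += self.mss                          # +1 MSS per further duplicate ACK
            return False
        self.dup_acks += 1
        if self.dup_acks == 3:                             # triple duplicate ACK
            self.ssthresh = self.cwnd // 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast_recovery"
            return True
        return False

    def on_timeout(self) -> None:
        """Timeout from any state: halve ssthresh and restart slow start from 1 MSS."""
        self.ssthresh = self.cwnd // 2
        self.cwnd = self.mss
        self.dup_acks = 0
        self.state = "slow_start"


s = RenoSender(mss=1000, ssthresh=4000)
for _ in range(4):
    s.on_new_ack()          # cwnd: 2000, 3000, 4000, 5000; 5000 > 4000 switches state
print(s.state, s.cwnd)      # congestion_avoidance 5000
```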
Congestion control: TCP's Congestion Control Service
[Figure: client and server]
• Problem: gridlock sets in when there is packet loss due to router congestion.
• TCP's congestion control forces the end systems to decrease the rate at which packets are sent during periods of congestion.
• When a sender's packet is lost due to congestion, the sender is alerted when it stops receiving ACKs for the packets it sent.
Macroscopic Description of TCP Throughput
• What is the average throughput of TCP as a function of window size and RTT? Ignore slow start (typically very short phases).
• Let W be the window size when loss occurs.
  When the window is W, throughput is W/RTT.
  Just after loss, the window drops to W/2, and throughput drops to W/(2·RTT).
  Throughput then increases linearly (by MSS/RTT every RTT).
• Average throughput: 0.75 W/RTT
(Based on an idealised model for the steady-state dynamics of TCP.)
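The 0.75 factor is the mean of a window that ramps linearly from W/2 to W. Here is a small Python check. As in the idealised model, it assumes the window grows by exactly 1 MSS per RTT between losses. It also assumes W is even. sawtooth_average is a hypothetical helper.

```python
def sawtooth_average(W: int) -> float:
    """Mean window (in MSS) over one cycle: W/2, W/2 + 1, ..., W, then a loss halves it."""
    cycle = range(W // 2, W + 1)   # +1 MSS per RTT from W/2 up to W
    return sum(cycle) / len(cycle)


W = 100
print(sawtooth_average(W) / W)     # 0.75, so average throughput = 0.75 * W * MSS / RTT
```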
TCP Futures: TCP over "long, fat pipes"
• Example: a GRID computing application with 1500-byte segments and a 100 ms RTT. A desired throughput of 10 Gbps requires a window size of W = 83,333 in-flight segments.
• Throughput in terms of the loss rate L:
$$\text{Throughput} = \frac{1.22 \cdot \mathrm{MSS}}{\mathrm{RTT}\sqrt{L}}$$
  Reaching 10 Gbps requires $L = 2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments).
• New versions of TCP are needed for high-speed environments.
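Both numbers on this slide follow from short calculations. The Python sketch below takes the slide's values (1500-byte segments, 100 ms RTT, 10 Gbps). It then solves the throughput formula for L, converting MSS to bits. The helper names are hypothetical.

```python
def window_for_rate(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """In-flight segments needed to sustain rate_bps: W = rate * RTT / (8 * MSS)."""
    return rate_bps * rtt_s / (8 * mss_bytes)


def loss_rate_for_rate(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve rate = 1.22 * MSS / (RTT * sqrt(L)) for L, with MSS in bits."""
    return (1.22 * 8 * mss_bytes / (rtt_s * rate_bps)) ** 2


print(round(window_for_rate(10e9, 0.1, 1500)))  # 83333 segments
print(loss_rate_for_rate(10e9, 0.1, 1500))      # about 2.1e-10
```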
TCP Fairness
• Fairness goal: if N TCP sessions share the same bottleneck link of rate R, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.
[Figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R.]
Analysis of 2 connections sharing a link
Assumptions:
• Link with a transmission rate of R
• Each connection has the same MSS and RTT
• No other TCP connections or UDP datagrams traverse the shared link
• Ignore the slow start phase of TCP
• Operating in congestion-avoidance mode (linear increase phase)
Goal: adjust the sending rate of the two connections to allow for equal bandwidth sharing.
Why is TCP fair?
Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases.
• Multiplicative decrease decreases throughput proportionally.
[Figure: Connection 1 throughput (horizontal axis, up to R) versus Connection 2 throughput (vertical axis, up to R), with the equal bandwidth share line and the full bandwidth utilisation line. The trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by factor of 2). A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.]
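The convergence argument can be illustrated numerically. This Python sketch is a toy model under the slide's assumptions: the same MSS and RTT for both connections, and congestion avoidance only. It adds one hypothetical assumption: both connections see a loss in the same RTT whenever their combined rate exceeds R. Additive increase preserves the gap between the two rates, while each halving cuts that gap in half. As a result, the rates approach the equal-share line.

```python
def aimd_two_flows(x1: float, x2: float, R: float, rtts: int) -> tuple:
    """Evolve two rates (in MSS per RTT) sharing a link of capacity R."""
    for _ in range(rtts):
        if x1 + x2 > R:            # joint loss: both halve (multiplicative decrease)
            x1, x2 = x1 / 2, x2 / 2
        else:                      # both add 1 (additive increase, slope 1)
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


x1, x2 = aimd_two_flows(x1=80.0, x2=10.0, R=100.0, rtts=2000)
print(round(x1, 3), round(x2, 3))  # the two rates end up nearly equal
```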
The End
The following slides are for additional reading only.
TCP Latency Modeling
• In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free; therefore, they have higher throughputs.
• Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth.
[Figure: Loopholes in TCP. Multiple end systems share a link of R bps (the link's transmission rate). Three end systems each use 1 TCP connection, while a multithreaded implementation uses 3 TCP connections.]
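The loophole is simple arithmetic if each TCP connection gets roughly an equal share of R. The Python sketch below uses the slide's figure: three applications with 1 connection each, plus one multithreaded application. app_share is a hypothetical helper, and the equal-share assumption is idealised.

```python
from fractions import Fraction


def app_share(app_connections: int, other_connections: int) -> Fraction:
    """Fraction of the link rate R an application gets if every connection gets R/N."""
    return Fraction(app_connections, app_connections + other_connections)


print(app_share(1, 3))  # 1/4 of R with a single connection
print(app_share(3, 3))  # 1/2 of R with 3 parallel connections
```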
TCP Latency Modeling
Q: How long does it take to receive an object from a Web server?
Latency: the time from when the client initiates a TCP connection until the client receives the requested object in its entirety.
Components: TCP connection establishment time, data transfer delay, and actual data transmission time.
Two cases to consider:
• $WS/R > \mathrm{RTT} + S/R$: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent (no data transfer delay).
• $WS/R < \mathrm{RTT} + S/R$: the sender has to wait for an ACK after a window's worth of data is sent (there is data transfer delay).
TCP Latency Modeling
Assumptions:
• The network is uncongested, with one link of rate R between the end systems.
• CongWin (fixed) determines the amount of data that can be sent.
• No packet loss, no packet corruption, and no retransmissions required.
• Header overheads are negligible.
• The file to send is an integer number of segments of size MSS.
• Connection-establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times.
• The initial Threshold of the TCP congestion mechanism is very big.
Notation:
• O: size of the object, in bits
• S: number of bits in the MSS (maximum segment size)
• R: the link's transmission rate, in bps
[Figure: client and server exchanging a file.]
TCP Latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 1: $WS/R > \mathrm{RTT} + S/R$
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.
$$\text{latency} = 2\,\mathrm{RTT} + \frac{O}{R}$$
K = number of windows of data that cover the object:
$$K = \left\lceil \frac{O}{WS} \right\rceil$$
(O/S is the number of segments; K is rounded up to the nearest integer.)
Example (the figure assumes W = 4 segments): O = 256 bits, S = 32 bits and W = 4 give O/S = 8 segments and K = 2 windows.
TCP Latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 2: $WS/R < \mathrm{RTT} + S/R$
The sender has to wait for an ACK after a window's worth of data is sent.
$$\text{latency} = 2\,\mathrm{RTT} + \frac{O}{R} + (K-1)\left[\frac{S}{R} + \mathrm{RTT} - \frac{WS}{R}\right], \qquad K = \left\lceil \frac{O}{WS} \right\rceil$$
[Figure: stalled periods between windows.]
If there are K windows, the sender will be stalled (K-1) times.
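The two static-window cases can be combined into one function. Here is a minimal Python sketch. It assumes O and S are in bits, R is in bits/s and RTT is in seconds. static_window_latency is a hypothetical name. The R and RTT values in the example are made up so that one call lands in each case. When WS/R equals RTT + S/R, the stall term is zero, so both formulas agree at the boundary.

```python
import math


def static_window_latency(O: float, S: float, W: int, R: float, rtt: float) -> float:
    """Latency with a fixed window of W segments."""
    K = math.ceil(O / (W * S))            # windows needed to cover the object
    if W * S / R >= rtt + S / R:          # Case 1: ACK returns before the window is sent
        return 2 * rtt + O / R
    # Case 2: the sender stalls K-1 times, each for S/R + RTT - WS/R
    return 2 * rtt + O / R + (K - 1) * (S / R + rtt - W * S / R)


# Slide example O = 256 bits, S = 32 bits, W = 4 (K = 2), with R = 32 bps so S/R = 1 s.
print(static_window_latency(256, 32, 4, 32, rtt=2))   # Case 1: 2*2 + 8 = 12.0
print(static_window_latency(256, 32, 4, 32, rtt=10))  # Case 2: 2*10 + 8 + 1*(1 + 10 - 4) = 35.0
```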
Transport Layer ndash TCP
44
BB
Refinement inferring lossRefinement inferring loss
after 3 dup ACKsafter 3 dup ACKs cwndcwnd is cut in halfhalf window then grows
linearlylinearly butbut after timeout after timeout
eventevent cwndcwnd is is set to 1 MSS1 MSS window then grows
exponentiallyexponentially Up to a thresholdUp to a threshold
then grows linearlylinearly
3 dup ACKs 3 dup ACKs indicates network capable of delivering some segments
timeouttimeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
Transport Layer ndash TCP
45
BB
Refinement
Q when should the exponential increase switch to linear
A when cwndcwnd gets to 1212 of its value before timeout
Implementation variable ssthresh (slow-start threshold)ssthresh (slow-start threshold) on loss eventon loss event ssthreshssthresh is set to 1212 of cwndcwnd just
before loss event
Transport Layer ndash TCP
46
BB
TCP Sender Congestion ControlSTATE EVENT TCP SENDER Congestion-
Control ActionCommentary
SLOW START (SS)
ACK receipt for previously unACKed data
CongWin = CongWin + MSSIf(CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo
Resulting in a doubling of CongWin every RTT
Congestion Avoidance (CA)
ACK receipt for previously unACKed data
CongWin = CongWin + MSS (MSSCongWin)
Additive increase resulting in increasing of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate ACK
Threshold = CongWin 2CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
SS or CA Timeout Threshold = CongWin 2CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter Slow Start
SS or CA Duplicate ACK Increment duplicate ACK count for segment being ACKed
CongWin and Threshold not changed
Transport Layer ndash TCP
47
BB
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++
duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3 MSSretransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++
duplicate ACK
cwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
Transport Layer ndash TCP
48
BB
Congestion control
TCPrsquos Congestion Control Service
CLIENTSERVER
Problem Gridlock sets-in when there is packet loss due to router congestion
forces the End Systems to decrease the rate at which packets are sentdecrease the rate at which packets are sent during periods of congestion
The sending systemrsquos packet is lost due to congestion and is alerted when it stops receiving ACKs of packets sent
Transport Layer ndash TCP
49
BBTransport Layer 3-49
Macroscopic Description of TCP throughput
whatrsquos the average throughout of TCP as a function of window size and RTT ignore slow start (typically very short phases)
let W be the window size when loss occurs when window is W throughput is WRTT just after loss window drops to W2 throughput to
W2RTT Throughput increases linearly (by MSSRTT every
RTT) Average Throughput 75 WRTT
(Based on Idealised model for the steady-state dynamics of TCP)
Transport Layer 3-50
TCP FuturesTCP Futures TCP over ldquolong fat pipesrdquo Example GRID computing application 1500-byte segments 100ms RTT desired
throughput of 10 Gbps requires window size W = 83333 in-flight
segments Throughput in terms of loss rate
L = 210-10 ndash a very small loss rate (1 loss event every 5 billion segments)
new versions of TCP is needed for high-speed environments
LRTT
MSS221
Transport Layer ndash TCP
51
BB
TCP FairnessTCP Fairness
Fairness goal if NN TCP sessions share same bottleneck link each should get an average transmission rate of RNRN an equal share of the linkrsquos bandwidth
TCP connection 1
bottleneckrouter
capacity RR
TCP connection 2
Go to Summary of TCP Congestion Control
Transport Layer ndash TCP
52
BB
Analysis of 2 connections sharing a linkAnalysis of 2 connections sharing a link
Link with transmission rate of RR
Each connection have the same MSSMSS RTTRTT
No other TCP connections or UDP datagrams traverse the shared link
Ignore slow start phase of TCP
Operating in congestion-avoidance modecongestion-avoidance mode (linear increase phase)
Goal adjust sending rate of the two connections to allow for equal equal bandwidth sharingbandwidth sharing
AssumptionsAssumptions
Transport Layer ndash TCP
53
BB
Why is TCP fairWhy is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput
proportionally
RR
equal bandwidth shareequal bandwidth share
Connection 1 throughput
Connect
ion 2
thro
ughput
congestion avoidance additive increaseloss decrease window by factor of 2
RR
A point on the graph depicts the amount of link bandwidth
jointly jointly consumedconsumed by
the connections
Full bandwidth utilisation line
We can viewWe can view a simulation a simulation
on thison this
View SimulationView Simulation
Transport Layer ndash TCP
54
BB
The End
The next succeeding slides are just for additional reading
Transport Layer ndash TCP
55
BB
TCP Latency ModelingTCP Latency Modeling
In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs
Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth
Multiple End Systems sharing a link
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
Loop holes in TCPLoop holes in TCP
1 TCP connection
1 TCP connection
1 TCP connection
3 TCP connections
Multithreading implementationMultithreading implementation
Transport Layer ndash TCP
56
BB
TCP latency modelingTCP latency modeling
TCP connection establishment time data transfer delay Actual data transmission time
Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent
QQ How long How long does it take to does it take to
receive an object receive an object from a Web from a Web
serverserver
No data transfer delayNo data transfer delay
Therersquos data transfer delayTherersquos data transfer delay
the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety
TCP Latency Modeling: Assumptions
• The network is uncongested, with one link of rate R between the end systems (client and server).
• CongWin (fixed) determines the amount of data that can be sent.
• No packet loss, no packet corruption, and no retransmissions are required.
• Header overheads are negligible.
• The file to send is an integer number of segments of size MSS.
• Connection-establishment request messages, ACKs, and TCP connection-establishment segments have negligible transmission times.
• The initial threshold of the TCP congestion mechanism is very large.
Notation:
• R: link's transmission rate (bps)
• O: size of the object (bits)
• S: number of bits in an MSS (maximum segment size)
Case Analysis: STATIC Congestion Window
Case 1: $WS/R > RTT + S/R$
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.
$$\text{latency} = 2\,RTT + \frac{O}{R}$$
K, the number of windows of data that cover the object, is
$$K = \left\lceil \frac{O}{WS} \right\rceil$$
where $O/S$ is the number of segments and K is rounded up to the nearest integer.
Example (assume W = 4 segments): O = 256 bits, S = 32 bits, W = 4, so K = 256/(4 · 32) = 2.
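The Case 1 quantities can be checked with a short sketch. It assumes O and S are in bits, R in bps, and RTT in seconds. The function names are hypothetical. R and RTT in the example are illustrative values chosen so that Case 1 holds.

```python
def num_windows_static(O: int, S: int, W: int) -> int:
    """K = ceil(O / (W*S)): windows of W segments needed to cover O bits."""
    window_bits = W * S
    return (O + window_bits - 1) // window_bits  # integer ceiling division


def latency_case1(O: float, R: float, RTT: float) -> float:
    """Case 1 (WS/R > RTT + S/R): 2*RTT plus O/R to transmit the object."""
    return 2 * RTT + O / R


O, S, W = 256, 32, 4           # the slide's example
R, RTT = 1000.0, 0.01          # illustrative: WS/R = 0.128 s > RTT + S/R = 0.042 s
print(num_windows_static(O, S, W))   # 2
print(latency_case1(O, R, RTT))      # about 0.276 s
```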
Case Analysis: STATIC Congestion Window
Case 2: $WS/R < RTT + S/R$
The sender has to wait for an ACK after a window's worth of data is sent (the stalled period).
$$\text{latency} = 2\,RTT + \frac{O}{R} + (K-1)\left[\frac{S}{R} + RTT - \frac{WS}{R}\right]$$
where $K = \lceil O/(WS) \rceil$ is the number of windows of data that cover the object. If there are K windows, the sender will be stalled (K-1) times.
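Both static-window cases can be folded into one sketch by clamping the per-window stall at zero. This reproduces Case 1 whenever $WS/R \ge RTT + S/R$. The units (bits, bps, seconds), the example's R and RTT, and the name `latency_static` are assumptions for illustration.

```python
def latency_static(O: int, S: int, W: int, R: float, RTT: float) -> float:
    """Latency with a static window of W segments (both cases of the slides).

    Each of the K-1 stalls lasts S/R + RTT - WS/R; when that is <= 0 we are
    in Case 1 and the sender never stalls."""
    window_bits = W * S
    K = (O + window_bits - 1) // window_bits       # K = ceil(O / (WS))
    stall = S / R + RTT - window_bits / R           # idle time after each window
    return 2 * RTT + O / R + (K - 1) * max(0.0, stall)


# Slide example sizes with illustrative R and RTT that put us in Case 2.
print(latency_static(O=256, S=32, W=4, R=1000.0, RTT=0.1))  # about 0.46 s
```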
Case Analysis: DYNAMIC Congestion Window
[Figure: with slow start and O/S = 15, the object is covered by 4 windows (of 1, 2, 4 and 8 segments), and the server may stall between windows (stalled periods).]
Case Analysis: DYNAMIC Congestion Window
• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object, O/S, as follows:
$$K = \min\left\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge \frac{O}{S}\right\} = \min\left\{k : 2^k - 1 \ge \frac{O}{S}\right\} = \min\left\{k : k \ge \log_2\left(\frac{O}{S} + 1\right)\right\} = \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil$$
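The minimum in this derivation can be computed directly by growing k until the doubling windows cover the object. The sketch below uses integer arithmetic to avoid floating-point rounding in the logarithm. The function name is hypothetical. O/S = 15 reproduces the 4-window example.

```python
import math


def num_windows_slow_start(O: int, S: int) -> int:
    """Smallest k with 2^0 + 2^1 + ... + 2^(k-1) = 2^k - 1 >= O/S."""
    segments = (O + S - 1) // S     # O/S, rounded up if O is not a multiple of S
    k = 0
    while (1 << k) - 1 < segments:  # first k windows hold 2^k - 1 segments
        k += 1
    return k


print(num_windows_slow_start(O=15, S=1))   # 4 windows: 1 + 2 + 4 + 8
print(math.ceil(math.log2(15 / 1 + 1)))    # 4, the closed form
```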
Case Analysis: DYNAMIC Congestion Window
• Time from when the server begins to transmit the kth window until it receives an ACK for the first segment in the window: $\frac{S}{R} + RTT$
• Transmission time of the kth window: $2^{k-1}\frac{S}{R}$
• Stall time: $\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$, where $[x]^+ = \max(x, 0)$
• Latency:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$$
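The summation can be evaluated term by term. This is a sketch that assumes O and S in bits, R in bps, and RTT in seconds. The helper name and the example values (S/R = 1 ms, RTT = 5 ms, O/S = 15) are illustrative.

```python
def latency_slow_start_sum(O: int, S: int, R: float, RTT: float) -> float:
    """Latency = 2RTT + O/R + sum_{k=1}^{K-1} [S/R + RTT - 2^(k-1) S/R]^+."""
    segments = (O + S - 1) // S
    K = 0
    while (1 << K) - 1 < segments:       # K = ceil(log2(O/S + 1))
        K += 1
    latency = 2 * RTT + O / R
    for k in range(1, K):                # stalls can only follow windows 1..K-1
        stall = S / R + RTT - (2 ** (k - 1)) * S / R
        latency += max(0.0, stall)       # [x]^+ : no negative stall
    return latency


# Illustrative values: S/R = 1 ms, RTT = 5 ms, O/S = 15 segments (K = 4).
print(latency_slow_start_sum(O=15_000, S=1_000, R=1e6, RTT=0.005))  # about 0.036 s
```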
Case Analysis: DYNAMIC Congestion Window
• Let Q be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$
• The actual number of times that the server stalls is
$$P = \min\{Q, K-1\}$$
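Q and P can be sketched as below, using the same illustrative link as before (S/R = 1 ms, RTT = 5 ms). `stall_counts` is a hypothetical helper. For a short object, the window count K rather than Q limits the number of stalls.

```python
import math


def stall_counts(O: int, S: int, R: float, RTT: float) -> tuple:
    """Return (Q, K, P): stalls for an infinite object, windows covering
    the object, and the actual number of stalls P = min(Q, K - 1)."""
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1
    segments = (O + S - 1) // S
    K = 0
    while (1 << K) - 1 < segments:   # K = ceil(log2(O/S + 1))
        K += 1
    return Q, K, min(Q, K - 1)


print(stall_counts(O=15_000, S=1_000, R=1e6, RTT=0.005))  # (3, 4, 3)
print(stall_counts(O=3_000, S=1_000, R=1e6, RTT=0.005))   # (3, 2, 1)
```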
Case Analysis: DYNAMIC Congestion Window
• Closed-form expression for the latency:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - \left(2^P - 1\right)\frac{S}{R}$$
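The closed form follows from summing only the P positive stall terms, since $\sum_{k=1}^{P} 2^{k-1} = 2^P - 1$. The sketch below checks this numerically against the term-by-term summation. The helper names and example values are hypothetical.

```python
import math


def num_windows(O: int, S: int) -> int:
    """K = ceil(log2(O/S + 1)), computed with integers."""
    segments = (O + S - 1) // S
    K = 0
    while (1 << K) - 1 < segments:
        K += 1
    return K


def latency_closed_form(O: int, S: int, R: float, RTT: float) -> float:
    """2RTT + O/R + P(RTT + S/R) - (2^P - 1) S/R with P = min(Q, K - 1)."""
    K = num_windows(O, S)
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1
    P = min(Q, K - 1)
    return 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R


def latency_by_summation(O: int, S: int, R: float, RTT: float) -> float:
    """2RTT + O/R + sum over k = 1..K-1 of [S/R + RTT - 2^(k-1) S/R]^+."""
    total = 2 * RTT + O / R
    for k in range(1, num_windows(O, S)):
        total += max(0.0, S / R + RTT - 2 ** (k - 1) * S / R)
    return total


# Illustrative values: S/R = 1 ms, RTT = 5 ms, O/S = 15.
print(latency_closed_form(15_000, 1_000, 1e6, 0.005))   # about 0.036 s
print(latency_by_summation(15_000, 1_000, 1e6, 0.005))  # about 0.036 s
```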
Case Analysis: DYNAMIC Congestion Window
$$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{\frac{O/R}{RTT} + 2}$$
where the minimum latency, $2\,RTT + O/R$, is the latency with no stalls.
Slow start will not significantly increase latency if $RTT \ll O/R$.
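The bound can be compared with the actual ratio for a small and a large object on the same link. This is a sketch with illustrative values only (S/R = 1 ms, RTT = 5 ms). `latency_vs_minimum` is a hypothetical helper that reuses the closed-form latency.

```python
import math


def latency_vs_minimum(O: int, S: int, R: float, RTT: float):
    """Return (Latency / Minimum Latency, upper bound 1 + P / ((O/R)/RTT + 2)),
    where Minimum Latency = 2RTT + O/R is the latency with no stalls."""
    segments = (O + S - 1) // S
    K = 0
    while (1 << K) - 1 < segments:                 # K = ceil(log2(O/S + 1))
        K += 1
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1
    P = min(Q, K - 1)
    minimum = 2 * RTT + O / R
    latency = minimum + P * (RTT + S / R) - (2 ** P - 1) * S / R
    return latency / minimum, 1 + P / ((O / R) / RTT + 2)


# Same illustrative link with a small and a large object.
print(latency_vs_minimum(15_000, 1_000, 1e6, 0.005))      # about (1.44, 1.6)
print(latency_vs_minimum(10_000_000, 1_000, 1e6, 0.005))  # about (1.0011, 1.0015)
```

For the large object, O/R = 10 s is far larger than the RTT, and slow start adds only about 0.1% to the latency.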
httpwww1csewustledu~jaincis788-97ftptcp_over_atmindexhtmatm-features
Refinement
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before timeout.
Implementation: a variable ssthresh (slow-start threshold); on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event.
TCP Sender Congestion Control

| State | Event | TCP Sender Congestion-Control Action | Commentary |
|---|---|---|---|
| Slow Start (SS) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Results in a doubling of CongWin every RTT |
| Congestion Avoidance (CA) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT |
| SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS |
| SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start |
| SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed |
Summary: TCP Congestion Control (FSM)
Initial state is slow start, with cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0.
Slow start:
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Congestion avoidance:
• new ACK: cwnd = cwnd + MSS · (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
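The congestion-control FSM summarised on this slide can be sketched as a small state machine. This is a hypothetical illustration, not a real TCP implementation. Windows are tracked in bytes, MSS = 1460 bytes is an assumed value, 64 KB is taken as 64 × 1024 bytes, and retransmissions are only noted in comments.

```python
MSS = 1460  # bytes per segment (illustrative assumption)


class CongestionControlFSM:
    """Hypothetical sketch of the slow start / congestion avoidance /
    fast recovery FSM. Tracks window sizes in bytes only; it does not
    send or retransmit real segments."""

    def __init__(self, ssthresh: float = 64 * 1024):
        self.state = "slow_start"
        self.cwnd = 1 * MSS            # cwnd = 1 MSS
        self.ssthresh = ssthresh       # ssthresh = 64 KB (assumed 64 * 1024 bytes)
        self.dup_ack_count = 0

    def on_new_ack(self) -> None:
        if self.state == "slow_start":
            self.cwnd += MSS                       # +1 MSS per ACK: doubles every RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        elif self.state == "congestion_avoidance":
            self.cwnd += MSS * MSS / self.cwnd     # about +1 MSS every RTT
        else:                                      # new ACK ends fast recovery
            self.cwnd = self.ssthresh
            self.state = "congestion_avoidance"
        self.dup_ack_count = 0

    def on_duplicate_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd += MSS                       # inflate window per dup ACK
            return
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:                # triple duplicate ACK
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"           # (retransmit missing segment here)

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1 * MSS
        self.dup_ack_count = 0
        self.state = "slow_start"                  # (retransmit missing segment here)


fsm = CongestionControlFSM()
for _ in range(3):
    fsm.on_new_ack()                 # cwnd: 2, 3, 4 MSS
for _ in range(3):
    fsm.on_duplicate_ack()           # third dup ACK triggers fast recovery
print(fsm.state, fsm.ssthresh / MSS, fsm.cwnd / MSS)  # fast_recovery 2.0 5.0
```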
Congestion Control
TCP's Congestion Control Service
Problem: gridlock sets in when there is packet loss due to router congestion.
TCP's congestion control forces the end systems to decrease the rate at which packets are sent during periods of congestion.
When a packet is lost due to congestion, the sending system is alerted when it stops receiving ACKs for the packets it sent.
Macroscopic Description of TCP Throughput
• What is the average throughput of TCP as a function of window size and RTT? Ignore slow start (typically very short phases).
• Let W be the window size when loss occurs. When the window is W, throughput is W/RTT; just after loss, the window drops to W/2 and throughput to W/(2RTT). Throughput then increases linearly (by MSS/RTT every RTT).
• Average throughput: 0.75 W/RTT
(Based on an idealised model for the steady-state dynamics of TCP.)
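The 0.75 W/RTT figure is the mean of a sawtooth in which the window rises linearly from W/2 to W between losses. The sketch below averages that sawtooth numerically. It assumes the idealised linear growth, and the helper name and the W and RTT values are illustrative.

```python
def sawtooth_average_throughput(W: float, RTT: float, steps: int = 1000) -> float:
    """Average of window/RTT while the window grows linearly from W/2 to W
    (one idealised congestion-avoidance cycle), using the midpoint rule."""
    total = 0.0
    for i in range(steps):
        fraction = (i + 0.5) / steps          # position within the cycle
        window = W / 2 + (W / 2) * fraction   # linear growth from W/2 to W
        total += window / RTT
    return total / steps


W, RTT = 100_000, 0.1   # illustrative: 100 kB window at loss, 100 ms RTT
print(sawtooth_average_throughput(W, RTT))  # about 750000 bytes/s
print(0.75 * W / RTT)                       # about 750000 bytes/s
```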
TCP Futures: TCP over "long, fat pipes"
• Example: a GRID computing application with 1500-byte segments and a 100 ms RTT; a desired throughput of 10 Gbps requires a window size of W = 83,333 in-flight segments.
• Throughput in terms of loss rate L:
$$\text{throughput} = \frac{1.22 \cdot MSS}{RTT\,\sqrt{L}}$$
• Achieving 10 Gbps therefore requires $L = 2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments).
• New versions of TCP are needed for high-speed environments.
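Both numbers on this slide follow from rearranging the two formulas. The sketch below reproduces them using only the values given on the slide.

```python
MSS_BITS = 1500 * 8      # 1500-byte segments
RTT = 0.100              # 100 ms
TARGET_BPS = 10e9        # 10 Gbps

# Window needed: throughput = W * MSS / RTT  =>  W = throughput * RTT / MSS
W = TARGET_BPS * RTT / MSS_BITS

# Loss rate needed: throughput = 1.22 * MSS / (RTT * sqrt(L))
#   =>  L = (1.22 * MSS / (RTT * throughput)) ** 2
L = (1.22 * MSS_BITS / (RTT * TARGET_BPS)) ** 2

print(f"W = {W:,.0f} segments")              # W = 83,333 segments
print(f"L = {L:.2e}")                        # L = 2.14e-10
print(f"1 loss every {1 / L:.2e} segments")  # 1 loss every 4.67e+09 segments
```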
TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link of capacity R, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.
[Figure: TCP connections 1 and 2 share a bottleneck router of capacity R.]
Analysis of 2 Connections Sharing a Link
Assumptions:
• Link with transmission rate R
• Each connection has the same MSS and RTT
• No other TCP connections or UDP datagrams traverse the shared link
• Ignore the slow-start phase of TCP
• Both connections operate in congestion-avoidance mode (linear increase phase)
Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.
Why Is TCP Fair?
Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases.
• Multiplicative decrease decreases throughput proportionally.
[Figure: connection 1 throughput (x-axis, up to R) vs. connection 2 throughput (y-axis, up to R), with an "equal bandwidth share" line and a "full bandwidth utilisation" line. A point on the graph depicts the amount of link bandwidth jointly consumed by the connections; the trajectory alternates between congestion avoidance (additive increase) and loss (window decreased by a factor of 2).]
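The convergence argument can be checked with a toy simulation. It assumes synchronized losses (both connections halve whenever their combined rate exceeds R), equal RTTs, and rates in arbitrary units per RTT. `aimd_two_connections` and the starting rates are hypothetical.

```python
def aimd_two_connections(x1: float, x2: float, R: float,
                         rounds: int = 200, increase: float = 1.0):
    """Idealised AIMD for two connections sharing a link of capacity R.

    Each round (one RTT) both rates grow by `increase` (additive increase,
    slope 1 in the x1-x2 plane). If their sum exceeds R, both experience
    loss and halve their rates (multiplicative decrease)."""
    for _ in range(rounds):
        if x1 + x2 > R:
            x1, x2 = x1 / 2, x2 / 2
        else:
            x1, x2 = x1 + increase, x2 + increase
    return x1, x2


x1, x2 = aimd_two_connections(x1=80.0, x2=10.0, R=100.0)
print(round(x1, 2), round(x2, 2))  # 36.14 35.87
```

Additive increase leaves the gap between the two rates unchanged, while each halving cuts it in half. After eight loss events the initial gap of 70 has shrunk to about 0.27, so the operating point drifts toward the equal-bandwidth-share line.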
The End
The next succeeding slides are just for additional reading
Transport Layer ndash TCP
55
BB
TCP Latency ModelingTCP Latency Modeling
In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs
Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth
Multiple End Systems sharing a link
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
Loop holes in TCPLoop holes in TCP
1 TCP connection
1 TCP connection
1 TCP connection
3 TCP connections
Multithreading implementationMultithreading implementation
Transport Layer ndash TCP
56
BB
TCP latency modelingTCP latency modeling
TCP connection establishment time data transfer delay Actual data transmission time
Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent
QQ How long How long does it take to does it take to
receive an object receive an object from a Web from a Web
serverserver
No data transfer delayNo data transfer delay
Therersquos data transfer delayTherersquos data transfer delay
the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety
Transport Layer ndash TCP
57
BB
TCP Latency ModelingTCP Latency Modeling
Network is uncongested with one link between end systems of rate RR
CongWinCongWin (fixed) determines the amount of data that can be sent
No packet loss no packet corruption no retransmissions required
Header overheads are negligible
File to sendFile to send = integer number of segments of size MSS
Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times
CLIENT SERVER
FILEFILE
Initial ThresholdThreshold of TCP congestion mechanism is very big
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits
SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)
Transport Layer ndash TCP
58
BB
TCP Latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 1: $WS/R > RTT + S/R$
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.
Case 1 latency $= 2RTT + O/R$
$K$ = number of windows of data that cover the object: $K = \lceil O/(WS) \rceil$ (rounded up to the nearest integer).
Example: $W = 4$ segments, $O = 256$ bits, $S = 32$ bits.
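As a quick check of the Case 1 quantities, here is a minimal Python sketch. It assumes sizes in bits, rates in bits per second and times in seconds; the function names are illustrative, not from the slides.

```python
def num_windows_static(O: int, S: int, W: int) -> int:
    """K = ceil(O / (W*S)): windows of W segments (S bits each) covering O bits."""
    return -(-O // (W * S))            # integer ceiling division, no float rounding


def latency_case1(O: float, R: float, rtt: float) -> float:
    """Case 1 (WS/R > RTT + S/R): the sender never stalls."""
    return 2 * rtt + O / R


# Slide example: O = 256 bits, S = 32 bits, W = 4 segments
print(num_windows_static(256, 32, 4))  # 2
```

For the slide's example the object is 8 segments, so two windows of 4 segments cover it.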
TCP Latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 2: $WS/R < RTT + S/R$
The sender has to wait for an ACK after a window's worth of data is sent (stalled period).
If there are $K$ windows, the sender will be stalled $K-1$ times. Each stall lasts $S/R + RTT - WS/R$: the time until the ACK for the window's first segment returns, minus the time spent transmitting the window.
Case 2 latency $= 2RTT + O/R + (K-1)\left[S/R + RTT - WS/R\right]$, where $K = \lceil O/(WS) \rceil$ is the number of windows of data that cover the object.
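The two static-window cases can be folded into one function, since Case 1 is just Case 2 with a non-positive stall. A sketch under the same assumptions (bits, bits per second, seconds); `static_window_latency` is a hypothetical name:

```python
import math


def static_window_latency(O: float, S: float, W: int, R: float, rtt: float) -> float:
    """Latency of an O-bit object sent with a fixed window of W segments of S bits.

    Case 1 (WS/R >= RTT + S/R): no stalls, latency = 2RTT + O/R.
    Case 2 (WS/R <  RTT + S/R): K - 1 stalls, each of S/R + RTT - WS/R.
    """
    K = math.ceil(O / (W * S))          # windows of data that cover the object
    latency = 2 * rtt + O / R           # 2 RTTs (connection setup, request) plus transmission
    stall = S / R + rtt - W * S / R     # idle time after each full window
    if stall > 0:                       # Case 2: ACK for a window's first segment arrives late
        latency += (K - 1) * stall      # no stall after the last window
    return latency


print(round(static_window_latency(O=256, S=32, W=4, R=320, rtt=1.0), 6))  # 3.5
```

With the hypothetical rate $R = 320$ bps and RTT = 1 s, $S/R = 0.1$ s and $WS/R = 0.4$ s, which is less than $RTT + S/R = 1.1$ s, so this is Case 2: $K = 2$, one stall of 0.7 s, and a latency of $2 + 0.8 + 0.7 = 3.5$ s.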
Case Analysis: DYNAMIC CONGESTION WINDOW
[Figure: slow-start transfer of an object with $O/S = 15$ segments, covered by 4 windows, with stalled periods between windows.]
Case Analysis: DYNAMIC CONGESTION WINDOW
- Let $K$ be the number of windows that cover the object.
- We can express $K$ in terms of the number of segments in the object as follows:
$$K = \min\{k : 2^0 S + 2^1 S + \cdots + 2^{k-1} S \ge O\} = \min\{k : 2^k - 1 \ge O/S\} = \min\{k : k \ge \log_2(O/S + 1)\} = \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil$$
Note: $2^0 + 2^1 + \cdots + 2^{k-1} = 2^k - 1$.
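The definition of $K$ can be checked by adding up doubling windows until the object is covered, and comparing the count with the closed form. A small sketch; the function name is hypothetical, and integer ceiling division keeps the loop free of floating point:

```python
import math


def num_windows_slow_start(O: int, S: int) -> int:
    """Smallest k with 2^0 + 2^1 + ... + 2^(k-1) >= O/S (windows double each RTT)."""
    segments = -(-O // S)        # object size in segments, rounded up
    k, covered = 0, 0
    while covered < segments:
        covered += 2 ** k        # the (k+1)-th window holds 2^k segments
        k += 1
    return k


# Slide example: O/S = 15 segments -> windows of 1, 2, 4, 8 segments
O, S = 15 * 1000, 1000
print(num_windows_slow_start(O, S))        # 4
print(math.ceil(math.log2(O / S + 1)))     # 4
```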
Case Analysis: DYNAMIC CONGESTION WINDOW
- Time from when the server begins to transmit the $k$th window until the server receives an ACK for the first segment in the window: $S/R + RTT$
- Transmission time of the $k$th window: $2^{k-1} S/R$
- Stall time after the $k$th window: $\left[S/R + RTT - 2^{k-1} S/R\right]^+$, where $[x]^+ = \max(x, 0)$
- Latency:
$$\text{Latency} = 2RTT + \frac{O}{R} + \sum_{k=1}^{K-1} \left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$$
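The latency sum translates directly into code. This sketch assumes, as the slides do, that $O$ is a whole number of segments, and uses made-up values ($S/R = 1$ s, RTT = 2 s) to keep the arithmetic readable; the function name is hypothetical:

```python
import math


def slow_start_latency(O: float, S: float, R: float, rtt: float) -> float:
    """2RTT + O/R + sum over k = 1..K-1 of [S/R + RTT - 2^(k-1) S/R]^+."""
    K = math.ceil(math.log2(O / S + 1))      # windows covering the object
    latency = 2 * rtt + O / R
    for k in range(1, K):                     # no stall after the last (K-th) window
        stall = S / R + rtt - 2 ** (k - 1) * S / R
        latency += max(stall, 0.0)            # [x]^+ = max(x, 0)
    return latency


# Hypothetical numbers: O/S = 15 segments, S/R = 1 s, RTT = 2 s
print(slow_start_latency(O=15_000, S=1_000, R=1_000, rtt=2.0))  # 22.0
```

Here the stalls after windows 1 and 2 last 2 s and 1 s; window 3 (4 s of transmission) already exceeds $S/R + RTT = 3$ s, so it causes no stall, giving $2RTT + O/R + 3 = 22$ s.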
Case Analysis: DYNAMIC CONGESTION WINDOW
- Let $Q$ be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$
- The actual number of times that the server stalls is $P = \min\{Q, K-1\}$.
Case Analysis: DYNAMIC CONGESTION WINDOW
- With $P = \min\{Q, K-1\}$, only the first $P$ terms of the stall sum can be nonzero; summing them and using $2^0 + 2^1 + \cdots + 2^{P-1} = 2^P - 1$ gives a closed-form expression for the latency:
$$\text{Latency} = 2RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - \left(2^P - 1\right)\frac{S}{R}$$
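The closed form can be evaluated without looping over windows. A sketch with hypothetical function name and illustrative numbers; for $O/S = 15$, $S/R = 1$ s and RTT = 2 s it gives $K = 4$, $Q = 2$ and $P = 2$:

```python
import math


def closed_form_latency(O: float, S: float, R: float, rtt: float) -> float:
    """Latency = 2RTT + O/R + P[RTT + S/R] - (2^P - 1) S/R with P = min{Q, K-1}."""
    K = math.ceil(math.log2(O / S + 1))                  # windows covering the object
    Q = math.floor(math.log2(1 + rtt / (S / R))) + 1     # stalls for an infinite object
    P = min(Q, K - 1)                                    # stalls actually incurred
    return 2 * rtt + O / R + P * (rtt + S / R) - (2 ** P - 1) * S / R


print(closed_form_latency(O=15_000, S=1_000, R=1_000, rtt=2.0))  # 22.0
```

Evaluating the stall sum term by term for the same inputs gives the same 22 s, which is a handy sanity check on the closed form.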
Case Analysis: DYNAMIC CONGESTION WINDOW
Comparing with the minimum latency $2RTT + O/R$ (no stalls):
$$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{\dfrac{O/R}{RTT} + 2}$$
Slow start will not significantly increase latency if $RTT \ll O/R$.
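The bound follows from the closed form: since $2^P - 1 \ge P$, the latency is at most $2RTT + O/R + P \cdot RTT$, and dividing by $2RTT + O/R$ gives the right-hand side. The sketch below (hypothetical function name) evaluates it for a hypothetical 1 Gbit object on a 100 Mbps link with 12,000-bit segments and a 1 ms RTT:

```python
import math


def slow_start_penalty_bound(O: float, S: float, R: float, rtt: float) -> float:
    """Upper bound 1 + P / ((O/R)/RTT + 2) on Latency / MinimumLatency."""
    K = math.ceil(math.log2(O / S + 1))                  # windows covering the object
    Q = math.floor(math.log2(1 + rtt / (S / R))) + 1     # stalls for an infinite object
    P = min(Q, K - 1)                                    # stalls actually incurred
    return 1 + P / ((O / R) / rtt + 2)


# O/R = 10 s is far larger than the 1 ms RTT
print(slow_start_penalty_bound(O=1e9, S=12_000, R=1e8, rtt=0.001))
```

Here $P = 4$ and the bound is roughly $1 + 4/10002$: slow start adds well under 0.1% to the minimum latency.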
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
Congestion control
TCP's Congestion Control Service
Problem: gridlock sets in when there is packet loss due to router congestion.
TCP's congestion control forces the end systems to decrease the rate at which packets are sent during periods of congestion.
When a packet is lost due to congestion, the sending system is alerted because it stops receiving ACKs for the packets it sent.
Macroscopic Description of TCP throughput
- What is the average throughput of TCP as a function of window size and RTT? Ignore slow start (typically very short phases).
- Let $W$ be the window size when loss occurs. When the window is $W$, throughput is $W/RTT$; just after a loss, the window drops to $W/2$ and throughput to $W/(2RTT)$.
- Throughput then increases linearly (by $MSS/RTT$ every RTT).
- Average throughput: $0.75\,W/RTT$ (the average of a linear ramp from $W/2$ to $W$ is $3W/4$).
(Based on an idealised model of the steady-state dynamics of TCP.)
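The 0.75 factor can be reproduced by averaging the idealised sawtooth window over one cycle. A sketch assuming an even $W$ (in segments), a window that grows by exactly one segment per RTT, and a cut to exactly $W/2$ at each loss; the function names are hypothetical:

```python
def average_window(W: int) -> float:
    """Mean window (segments) over one idealised cycle: W/2, W/2 + 1, ..., W."""
    windows = range(W // 2, W + 1)      # one sample per RTT between losses
    return sum(windows) / len(windows)


def average_throughput_bps(W: int, mss_bytes: int, rtt_s: float) -> float:
    """Average throughput in bits per second: average window * MSS / RTT."""
    return average_window(W) * mss_bytes * 8 / rtt_s


print(average_window(100))  # 75.0, i.e. 0.75 * W
```

With a hypothetical MSS of 1500 bytes and an RTT of 100 ms, `average_throughput_bps(100, 1500, 0.1)` comes to $0.75 \times 100 \times 1500 \times 8 / 0.1$, i.e. 9 Mbps.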
TCP Futures: TCP over "long fat pipes"
- Example: a GRID computing application with 1500-byte segments and a 100 ms RTT wants a throughput of 10 Gbps, which requires a window size of $W = 83{,}333$ in-flight segments.
- Throughput in terms of loss rate $L$:
$$\text{Throughput} = \frac{1.22 \cdot MSS}{RTT\sqrt{L}}$$
- Achieving 10 Gbps therefore requires $L = 2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments).
- New versions of TCP are needed for high-speed environments.
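Both numbers in the GRID example can be recomputed from the formulas, taking 1 byte = 8 bits and ignoring header overhead as the slides do. The function names are illustrative:

```python
def window_for_rate(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """In-flight segments needed for a target rate: W = rate * RTT / (8 * MSS)."""
    return rate_bps * rtt_s / (8 * mss_bytes)


def loss_rate_for_rate(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve rate = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * 8 * mss_bytes / (rtt_s * rate_bps)) ** 2


# GRID example: 1500-byte segments, 100 ms RTT, 10 Gbps
print(int(window_for_rate(10e9, 0.1, 1500)))         # 83333
print(f"{loss_rate_for_rate(10e9, 0.1, 1500):.1e}")  # 2.1e-10
```

The computed $L \approx 2.1 \cdot 10^{-10}$ rounds to the slide's $2 \cdot 10^{-10}$; its reciprocal, roughly 4.7 billion, matches the "1 loss event every 5 billion segments" figure.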
TCP Fairness
Fairness goal: if $N$ TCP sessions share the same bottleneck link of rate $R$, each should get an average transmission rate of $R/N$, an equal share of the link's bandwidth.
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity $R$.]
Analysis of 2 connections sharing a link
Assumptions:
- The link has transmission rate $R$.
- Each connection has the same MSS and RTT.
- No other TCP connections or UDP datagrams traverse the shared link.
- Ignore the slow-start phase of TCP.
- Both connections operate in congestion-avoidance mode (linear increase phase).
Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.
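Why is TCP fair?
Two competing sessions:
- Additive increase gives a slope of 1 as throughput increases.
- Multiplicative decrease decreases throughput proportionally.
[Figure: Connection 1 throughput vs. Connection 2 throughput, each axis running from 0 to $R$, with the equal-bandwidth-share line and the full-bandwidth-utilisation line. Congestion avoidance: additive increase; on loss: decrease window by a factor of 2.]
A point on the graph depicts the amount of link bandwidth jointly consumed by the connections. Additive increase moves this point parallel to the equal-share line, while each multiplicative decrease moves it toward the origin and halves the difference between the two throughputs, so the point converges toward the equal bandwidth share.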
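The convergence argument can be illustrated with a toy simulation. This is an idealised sketch, not TCP itself: each step stands for one RTT, rates are in arbitrary units, and both connections are assumed to detect loss in the same step whenever their combined rate exceeds the capacity $R$. The function name is hypothetical.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, steps: int):
    """Toy AIMD model of two flows sharing one bottleneck.

    Each step (one RTT) both flows add 1 unit (additive increase); whenever
    their combined rate exceeds the capacity, both halve (multiplicative
    decrease). Returns every (x1, x2) point visited.
    """
    trajectory = [(x1, x2)]
    for _ in range(steps):
        if x1 + x2 > capacity:      # assumed: both flows see the loss together
            x1, x2 = x1 / 2, x2 / 2
        else:                       # congestion avoidance: slope-1 increase
            x1, x2 = x1 + 1, x2 + 1
        trajectory.append((x1, x2))
    return trajectory


path = aimd_two_flows(x1=80.0, x2=10.0, capacity=100.0, steps=200)
print(abs(path[0][0] - path[0][1]), abs(path[-1][0] - path[-1][1]))  # 70.0 0.2734375
```

Additive increase never changes the gap between the two throughputs, while every loss halves it; after eight losses the initial gap of 70 has shrunk to $70/2^8 \approx 0.27$.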
The End
The next slides are for additional reading only.
TCP Latency Modeling
Loopholes in TCP
- In practice, client/server applications with a smaller RTT grab available bandwidth more quickly as it becomes free, and therefore achieve higher throughputs.
- Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth.
[Figure: multiple end systems sharing a link of rate $R$ bps; three hosts use 1 TCP connection each, while a host with a multithreading implementation uses 3 TCP connections.]
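To make the parallel-connection loophole concrete, the sketch below assumes the idealised per-connection fairness of the TCP Fairness slide holds exactly (equal RTTs, each of the $N$ connections gets $R/N$) and computes what each host receives. The link rate of 60 units and the function name are hypothetical.

```python
def per_application_share(R: float, connections: list[int]) -> list[float]:
    """Rate each application receives when R is split equally among all TCP connections."""
    total = sum(connections)
    return [R * n / total for n in connections]


# Figure scenario: three hosts with 1 connection each, one host with 3 connections
print(per_application_share(60.0, [1, 1, 1, 3]))  # [10.0, 10.0, 10.0, 30.0]
```

The multithreaded host gets half of the link, three times what each single-connection host gets, even though fairness holds per connection.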
Transport Layer ndash TCP
56
BB
TCP latency modelingTCP latency modeling
TCP connection establishment time data transfer delay Actual data transmission time
Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent
QQ How long How long does it take to does it take to
receive an object receive an object from a Web from a Web
serverserver
No data transfer delayNo data transfer delay
Therersquos data transfer delayTherersquos data transfer delay
the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety
Transport Layer ndash TCP
57
BB
TCP Latency ModelingTCP Latency Modeling
Network is uncongested with one link between end systems of rate RR
CongWinCongWin (fixed) determines the amount of data that can be sent
No packet loss no packet corruption no retransmissions required
Header overheads are negligible
File to sendFile to send = integer number of segments of size MSS
Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times
CLIENT SERVER
FILEFILE
Initial ThresholdThreshold of TCP congestion mechanism is very big
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits
SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)
Transport Layer ndash TCP
58
BB
TCP latency ModelingTCP latency Modeling
Case 1 latency = 2RTT + OR
K = Number of Windows of data that cover the object
K = OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 1 Case 1 WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
Number of segmentsRounded up to the nearest integer
Assume W=4 segmentsAssume W=4 segments
eg O=256bits S=32bits W=4
Transport Layer ndash TCP
59
BB
TCP latency ModelingTCP latency Modeling
Case 2 latency = 2RTT + OR + (K-1)[SR + RTT - WSR]
Number of Windows of data that cover the objectK= OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 2 Case 2 WSR lt RTT + SRWSR lt RTT + SR
Sender has to wait for an ACK after a windowrsquos worth of data sent
STALLED STALLED PERIODPERIOD
If there are k windows sender willbe stalled (k-1) times
Transport Layer ndash TCP
60
BB
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
STALLED STALLED PERIODPERIOD
4 windows4 windows
OS=15
Transport Layer ndash TCP
61
BB
bull LetLet K K be the be the number of windows that cover the objectnumber of windows that cover the objectbull We can express We can express KK in terms of the number of segments in the in terms of the number of segments in the
object as followsobject as follows
S
OkK k 110 222min
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1log
1logmin
12min
2
2
S
OK
S
OkkK
S
OkK k
Note
Transport Layer ndash TCP
62
BB
bull From the time the server begins to transmit the From the time the server begins to transmit the kkth window window until the time the server receives an until the time the server receives an ACKACK for the for the first segment in the windowin the window
bull Transmission of Transmission of kkth window window ==
bull Stall Time Stall Time ==
bull Latency Latency ==
12
k
R
S
12
k
R
SRTT
R
S
RTTR
S
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1K
1
122k
k
R
SRTT
R
S
R
ORTT
Transport Layer ndash TCP
63
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
Transport Layer ndash TCP
64
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
R
S
R
SRTTP
R
ORTT P )12(2Latency
bull Closed-form expression for the latency
Transport Layer ndash TCP
65
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
2
1
RTTRO
P
encyMinimumLat
Latency
Slow start will not significantly increase latency if RTT ltlt OR
Transport Layer ndash TCP
66
BB
httpwww1csewustledu~jaincis788-97ftptcp_over_atmindexhtmatm-features
Transport Layer ndash TCP
49
BBTransport Layer 3-49
Macroscopic Description of TCP throughput
whatrsquos the average throughout of TCP as a function of window size and RTT ignore slow start (typically very short phases)
let W be the window size when loss occurs when window is W throughput is WRTT just after loss window drops to W2 throughput to
W2RTT Throughput increases linearly (by MSSRTT every
RTT) Average Throughput 75 WRTT
(Based on Idealised model for the steady-state dynamics of TCP)
Transport Layer 3-50
TCP FuturesTCP Futures TCP over ldquolong fat pipesrdquo Example GRID computing application 1500-byte segments 100ms RTT desired
throughput of 10 Gbps requires window size W = 83333 in-flight
segments Throughput in terms of loss rate
L = 210-10 ndash a very small loss rate (1 loss event every 5 billion segments)
new versions of TCP is needed for high-speed environments
LRTT
MSS221
Transport Layer ndash TCP
51
BB
TCP FairnessTCP Fairness
Fairness goal if NN TCP sessions share same bottleneck link each should get an average transmission rate of RNRN an equal share of the linkrsquos bandwidth
TCP connection 1
bottleneckrouter
capacity RR
TCP connection 2
Go to Summary of TCP Congestion Control
Transport Layer ndash TCP
52
BB
Analysis of 2 connections sharing a linkAnalysis of 2 connections sharing a link
Link with transmission rate of RR
Each connection have the same MSSMSS RTTRTT
No other TCP connections or UDP datagrams traverse the shared link
Ignore slow start phase of TCP
Operating in congestion-avoidance modecongestion-avoidance mode (linear increase phase)
Goal adjust sending rate of the two connections to allow for equal equal bandwidth sharingbandwidth sharing
AssumptionsAssumptions
Transport Layer ndash TCP
53
BB
Why is TCP fairWhy is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput
proportionally
RR
equal bandwidth shareequal bandwidth share
Connection 1 throughput
Connect
ion 2
thro
ughput
congestion avoidance additive increaseloss decrease window by factor of 2
RR
A point on the graph depicts the amount of link bandwidth
jointly jointly consumedconsumed by
the connections
Full bandwidth utilisation line
We can viewWe can view a simulation a simulation
on thison this
View SimulationView Simulation
Transport Layer ndash TCP
54
BB
The End
The next succeeding slides are just for additional reading
Transport Layer ndash TCP
55
BB
TCP Latency ModelingTCP Latency Modeling
In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs
Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth
Multiple End Systems sharing a link
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
Loop holes in TCPLoop holes in TCP
1 TCP connection
1 TCP connection
1 TCP connection
3 TCP connections
Multithreading implementationMultithreading implementation
Transport Layer ndash TCP
56
BB
TCP latency modelingTCP latency modeling
TCP connection establishment time data transfer delay Actual data transmission time
Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent
QQ How long How long does it take to does it take to
receive an object receive an object from a Web from a Web
serverserver
No data transfer delayNo data transfer delay
Therersquos data transfer delayTherersquos data transfer delay
the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety
Transport Layer ndash TCP
57
BB
TCP Latency ModelingTCP Latency Modeling
Network is uncongested with one link between end systems of rate RR
CongWinCongWin (fixed) determines the amount of data that can be sent
No packet loss no packet corruption no retransmissions required
Header overheads are negligible
File to sendFile to send = integer number of segments of size MSS
Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times
CLIENT SERVER
FILEFILE
Initial ThresholdThreshold of TCP congestion mechanism is very big
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits
SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)
Transport Layer ndash TCP
58
BB
TCP latency ModelingTCP latency Modeling
Case 1 latency = 2RTT + OR
K = Number of Windows of data that cover the object
K = OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 1 Case 1 WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
Number of segmentsRounded up to the nearest integer
Assume W=4 segmentsAssume W=4 segments
eg O=256bits S=32bits W=4
Transport Layer ndash TCP
59
BB
TCP latency ModelingTCP latency Modeling
Case 2 latency = 2RTT + OR + (K-1)[SR + RTT - WSR]
Number of Windows of data that cover the objectK= OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 2 Case 2 WSR lt RTT + SRWSR lt RTT + SR
Sender has to wait for an ACK after a windowrsquos worth of data sent
STALLED STALLED PERIODPERIOD
If there are k windows sender willbe stalled (k-1) times
Transport Layer ndash TCP
60
BB
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
STALLED STALLED PERIODPERIOD
4 windows4 windows
OS=15
Transport Layer ndash TCP
61
BB
bull LetLet K K be the be the number of windows that cover the objectnumber of windows that cover the objectbull We can express We can express KK in terms of the number of segments in the in terms of the number of segments in the
object as followsobject as follows
S
OkK k 110 222min
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1log
1logmin
12min
2
2
S
OK
S
OkkK
S
OkK k
Note
Transport Layer ndash TCP
62
BB
bull From the time the server begins to transmit the From the time the server begins to transmit the kkth window window until the time the server receives an until the time the server receives an ACKACK for the for the first segment in the windowin the window
bull Transmission of Transmission of kkth window window ==
bull Stall Time Stall Time ==
bull Latency Latency ==
12
k
R
S
12
k
R
SRTT
R
S
RTTR
S
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1K
1
122k
k
R
SRTT
R
S
R
ORTT
Transport Layer ndash TCP
63
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
Transport Layer ndash TCP
64
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
R
S
R
SRTTP
R
ORTT P )12(2Latency
bull Closed-form expression for the latency
Transport Layer ndash TCP
65
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
2
1
RTTRO
P
encyMinimumLat
Latency
Slow start will not significantly increase latency if RTT ltlt OR
Transport Layer ndash TCP
66
BB
httpwww1csewustledu~jaincis788-97ftptcp_over_atmindexhtmatm-features
Transport Layer 3-50
TCP FuturesTCP Futures TCP over ldquolong fat pipesrdquo Example GRID computing application 1500-byte segments 100ms RTT desired
throughput of 10 Gbps requires window size W = 83333 in-flight
segments Throughput in terms of loss rate
L = 210-10 ndash a very small loss rate (1 loss event every 5 billion segments)
new versions of TCP is needed for high-speed environments
LRTT
MSS221
Transport Layer ndash TCP
51
BB
TCP FairnessTCP Fairness
Fairness goal if NN TCP sessions share same bottleneck link each should get an average transmission rate of RNRN an equal share of the linkrsquos bandwidth
TCP connection 1
bottleneckrouter
capacity RR
TCP connection 2
Go to Summary of TCP Congestion Control
Transport Layer ndash TCP
52
BB
Analysis of 2 connections sharing a linkAnalysis of 2 connections sharing a link
Link with transmission rate of RR
Each connection have the same MSSMSS RTTRTT
No other TCP connections or UDP datagrams traverse the shared link
Ignore slow start phase of TCP
Operating in congestion-avoidance modecongestion-avoidance mode (linear increase phase)
Goal adjust sending rate of the two connections to allow for equal equal bandwidth sharingbandwidth sharing
AssumptionsAssumptions
Transport Layer ndash TCP
53
BB
Why is TCP fairWhy is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput
proportionally
RR
equal bandwidth shareequal bandwidth share
Connection 1 throughput
Connect
ion 2
thro
ughput
congestion avoidance additive increaseloss decrease window by factor of 2
RR
A point on the graph depicts the amount of link bandwidth
jointly jointly consumedconsumed by
the connections
Full bandwidth utilisation line
We can viewWe can view a simulation a simulation
on thison this
View SimulationView Simulation
Transport Layer ndash TCP
54
BB
The End
The next succeeding slides are just for additional reading
Transport Layer ndash TCP
55
BB
TCP Latency ModelingTCP Latency Modeling
In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs
Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth
Multiple End Systems sharing a link
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
Loop holes in TCPLoop holes in TCP
1 TCP connection
1 TCP connection
1 TCP connection
3 TCP connections
Multithreading implementationMultithreading implementation
Transport Layer ndash TCP
56
BB
TCP latency modelingTCP latency modeling
TCP connection establishment time data transfer delay Actual data transmission time
Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent
QQ How long How long does it take to does it take to
receive an object receive an object from a Web from a Web
serverserver
No data transfer delayNo data transfer delay
Therersquos data transfer delayTherersquos data transfer delay
the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety
Transport Layer ndash TCP
57
BB
TCP Latency ModelingTCP Latency Modeling
Network is uncongested with one link between end systems of rate RR
CongWinCongWin (fixed) determines the amount of data that can be sent
No packet loss no packet corruption no retransmissions required
Header overheads are negligible
File to sendFile to send = integer number of segments of size MSS
Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times
CLIENT SERVER
FILEFILE
Initial ThresholdThreshold of TCP congestion mechanism is very big
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits
SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)
Transport Layer ndash TCP
58
BB
TCP latency ModelingTCP latency Modeling
Case 1 latency = 2RTT + OR
K = Number of Windows of data that cover the object
K = OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 1 Case 1 WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
Number of segmentsRounded up to the nearest integer
Assume W=4 segmentsAssume W=4 segments
eg O=256bits S=32bits W=4
Transport Layer ndash TCP
59
BB
TCP latency ModelingTCP latency Modeling
Case 2 latency = 2RTT + OR + (K-1)[SR + RTT - WSR]
Number of Windows of data that cover the objectK= OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 2 Case 2 WSR lt RTT + SRWSR lt RTT + SR
Sender has to wait for an ACK after a windowrsquos worth of data sent
STALLED STALLED PERIODPERIOD
If there are k windows sender willbe stalled (k-1) times
Transport Layer ndash TCP
60
BB
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
STALLED STALLED PERIODPERIOD
4 windows4 windows
OS=15
Transport Layer ndash TCP
61
BB
bull LetLet K K be the be the number of windows that cover the objectnumber of windows that cover the objectbull We can express We can express KK in terms of the number of segments in the in terms of the number of segments in the
object as followsobject as follows
S
OkK k 110 222min
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1log
1logmin
12min
2
2
S
OK
S
OkkK
S
OkK k
Note
Transport Layer ndash TCP
62
BB
bull From the time the server begins to transmit the From the time the server begins to transmit the kkth window window until the time the server receives an until the time the server receives an ACKACK for the for the first segment in the windowin the window
bull Transmission of Transmission of kkth window window ==
bull Stall Time Stall Time ==
bull Latency Latency ==
12
k
R
S
12
k
R
SRTT
R
S
RTTR
S
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1K
1
122k
k
R
SRTT
R
S
R
ORTT
Transport Layer ndash TCP
63
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
Transport Layer ndash TCP
64
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
R
S
R
SRTTP
R
ORTT P )12(2Latency
bull Closed-form expression for the latency
Transport Layer ndash TCP
65
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
2
1
RTTRO
P
encyMinimumLat
Latency
Slow start will not significantly increase latency if RTT ltlt OR
Transport Layer ndash TCP
66
BB
httpwww1csewustledu~jaincis788-97ftptcp_over_atmindexhtmatm-features
Transport Layer ndash TCP
51
BB
TCP FairnessTCP Fairness
Fairness goal if NN TCP sessions share same bottleneck link each should get an average transmission rate of RNRN an equal share of the linkrsquos bandwidth
TCP connection 1
bottleneckrouter
capacity RR
TCP connection 2
Go to Summary of TCP Congestion Control
Transport Layer ndash TCP
52
BB
Analysis of 2 connections sharing a linkAnalysis of 2 connections sharing a link
Link with transmission rate of RR
Each connection have the same MSSMSS RTTRTT
No other TCP connections or UDP datagrams traverse the shared link
Ignore slow start phase of TCP
Operating in congestion-avoidance modecongestion-avoidance mode (linear increase phase)
Goal adjust sending rate of the two connections to allow for equal equal bandwidth sharingbandwidth sharing
AssumptionsAssumptions
Transport Layer ndash TCP
53
BB
Why is TCP fairWhy is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput
proportionally
RR
equal bandwidth shareequal bandwidth share
Connection 1 throughput
Connect
ion 2
thro
ughput
congestion avoidance additive increaseloss decrease window by factor of 2
RR
A point on the graph depicts the amount of link bandwidth
jointly jointly consumedconsumed by
the connections
Full bandwidth utilisation line
We can viewWe can view a simulation a simulation
on thison this
View SimulationView Simulation
Transport Layer ndash TCP
54
BB
The End
The next succeeding slides are just for additional reading
Transport Layer ndash TCP
55
BB
TCP Latency ModelingTCP Latency Modeling
In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs
Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth
Multiple End Systems sharing a link
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
Loop holes in TCPLoop holes in TCP
1 TCP connection
1 TCP connection
1 TCP connection
3 TCP connections
Multithreading implementationMultithreading implementation
Transport Layer ndash TCP
56
BB
TCP latency modelingTCP latency modeling
TCP connection establishment time data transfer delay Actual data transmission time
Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent
QQ How long How long does it take to does it take to
receive an object receive an object from a Web from a Web
serverserver
No data transfer delayNo data transfer delay
Therersquos data transfer delayTherersquos data transfer delay
the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety
Transport Layer ndash TCP
57
BB
TCP Latency ModelingTCP Latency Modeling
Network is uncongested with one link between end systems of rate RR
CongWinCongWin (fixed) determines the amount of data that can be sent
No packet loss no packet corruption no retransmissions required
Header overheads are negligible
File to sendFile to send = integer number of segments of size MSS
Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times
CLIENT SERVER
FILEFILE
Initial ThresholdThreshold of TCP congestion mechanism is very big
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits
SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)
TCP Latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 1: $WS/R > \mathrm{RTT} + S/R$
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.
Case 1 latency $= 2\,\mathrm{RTT} + O/R$
K = number of windows of data that cover the object:
$$K = \left\lceil \frac{O}{WS} \right\rceil$$
where $O/S$ is the number of segments and the result is rounded up to the nearest integer.
Assume W = 4 segments, e.g., O = 256 bits, S = 32 bits, W = 4 (8 segments, so K = 2).
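The Case 1 quantities can be sketched in Python as follows, assuming O and S in bits, W in segments, R in bps and RTT in seconds. `num_windows` and `case1_latency` are hypothetical helper names, and the link values in the second call are illustrative.

```python
import math


def num_windows(O: int, S: int, W: int) -> int:
    """K = ceil(O / (W*S)): windows of W segments (S bits each) covering O bits."""
    return math.ceil(O / (W * S))


def case1_latency(O: float, R: float, RTT: float) -> float:
    """Case 1 (WS/R > RTT + S/R): the sender never stalls.

    One RTT for the TCP handshake, one RTT for the request to reach the server
    and the first bit of the object to come back, plus O/R to transmit O bits.
    """
    return 2 * RTT + O / R


print(num_windows(O=256, S=32, W=4))  # 8 segments, 4 per window -> 2
# Hypothetical link: R = 1000 bps, RTT = 0.05 s (then 4*32/1000 > 0.05 + 32/1000)
print(case1_latency(O=256, R=1000, RTT=0.05))  # about 0.356 s
```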
TCP Latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 2: $WS/R < \mathrm{RTT} + S/R$
The sender has to wait for an ACK after a window's worth of data is sent (STALLED PERIOD).
Number of windows of data that cover the object: $K = \lceil O/(WS) \rceil$.
If there are K windows, the sender will be stalled (K-1) times, each stall lasting $S/R + \mathrm{RTT} - WS/R$:
$$\text{Case 2 latency} = 2\,\mathrm{RTT} + \frac{O}{R} + (K-1)\left[\frac{S}{R} + \mathrm{RTT} - \frac{WS}{R}\right]$$
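A sketch that covers both static-window cases in one function, assuming the same units (O and S in bits, W in segments, R in bps, RTT in seconds). The helper name `static_latency` and the numbers are hypothetical. Clamping the per-window stall at zero makes Case 1 fall out of the Case 2 formula.

```python
import math


def static_latency(O: float, S: float, W: int, R: float, RTT: float) -> float:
    """Latency with a static congestion window of W segments.

    Each of the K-1 window boundaries adds a stall of S/R + RTT - W*S/R when
    that quantity is positive (Case 2); in Case 1 it is negative and clamped
    to zero, which reduces the formula to 2*RTT + O/R.
    """
    K = math.ceil(O / (W * S))                 # windows that cover the object
    stall = max(0.0, S / R + RTT - W * S / R)  # idle time per window boundary
    return 2 * RTT + O / R + (K - 1) * stall


# Hypothetical: O = 256 bits, S = 32 bits, W = 4, R = 1000 bps, RTT = 0.1 s
# K = 2, stall = 0.032 + 0.1 - 0.128 = 0.004 s, latency = 0.2 + 0.256 + 0.004 s
print(static_latency(256, 32, 4, 1000, 0.1))
```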
Case Analysis: DYNAMIC CONGESTION WINDOW
[Figure: an object of O/S = 15 segments sent in 4 windows, with STALLED PERIODs between windows.]
Case Analysis: DYNAMIC CONGESTION WINDOW
• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object as follows (window k carries $2^{k-1}$ segments, and $2^0 + 2^1 + \cdots + 2^{k-1} = 2^k - 1$):
$$K = \min\{k : 2^0 S + 2^1 S + \cdots + 2^{k-1} S \ge O\} = \min\left\{k : 2^k - 1 \ge \frac{O}{S}\right\} = \min\left\{k : k \ge \log_2\left(\frac{O}{S} + 1\right)\right\} = \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil$$
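Under the slow-start doubling assumed here (window k carries 2^(k-1) segments), K can be checked by counting windows directly and comparing with the closed form. A Python sketch; both function names are hypothetical.

```python
import math


def dynamic_windows(num_segments: int) -> int:
    """Count windows when window k carries 2**(k-1) segments (1, 2, 4, ...)."""
    k, covered = 0, 0
    while covered < num_segments:
        k += 1
        covered += 2 ** (k - 1)  # segments sent in window k
    return k


def dynamic_windows_closed_form(num_segments: int) -> int:
    """K = ceil(log2(O/S + 1)), with num_segments = O/S."""
    return math.ceil(math.log2(num_segments + 1))


print(dynamic_windows(15), dynamic_windows_closed_form(15))  # 4 4
```

For O/S = 15 the windows carry 1, 2, 4 and 8 segments, so the object needs exactly 4 windows.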
Case Analysis: DYNAMIC CONGESTION WINDOW
• Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window $= S/R + \mathrm{RTT}$
• Transmission time of the kth window $= 2^{k-1} S/R$
• Stall time after the kth window $= \left[\dfrac{S}{R} + \mathrm{RTT} - 2^{k-1}\dfrac{S}{R}\right]^+$, where $[x]^+ = \max(x, 0)$
• Latency:
$$\text{Latency} = 2\,\mathrm{RTT} + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + \mathrm{RTT} - 2^{k-1}\frac{S}{R}\right]^+$$
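A direct Python evaluation of the latency sum, as a sketch under the slides' assumptions (slow-start doubling, no loss). The function name and parameter values are hypothetical; the values are exact binary fractions (S/R = 0.125 s, RTT = 0.25 s) so the floating-point arithmetic is exact.

```python
import math


def dynamic_latency_sum(O: float, S: float, R: float, RTT: float) -> float:
    """Latency = 2*RTT + O/R + sum_{k=1}^{K-1} [S/R + RTT - 2**(k-1) * S/R]^+."""
    K = math.ceil(math.log2(O / S + 1))  # windows under slow-start doubling
    latency = 2 * RTT + O / R
    for k in range(1, K):                # stalls can only follow windows 1 .. K-1
        stall = S / R + RTT - 2 ** (k - 1) * S / R
        latency += max(0.0, stall)       # [x]^+ : no stall once the window is large enough
    return latency


# Hypothetical: S = 12000 bits, R = 96000 bps (S/R = 0.125 s), RTT = 0.25 s,
# O = 15 segments = 180000 bits, so K = 4. Stalls: 0.25 s, 0.125 s, then none.
print(dynamic_latency_sum(O=180_000, S=12_000, R=96_000, RTT=0.25))  # 2.75
```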
Case Analysis: DYNAMIC CONGESTION WINDOW
• Let Q be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\left(1 + \frac{\mathrm{RTT}}{S/R}\right) \right\rfloor + 1$$
• The actual number of times that the server stalls is $P = \min\{Q, K-1\}$.
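Q and P follow directly from the formulas. This sketch assumes a hypothetical link with S = 12000 bits, R = 96000 bps (S/R = 0.125 s) and RTT = 0.25 s; `stall_counts` is an illustrative name.

```python
import math


def stall_counts(O: float, S: float, R: float, RTT: float) -> tuple[int, int, int]:
    """Return (K, Q, P) for the dynamic-window (slow start) model."""
    K = math.ceil(math.log2(O / S + 1))               # windows covering the object
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1  # stalls for an infinite object
    P = min(Q, K - 1)                                 # stalls that actually happen
    return K, Q, P


# Hypothetical link: S/R = 0.125 s, RTT = 0.25 s
print(stall_counts(O=180_000, S=12_000, R=96_000, RTT=0.25))  # (4, 2, 2): 15 segments
print(stall_counts(O=36_000, S=12_000, R=96_000, RTT=0.25))   # (2, 2, 1): 3 segments
```

With only 3 segments the object runs out after 2 windows, so the server stalls once even though Q = 2.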
Case Analysis: DYNAMIC CONGESTION WINDOW
• Closed-form expression for the latency: summing the P stall terms $\frac{S}{R} + \mathrm{RTT} - 2^{k-1}\frac{S}{R}$ for $k = 1, \dots, P$ gives
$$\text{Latency} = 2\,\mathrm{RTT} + \frac{O}{R} + P\left[\mathrm{RTT} + \frac{S}{R}\right] - \left(2^P - 1\right)\frac{S}{R}$$
with $P = \min\{Q, K-1\}$.
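As a sanity check, the closed form can be compared against the direct sum over stall terms. A Python sketch with hypothetical parameter values; both function names are illustrative.

```python
import math


def closed_form_latency(O: float, S: float, R: float, RTT: float) -> float:
    """Latency = 2*RTT + O/R + P*(RTT + S/R) - (2**P - 1) * S/R."""
    K = math.ceil(math.log2(O / S + 1))
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1
    P = min(Q, K - 1)
    return 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R


def summed_latency(O: float, S: float, R: float, RTT: float) -> float:
    """Direct sum of the positive stall terms, for cross-checking."""
    K = math.ceil(math.log2(O / S + 1))
    stalls = (S / R + RTT - 2 ** (k - 1) * S / R for k in range(1, K))
    return 2 * RTT + O / R + sum(max(0.0, s) for s in stalls)


# Hypothetical: S/R = 0.125 s, RTT = 0.25 s, object of 15 segments
print(closed_form_latency(180_000, 12_000, 96_000, 0.25))  # 2.75
print(summed_latency(180_000, 12_000, 96_000, 0.25))       # 2.75
```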
Case Analysis: DYNAMIC CONGESTION WINDOW
Comparing with the minimum latency $2\,\mathrm{RTT} + O/R$ (no stalls), and using $2^P - 1 \ge P$:
$$\frac{\text{Latency}}{\text{Minimum latency}} \le 1 + \frac{P}{\dfrac{O/R}{\mathrm{RTT}} + 2}$$
Slow start will therefore not significantly increase latency if $\mathrm{RTT} \ll O/R$.
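The bound can be evaluated numerically. This sketch (hypothetical values, illustrative helper name) returns both the actual latency ratio and the bound. A large object relative to RTT pushes the ratio towards 1.

```python
import math


def latency_ratio_and_bound(O: float, S: float, R: float, RTT: float) -> tuple[float, float]:
    """Return (Latency / MinimumLatency, 1 + P / ((O/R)/RTT + 2))."""
    K = math.ceil(math.log2(O / S + 1))
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1
    P = min(Q, K - 1)
    minimum = 2 * RTT + O / R  # latency with no stalls at all
    latency = minimum + P * (RTT + S / R) - (2 ** P - 1) * S / R
    return latency / minimum, 1 + P / ((O / R) / RTT + 2)


# Hypothetical: S/R = 0.125 s, RTT = 0.25 s, object of 15 segments (O/R = 1.875 s)
ratio, bound = latency_ratio_and_bound(180_000, 12_000, 96_000, 0.25)
print(round(ratio, 3), round(bound, 3))  # 1.158 1.211
```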
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features
Analysis of 2 connections sharing a link
Assumptions:
- A link with transmission rate R.
- Each connection has the same MSS and RTT.
- No other TCP connections or UDP datagrams traverse the shared link.
- Ignore the slow-start phase of TCP.
- Both connections operate in congestion-avoidance mode (linear increase phase).
Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.
Why is TCP fair?
Two competing sessions:
- Additive increase gives a slope of 1 as throughput increases.
- Multiplicative decrease decreases throughput proportionally.
[Figure: connection 1 throughput (horizontal axis, up to R) versus connection 2 throughput (vertical axis, up to R), showing the equal bandwidth share line and the full bandwidth utilisation line; congestion avoidance: additive increase; loss: decrease window by factor of 2. A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.]
We can view a simulation of this (View Simulation).
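A toy Python simulation of the two-connection picture, assuming synchronized losses (both connections halve whenever their combined rate exceeds R) and one unit of additive increase per round. These are simplifying assumptions, not part of the slides. Additive increase leaves the gap between the two rates unchanged while each loss halves it, so the rates drift towards the equal-share line.

```python
def aimd_two_flows(x1: float, x2: float, R: float, alpha: float = 1.0,
                   rounds: int = 200) -> tuple[float, float]:
    """Simulate two AIMD senders sharing a link of capacity R.

    Every round each sender adds alpha (additive increase). When the combined
    rate exceeds R, both see a loss and halve their rates (multiplicative
    decrease). Synchronized losses are a simplifying assumption.
    """
    for _ in range(rounds):
        x1 += alpha
        x2 += alpha
        if x1 + x2 > R:
            x1 /= 2
            x2 /= 2
    return x1, x2


# Hypothetical start far from fairness: connection 1 at 80, connection 2 at 10, R = 100
x1, x2 = aimd_two_flows(80.0, 10.0, 100.0)
print(round(x1, 2), round(x2, 2))  # 44.14 43.87 (gap shrank from 70 to about 0.27)
```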
The End
The following slides are for additional reading only.
TCP Latency Modeling
In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free; therefore, they have higher throughputs.
Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth.
[Figure: Loopholes in TCP. Multiple end systems sharing a link.]
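An idealized sketch of the parallel-connection loophole, assuming the link is split equally per TCP connection rather than per application. `app_share` and the 6 Mbps link are hypothetical.

```python
def app_share(my_connections: int, other_connections: int, R: float) -> float:
    """Share of link rate R an application gets if the link is split equally
    among all TCP connections crossing it (idealized per-connection fairness)."""
    total = my_connections + other_connections
    return R * my_connections / total


# Hypothetical: R = 6 Mbps, three applications with 1 connection each,
# plus one application that opens 3 parallel connections.
print(app_share(1, 5, 6.0))  # a single-connection application: 1.0 Mbps
print(app_share(3, 3, 6.0))  # the 3-connection application: 3.0 Mbps
```

Under this assumption the multithreaded application takes half of the link, three times what each single-connection application receives.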
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
Loop holes in TCPLoop holes in TCP
1 TCP connection
1 TCP connection
1 TCP connection
3 TCP connections
Multithreading implementationMultithreading implementation
Transport Layer ndash TCP
56
BB
TCP latency modelingTCP latency modeling
TCP connection establishment time data transfer delay Actual data transmission time
Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent
QQ How long How long does it take to does it take to
receive an object receive an object from a Web from a Web
serverserver
No data transfer delayNo data transfer delay
Therersquos data transfer delayTherersquos data transfer delay
the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety
Transport Layer ndash TCP
57
BB
TCP Latency ModelingTCP Latency Modeling
Network is uncongested with one link between end systems of rate RR
CongWinCongWin (fixed) determines the amount of data that can be sent
No packet loss no packet corruption no retransmissions required
Header overheads are negligible
File to sendFile to send = integer number of segments of size MSS
Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times
CLIENT SERVER
FILEFILE
Initial ThresholdThreshold of TCP congestion mechanism is very big
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits
SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)
Transport Layer ndash TCP
58
BB
TCP latency ModelingTCP latency Modeling
Case 1 latency = 2RTT + OR
K = Number of Windows of data that cover the object
K = OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 1 Case 1 WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
Number of segmentsRounded up to the nearest integer
Assume W=4 segmentsAssume W=4 segments
eg O=256bits S=32bits W=4
Transport Layer ndash TCP
59
BB
TCP latency ModelingTCP latency Modeling
Case 2 latency = 2RTT + OR + (K-1)[SR + RTT - WSR]
Number of Windows of data that cover the objectK= OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 2 Case 2 WSR lt RTT + SRWSR lt RTT + SR
Sender has to wait for an ACK after a windowrsquos worth of data sent
STALLED STALLED PERIODPERIOD
If there are k windows sender willbe stalled (k-1) times
Transport Layer ndash TCP
60
BB
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
STALLED STALLED PERIODPERIOD
4 windows4 windows
OS=15
Transport Layer ndash TCP
61
BB
bull LetLet K K be the be the number of windows that cover the objectnumber of windows that cover the objectbull We can express We can express KK in terms of the number of segments in the in terms of the number of segments in the
object as followsobject as follows
S
OkK k 110 222min
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1log
1logmin
12min
2
2
S
OK
S
OkkK
S
OkK k
Note
Transport Layer ndash TCP
62
BB
bull From the time the server begins to transmit the From the time the server begins to transmit the kkth window window until the time the server receives an until the time the server receives an ACKACK for the for the first segment in the windowin the window
bull Transmission of Transmission of kkth window window ==
bull Stall Time Stall Time ==
bull Latency Latency ==
12
k
R
S
12
k
R
SRTT
R
S
RTTR
S
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1K
1
122k
k
R
SRTT
R
S
R
ORTT
Transport Layer ndash TCP
63
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
Transport Layer ndash TCP
64
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
R
S
R
SRTTP
R
ORTT P )12(2Latency
bull Closed-form expression for the latency
Transport Layer ndash TCP
65
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
2
1
RTTRO
P
encyMinimumLat
Latency
Slow start will not significantly increase latency if RTT ltlt OR
Transport Layer ndash TCP
66
BB
httpwww1csewustledu~jaincis788-97ftptcp_over_atmindexhtmatm-features
Transport Layer ndash TCP
53
BB
Why is TCP fairWhy is TCP fair
Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput
proportionally
RR
equal bandwidth shareequal bandwidth share
Connection 1 throughput
Connect
ion 2
thro
ughput
congestion avoidance additive increaseloss decrease window by factor of 2
RR
A point on the graph depicts the amount of link bandwidth
jointly jointly consumedconsumed by
the connections
Full bandwidth utilisation line
We can viewWe can view a simulation a simulation
on thison this
View SimulationView Simulation
Transport Layer ndash TCP
54
BB
The End
The next succeeding slides are just for additional reading
Transport Layer ndash TCP
55
BB
TCP Latency ModelingTCP Latency Modeling
In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs
Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth
Multiple End Systems sharing a link
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
Loop holes in TCPLoop holes in TCP
1 TCP connection
1 TCP connection
1 TCP connection
3 TCP connections
Multithreading implementationMultithreading implementation
Transport Layer ndash TCP
56
BB
TCP latency modelingTCP latency modeling
TCP connection establishment time data transfer delay Actual data transmission time
Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent
QQ How long How long does it take to does it take to
receive an object receive an object from a Web from a Web
serverserver
No data transfer delayNo data transfer delay
Therersquos data transfer delayTherersquos data transfer delay
the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety
Transport Layer ndash TCP
57
BB
TCP Latency ModelingTCP Latency Modeling
Network is uncongested with one link between end systems of rate RR
CongWinCongWin (fixed) determines the amount of data that can be sent
No packet loss no packet corruption no retransmissions required
Header overheads are negligible
File to sendFile to send = integer number of segments of size MSS
Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times
CLIENT SERVER
FILEFILE
Initial ThresholdThreshold of TCP congestion mechanism is very big
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits
SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)
Transport Layer ndash TCP
58
BB
TCP latency ModelingTCP latency Modeling
Case 1 latency = 2RTT + OR
K = Number of Windows of data that cover the object
K = OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 1 Case 1 WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
Number of segmentsRounded up to the nearest integer
Assume W=4 segmentsAssume W=4 segments
eg O=256bits S=32bits W=4
Transport Layer ndash TCP
59
BB
TCP latency ModelingTCP latency Modeling
Case 2 latency = 2RTT + OR + (K-1)[SR + RTT - WSR]
Number of Windows of data that cover the objectK= OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 2 Case 2 WSR lt RTT + SRWSR lt RTT + SR
Sender has to wait for an ACK after a windowrsquos worth of data sent
STALLED STALLED PERIODPERIOD
If there are k windows sender willbe stalled (k-1) times
Transport Layer ndash TCP
60
BB
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
STALLED STALLED PERIODPERIOD
4 windows4 windows
OS=15
Transport Layer ndash TCP
61
BB
bull LetLet K K be the be the number of windows that cover the objectnumber of windows that cover the objectbull We can express We can express KK in terms of the number of segments in the in terms of the number of segments in the
object as followsobject as follows
S
OkK k 110 222min
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1log
1logmin
12min
2
2
S
OK
S
OkkK
S
OkK k
Note
Transport Layer ndash TCP
62
BB
bull From the time the server begins to transmit the From the time the server begins to transmit the kkth window window until the time the server receives an until the time the server receives an ACKACK for the for the first segment in the windowin the window
bull Transmission of Transmission of kkth window window ==
bull Stall Time Stall Time ==
bull Latency Latency ==
12
k
R
S
12
k
R
SRTT
R
S
RTTR
S
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1K
1
122k
k
R
SRTT
R
S
R
ORTT
Transport Layer ndash TCP
63
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
Transport Layer ndash TCP
64
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
R
S
R
SRTTP
R
ORTT P )12(2Latency
bull Closed-form expression for the latency
Transport Layer ndash TCP
65
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
2
1
RTTRO
P
encyMinimumLat
Latency
Slow start will not significantly increase latency if RTT ltlt OR
Transport Layer ndash TCP
66
BB
httpwww1csewustledu~jaincis788-97ftptcp_over_atmindexhtmatm-features
Transport Layer ndash TCP
54
BB
The End
The next succeeding slides are just for additional reading
Transport Layer ndash TCP
55
BB
TCP Latency ModelingTCP Latency Modeling
In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs
Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth
Multiple End Systems sharing a link
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
Loop holes in TCPLoop holes in TCP
1 TCP connection
1 TCP connection
1 TCP connection
3 TCP connections
Multithreading implementationMultithreading implementation
Transport Layer ndash TCP
56
BB
TCP latency modelingTCP latency modeling
TCP connection establishment time data transfer delay Actual data transmission time
Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent
QQ How long How long does it take to does it take to
receive an object receive an object from a Web from a Web
serverserver
No data transfer delayNo data transfer delay
Therersquos data transfer delayTherersquos data transfer delay
the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety
Transport Layer ndash TCP
57
BB
TCP Latency ModelingTCP Latency Modeling
Network is uncongested with one link between end systems of rate RR
CongWinCongWin (fixed) determines the amount of data that can be sent
No packet loss no packet corruption no retransmissions required
Header overheads are negligible
File to sendFile to send = integer number of segments of size MSS
Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times
CLIENT SERVER
FILEFILE
Initial ThresholdThreshold of TCP congestion mechanism is very big
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits
SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)
Transport Layer ndash TCP
58
BB
TCP latency ModelingTCP latency Modeling
Case 1 latency = 2RTT + OR
K = Number of Windows of data that cover the object
K = OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 1 Case 1 WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
Number of segmentsRounded up to the nearest integer
Assume W=4 segmentsAssume W=4 segments
eg O=256bits S=32bits W=4
Transport Layer ndash TCP
59
BB
TCP latency ModelingTCP latency Modeling
Case 2 latency = 2RTT + OR + (K-1)[SR + RTT - WSR]
Number of Windows of data that cover the objectK= OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 2 Case 2 WSR lt RTT + SRWSR lt RTT + SR
Sender has to wait for an ACK after a windowrsquos worth of data sent
STALLED STALLED PERIODPERIOD
If there are k windows sender willbe stalled (k-1) times
Transport Layer ndash TCP
60
BB
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
STALLED STALLED PERIODPERIOD
4 windows4 windows
OS=15
Transport Layer ndash TCP
61
BB
bull LetLet K K be the be the number of windows that cover the objectnumber of windows that cover the objectbull We can express We can express KK in terms of the number of segments in the in terms of the number of segments in the
object as followsobject as follows
S
OkK k 110 222min
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1log
1logmin
12min
2
2
S
OK
S
OkkK
S
OkK k
Note
Transport Layer ndash TCP
62
BB
bull From the time the server begins to transmit the From the time the server begins to transmit the kkth window window until the time the server receives an until the time the server receives an ACKACK for the for the first segment in the windowin the window
bull Transmission of Transmission of kkth window window ==
bull Stall Time Stall Time ==
bull Latency Latency ==
12
k
R
S
12
k
R
SRTT
R
S
RTTR
S
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1K
1
122k
k
R
SRTT
R
S
R
ORTT
Transport Layer ndash TCP
63
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
Transport Layer ndash TCP
64
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
R
S
R
SRTTP
R
ORTT P )12(2Latency
bull Closed-form expression for the latency
Transport Layer ndash TCP
65
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
2
1
RTTRO
P
encyMinimumLat
Latency
Slow start will not significantly increase latency if RTT ltlt OR
Transport Layer ndash TCP
66
BB
httpwww1csewustledu~jaincis788-97ftptcp_over_atmindexhtmatm-features
Transport Layer ndash TCP
55
BB
TCP Latency ModelingTCP Latency Modeling
In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs
Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth
Multiple End Systems sharing a link
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
Loop holes in TCPLoop holes in TCP
1 TCP connection
1 TCP connection
1 TCP connection
3 TCP connections
Multithreading implementationMultithreading implementation
Transport Layer ndash TCP
56
BB
TCP latency modelingTCP latency modeling
TCP connection establishment time data transfer delay Actual data transmission time
Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent
QQ How long How long does it take to does it take to
receive an object receive an object from a Web from a Web
serverserver
No data transfer delayNo data transfer delay
Therersquos data transfer delayTherersquos data transfer delay
the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety
Transport Layer ndash TCP
57
BB
TCP Latency ModelingTCP Latency Modeling
Network is uncongested with one link between end systems of rate RR
CongWinCongWin (fixed) determines the amount of data that can be sent
No packet loss no packet corruption no retransmissions required
Header overheads are negligible
File to sendFile to send = integer number of segments of size MSS
Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times
CLIENT SERVER
FILEFILE
Initial ThresholdThreshold of TCP congestion mechanism is very big
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits
SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)
Transport Layer ndash TCP
58
BB
TCP latency ModelingTCP latency Modeling
Case 1 latency = 2RTT + OR
K = Number of Windows of data that cover the object
K = OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 1 Case 1 WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
Number of segmentsRounded up to the nearest integer
Assume W=4 segmentsAssume W=4 segments
eg O=256bits S=32bits W=4
Transport Layer ndash TCP
59
BB
TCP latency ModelingTCP latency Modeling
Case 2 latency = 2RTT + OR + (K-1)[SR + RTT - WSR]
Number of Windows of data that cover the objectK= OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 2 Case 2 WSR lt RTT + SRWSR lt RTT + SR
Sender has to wait for an ACK after a windowrsquos worth of data sent
STALLED STALLED PERIODPERIOD
If there are k windows sender willbe stalled (k-1) times
Transport Layer ndash TCP
60
BB
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
STALLED STALLED PERIODPERIOD
4 windows4 windows
OS=15
Transport Layer ndash TCP
61
BB
bull LetLet K K be the be the number of windows that cover the objectnumber of windows that cover the objectbull We can express We can express KK in terms of the number of segments in the in terms of the number of segments in the
object as followsobject as follows
S
OkK k 110 222min
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1log
1logmin
12min
2
2
S
OK
S
OkkK
S
OkK k
Note
Transport Layer ndash TCP
62
BB
bull From the time the server begins to transmit the From the time the server begins to transmit the kkth window window until the time the server receives an until the time the server receives an ACKACK for the for the first segment in the windowin the window
bull Transmission of Transmission of kkth window window ==
bull Stall Time Stall Time ==
bull Latency Latency ==
12
k
R
S
12
k
R
SRTT
R
S
RTTR
S
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
1K
1
122k
k
R
SRTT
R
S
R
ORTT
Transport Layer ndash TCP
63
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
Transport Layer ndash TCP
64
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
11log2
RS
RTTQ
bull The actual number of times that the server stalls is
P = min Q K-1
R
S
R
SRTTP
R
ORTT P )12(2Latency
bull Closed-form expression for the latency
Transport Layer ndash TCP
65
BB
bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
2
1
RTTRO
P
encyMinimumLat
Latency
Slow start will not significantly increase latency if RTT ltlt OR
Transport Layer ndash TCP
66
BB
httpwww1csewustledu~jaincis788-97ftptcp_over_atmindexhtmatm-features
Transport Layer ndash TCP
56
BB
TCP latency modelingTCP latency modeling
TCP connection establishment time data transfer delay Actual data transmission time
Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent
QQ How long How long does it take to does it take to
receive an object receive an object from a Web from a Web
serverserver
No data transfer delayNo data transfer delay
Therersquos data transfer delayTherersquos data transfer delay
the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety
Transport Layer ndash TCP
57
BB
TCP Latency ModelingTCP Latency Modeling
Network is uncongested with one link between end systems of rate RR
CongWinCongWin (fixed) determines the amount of data that can be sent
No packet loss no packet corruption no retransmissions required
Header overheads are negligible
File to sendFile to send = integer number of segments of size MSS
Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times
CLIENT SERVER
FILEFILE
Initial ThresholdThreshold of TCP congestion mechanism is very big
R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate
AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits
SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)
Transport Layer ndash TCP
58
BB
TCP latency ModelingTCP latency Modeling
Case 1 latency = 2RTT + OR
K = Number of Windows of data that cover the object
K = OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 1 Case 1 WSR gt RTT + SRWSR gt RTT + SR
An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent
Number of segmentsRounded up to the nearest integer
Assume W=4 segmentsAssume W=4 segments
eg O=256bits S=32bits W=4
Transport Layer ndash TCP
59
BB
TCP latency ModelingTCP latency Modeling
Case 2 latency = 2RTT + OR + (K-1)[SR + RTT - WSR]
Number of Windows of data that cover the objectK= OWS
Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW
Case 2 Case 2 WSR lt RTT + SRWSR lt RTT + SR
Sender has to wait for an ACK after a windowrsquos worth of data sent
STALLED STALLED PERIODPERIOD
If there are k windows sender willbe stalled (k-1) times
Transport Layer ndash TCP
60
BB
Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW
STALLED STALLED PERIODPERIOD
4 windows4 windows
OS=15
Transport Layer ndash TCP
61
BB
• Let $K$ be the number of windows that cover the object.
• We can express $K$ in terms of the number of segments in the object, $O/S$, as follows (the $k$th window carries $2^{k-1}$ segments):
$$
\begin{aligned}
K &= \min\left\{k : 2^0 S + 2^1 S + \cdots + 2^{k-1} S \ge O\right\} \\
  &= \min\left\{k : 2^k - 1 \ge O/S\right\} \\
  &= \min\left\{k : k \ge \log_2(O/S + 1)\right\} \\
  &= \left\lceil \log_2(O/S + 1) \right\rceil
\end{aligned}
$$
Note: $2^0 + 2^1 + \cdots + 2^{k-1} = 2^k - 1$.
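A short sketch that evaluates $K = \lceil \log_2(O/S + 1) \rceil$ and cross-checks it by literally adding windows of 1, 2, 4, ... segments until the object is covered (the $\min\{k : \ldots\}$ form). The function names are hypothetical, and `num_segments` stands for $O/S$, assumed to be a positive integer since the file is an integer number of segments:

```python
import math


def num_windows(num_segments: int) -> int:
    """K = ceil(log2(O/S + 1)) for an object of num_segments = O/S segments."""
    return math.ceil(math.log2(num_segments + 1))


def num_windows_by_counting(num_segments: int) -> int:
    """Same K, found by sending doubling windows until the object is covered."""
    sent, window, k = 0, 1, 0
    while sent < num_segments:
        sent += window   # add the next window: 1, 2, 4, ... segments
        window *= 2
        k += 1
    return k


print(num_windows(15), num_windows_by_counting(15))  # 4 4
print(num_windows(16), num_windows_by_counting(16))  # 5 5
```

A 15-segment object fits exactly in windows of 1, 2, 4 and 8 segments; one more segment forces a fifth window.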
• From the time the server begins to transmit the $k$th window until the time it receives an ACK for the first segment in the window: $S/R + RTT$
• Transmission time of the $k$th window: $2^{k-1} S/R$
• Stall time after the $k$th window: $\left[S/R + RTT - 2^{k-1} S/R\right]^+$, where $[x]^+ = \max(x, 0)$
• Latency:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$$
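A direct transcription of the summation, as a sketch under the same idealized assumptions. The function name and the example values ($S/R = 1$ s, $RTT = 2$ s, $O/S = 15$ segments, chosen so the arithmetic is exact) are hypothetical:

```python
import math


def slow_start_latency_sum(O: float, S: float, R: float, RTT: float) -> float:
    """Latency = 2RTT + O/R + sum over k = 1..K-1 of [S/R + RTT - 2^(k-1) S/R]^+."""
    K = math.ceil(math.log2(O / S + 1))                 # windows that cover the object
    stalls = sum(
        max(S / R + RTT - 2 ** (k - 1) * S / R, 0.0)    # [x]^+ = max(x, 0)
        for k in range(1, K)                             # k = 1, ..., K-1
    )
    return 2 * RTT + O / R + stalls


print(slow_start_latency_sum(O=15, S=1, R=1, RTT=2))  # 22.0
```

Here $K = 4$: the first two windows leave stalls of 2 s and 1 s, and the third window (4 segments, 4 s) outlasts $S/R + RTT = 3$ s, so it does not stall. The latency is $4 + 15 + 3 = 22$ s.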
• Let $Q$ be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$
• The actual number of times that the server stalls is $P = \min\{Q, K-1\}$.
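The stall counts can be computed directly; this is a sketch with a hypothetical function name and example values. A small object is limited by $K - 1$ (the server has nothing left to wait for after its final window), while a large object is limited by $Q$:

```python
import math


def stall_counts(O: float, S: float, R: float, RTT: float) -> tuple[int, int, int]:
    """Return (K, Q, P): windows covering the object, stalls for an infinite object, actual stalls."""
    K = math.ceil(math.log2(O / S + 1))                 # windows that cover the object
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1    # stalls if the object were infinite
    P = min(Q, K - 1)                                   # no stall after the last window
    return K, Q, P


print(stall_counts(O=15, S=1, R=1, RTT=2))  # (4, 2, 2)
print(stall_counts(O=3, S=1, R=1, RTT=2))   # (2, 2, 1)
```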
• Closed-form expression for the latency:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - \left(2^P - 1\right)\frac{S}{R}$$
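The closed form follows because the stall term for window $k$ is nonnegative for $k \le Q$ and negative (hence clipped to zero) for $k > Q$, so the sum keeps only its first $P$ terms: $\sum_{k=1}^{P}\left[S/R + RTT - 2^{k-1}S/R\right] = P\,(RTT + S/R) - (2^P - 1)\,S/R$. A sketch that evaluates it, with a hypothetical function name and example values:

```python
import math


def slow_start_latency(O: float, S: float, R: float, RTT: float) -> float:
    """Closed form: 2RTT + O/R + P(RTT + S/R) - (2^P - 1) S/R, with P = min(Q, K - 1)."""
    K = math.ceil(math.log2(O / S + 1))                 # windows that cover the object
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1    # stalls for an infinite object
    P = min(Q, K - 1)                                   # actual number of stalls
    return 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R


print(slow_start_latency(O=15, S=1, R=1, RTT=2))  # 22.0
```

For $O/S = 15$, $S/R = 1$ s and $RTT = 2$ s, $P = 2$ and the latency is $4 + 15 + 2 \cdot 3 - 3 = 22$ s, i.e., the minimum latency of 19 s plus stalls of 2 s and 1 s.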
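Comparing the latency with the minimum latency $2\,RTT + O/R$ (the latency when the server never stalls):
$$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{\dfrac{O/R}{RTT} + 2}$$
Slow start will not significantly increase latency if $RTT \ll O/R$.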
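The bound follows from the closed-form latency in two steps; this sketch of the reasoning uses $2^P \ge 1 + P$ for every integer $P \ge 0$:

$$
\begin{aligned}
\text{Latency} &= 2\,RTT + \frac{O}{R} + P\,RTT - \left(2^P - 1 - P\right)\frac{S}{R}
\;\le\; 2\,RTT + \frac{O}{R} + P\,RTT, \\
\frac{\text{Latency}}{\text{Minimum Latency}} &\le 1 + \frac{P\,RTT}{2\,RTT + O/R} = 1 + \frac{P}{\dfrac{O/R}{RTT} + 2}.
\end{aligned}
$$

When $RTT \ll O/R$, the denominator $\frac{O/R}{RTT} + 2$ is large, so the ratio approaches 1 and slow start adds little to the latency.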
http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features