Transport Layer
CS 3516 ndash Computer Networks
Chapter 3 Transport LayerGoalsbull Understand
principles behind transport layer servicesndash Multiplexing
demultiplexingndash Reliable data
transferndash Flow controlndash Congestion control
bull Learn about transport layer protocols in the Internetndash UDP connectionless
transportndash TCP connection-oriented
transportndash TCP congestion control
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Transport Services and Protocols
bull Provide logical communication between app processes running on different hosts
bull Transport protocols run in end systems ndash send side breaks app
messages into segments passes to network layer
ndash receive side reassembles segments into messages passes to app layer
bull More than one transport protocol available to appsndash Internet TCP and UDP
applicationtransportnetworkdata linkphysical
applicationtransportnetworkdata linkphysical
logical end-end transport
Transport vs Network layer
bull network layer logical communication between hosts
bull transport layer logical communication between processes ndash relies on enhances network layer services
Household analogy12 kids sending letters to
12 kidsbull processes = kidsbull app messages = letters
in envelopesbull hosts = housesbull transport protocol = Ann
and Bill (collect mail from siblings)
bull network-layer protocol = postal service
Internet Transport-layer Protocols
bull reliable in-order delivery (TCP)ndash congestion control ndash flow controlndash connection setup
bull unreliable unordered delivery UDPndash no-frills extension of
ldquobest-effortrdquo IP
bull services not available ndash delay guaranteesndash bandwidth guarantees
applicationtransportnetworkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
applicationtransportnetworkdata linkphysical
logical end-end transport
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
UDP User Datagram Protocol [RFC 768]
bull ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol
bull ldquobest effortrdquo service UDP segments may bendash lostndash delivered out of order
to appbull connectionless
ndash no handshaking between UDP sender receiver
ndash each UDP segment handled independently of others
Why is there a UDPbull no connection
establishment (which can add delay)
bull simple no connection state at sender receiver
bull small segment headerbull no congestion control
UDP can blast away as fast as desired
UDP more
bull Often used for streaming (videoaudio) or game appsndash loss tolerantndash rate sensitive
bull other UDP usesndash DNSndash SNMP
bull reliable transfer over UDP add reliability at application layerndash application-specific
error recovery
source port dest port
32 bits
Applicationdata (message)
UDP segment format
length checksumLength in
bytes of UDPsegmentincluding
header
UDP checksum
Senderbull treat segment contents
as sequence of 16-bit integers
bull checksum addition (1rsquos complement sum) of segment contents
bull sender puts checksum value into UDP checksum field
Receiverbull compute checksum of
received segmentbull check if computed
checksum equals checksum field valuendash NO - error detectedndash YES - no error detected
But maybe errors nonetheless More later hellip
Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment
Internet Checksum Example
bull Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
wraparound
sum
checksum
bull At receiver add 2 integers and checksum hellip should be all 1rsquos If not bit error (correction next)
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics
bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
(Zoom
next
slid
e)
Reliable Data Transfer Getting Started
sendside
receiveside
rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer
udt_send() called by rdtto transfer packet over unreliable channel to receiver
rdt_rcv() called when packet arrives on rcv-side of channel
deliver_data() called by rdt to deliver data to upper
Reliable Data Transfer Getting Started
Wersquollbull Incrementally develop sender receiver
sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer
ndash but control info will flow on both directions
bull Use finite state machines (FSM) to specify sender receiver
state 1 state 2
event causing state transitionactions taken on state transition
state when in this ldquostaterdquo next state uniquely determined by next
event
eventactions
Rdt10 Reliable Transfer over a Reliable Channel
bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets
bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt_send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
sender receiverEasy
What if Taking a Message over Phone
bull Message is clearbull Message is garbled
What if Taking a Message over Phone
bull Message is clearndash Ok
bull Message is garbledndash Ask to repeatndash May not need whole message
bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission
Rdt20 Channel with Bit Errors (no Loss)
bull Underlying channel may flip bits in packetndash Checksum to detect bit errors
bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells
sender that pkt received OKndash negative acknowledgements (NAKs) receiver
explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK
bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)
rcvrsender
Rdt20 FSM Specification
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Operation with No Errors
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Rdt20 Error Scenario
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Sender sends one packet then waits for receiver response
stop and wait
Rdt20 Has a Fatal Flaw
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate
Handling duplicates bull Sender retransmits
current pkt if ACKNAK garbled
bull Sender adds sequence number to each pktndash Can use 1 bit (for now)
bull receiver discards (doesnrsquot deliver up) duplicate pkt
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Chapter 3 Transport LayerGoalsbull Understand
principles behind transport layer servicesndash Multiplexing
demultiplexingndash Reliable data
transferndash Flow controlndash Congestion control
bull Learn about transport layer protocols in the Internetndash UDP connectionless
transportndash TCP connection-oriented
transportndash TCP congestion control
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Transport Services and Protocols
bull Provide logical communication between app processes running on different hosts
bull Transport protocols run in end systems ndash send side breaks app
messages into segments passes to network layer
ndash receive side reassembles segments into messages passes to app layer
bull More than one transport protocol available to appsndash Internet TCP and UDP
applicationtransportnetworkdata linkphysical
applicationtransportnetworkdata linkphysical
logical end-end transport
Transport vs Network layer
bull network layer logical communication between hosts
bull transport layer logical communication between processes ndash relies on enhances network layer services
Household analogy12 kids sending letters to
12 kidsbull processes = kidsbull app messages = letters
in envelopesbull hosts = housesbull transport protocol = Ann
and Bill (collect mail from siblings)
bull network-layer protocol = postal service
Internet Transport-layer Protocols
bull reliable in-order delivery (TCP)ndash congestion control ndash flow controlndash connection setup
bull unreliable unordered delivery UDPndash no-frills extension of
ldquobest-effortrdquo IP
bull services not available ndash delay guaranteesndash bandwidth guarantees
applicationtransportnetworkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
applicationtransportnetworkdata linkphysical
logical end-end transport
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
UDP User Datagram Protocol [RFC 768]
bull ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol
bull ldquobest effortrdquo service UDP segments may bendash lostndash delivered out of order
to appbull connectionless
ndash no handshaking between UDP sender receiver
ndash each UDP segment handled independently of others
Why is there a UDPbull no connection
establishment (which can add delay)
bull simple no connection state at sender receiver
bull small segment headerbull no congestion control
UDP can blast away as fast as desired
UDP more
bull Often used for streaming (videoaudio) or game appsndash loss tolerantndash rate sensitive
bull other UDP usesndash DNSndash SNMP
bull reliable transfer over UDP add reliability at application layerndash application-specific
error recovery
source port dest port
32 bits
Applicationdata (message)
UDP segment format
length checksumLength in
bytes of UDPsegmentincluding
header
UDP checksum
Senderbull treat segment contents
as sequence of 16-bit integers
bull checksum addition (1rsquos complement sum) of segment contents
bull sender puts checksum value into UDP checksum field
Receiverbull compute checksum of
received segmentbull check if computed
checksum equals checksum field valuendash NO - error detectedndash YES - no error detected
But maybe errors nonetheless More later hellip
Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment
Internet Checksum Example
bull Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
wraparound
sum
checksum
bull At receiver add 2 integers and checksum hellip should be all 1rsquos If not bit error (correction next)
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics
bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
(Zoom
next
slid
e)
Reliable Data Transfer Getting Started
sendside
receiveside
rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer
udt_send() called by rdtto transfer packet over unreliable channel to receiver
rdt_rcv() called when packet arrives on rcv-side of channel
deliver_data() called by rdt to deliver data to upper
Reliable Data Transfer Getting Started
Wersquollbull Incrementally develop sender receiver
sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer
ndash but control info will flow on both directions
bull Use finite state machines (FSM) to specify sender receiver
state 1 state 2
event causing state transitionactions taken on state transition
state when in this ldquostaterdquo next state uniquely determined by next
event
eventactions
Rdt10 Reliable Transfer over a Reliable Channel
bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets
bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt_send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
sender receiverEasy
What if Taking a Message over Phone
bull Message is clearbull Message is garbled
What if Taking a Message over Phone
bull Message is clearndash Ok
bull Message is garbledndash Ask to repeatndash May not need whole message
bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission
Rdt20 Channel with Bit Errors (no Loss)
bull Underlying channel may flip bits in packetndash Checksum to detect bit errors
bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells
sender that pkt received OKndash negative acknowledgements (NAKs) receiver
explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK
bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)
rcvrsender
Rdt20 FSM Specification
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Operation with No Errors
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Rdt20 Error Scenario
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Sender sends one packet then waits for receiver response
stop and wait
Rdt20 Has a Fatal Flaw
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate
Handling duplicates bull Sender retransmits
current pkt if ACKNAK garbled
bull Sender adds sequence number to each pktndash Can use 1 bit (for now)
bull receiver discards (doesnrsquot deliver up) duplicate pkt
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Transport Services and Protocols
bull Provide logical communication between app processes running on different hosts
bull Transport protocols run in end systems ndash send side breaks app
messages into segments passes to network layer
ndash receive side reassembles segments into messages passes to app layer
bull More than one transport protocol available to appsndash Internet TCP and UDP
applicationtransportnetworkdata linkphysical
applicationtransportnetworkdata linkphysical
logical end-end transport
Transport vs Network layer
bull network layer logical communication between hosts
bull transport layer logical communication between processes ndash relies on enhances network layer services
Household analogy12 kids sending letters to
12 kidsbull processes = kidsbull app messages = letters
in envelopesbull hosts = housesbull transport protocol = Ann
and Bill (collect mail from siblings)
bull network-layer protocol = postal service
Internet Transport-layer Protocols
bull reliable in-order delivery (TCP)ndash congestion control ndash flow controlndash connection setup
bull unreliable unordered delivery UDPndash no-frills extension of
ldquobest-effortrdquo IP
bull services not available ndash delay guaranteesndash bandwidth guarantees
applicationtransportnetworkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
applicationtransportnetworkdata linkphysical
logical end-end transport
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
UDP User Datagram Protocol [RFC 768]
bull ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol
bull ldquobest effortrdquo service UDP segments may bendash lostndash delivered out of order
to appbull connectionless
ndash no handshaking between UDP sender receiver
ndash each UDP segment handled independently of others
Why is there a UDPbull no connection
establishment (which can add delay)
bull simple no connection state at sender receiver
bull small segment headerbull no congestion control
UDP can blast away as fast as desired
UDP more
bull Often used for streaming (videoaudio) or game appsndash loss tolerantndash rate sensitive
bull other UDP usesndash DNSndash SNMP
bull reliable transfer over UDP add reliability at application layerndash application-specific
error recovery
source port dest port
32 bits
Applicationdata (message)
UDP segment format
length checksumLength in
bytes of UDPsegmentincluding
header
UDP checksum
Senderbull treat segment contents
as sequence of 16-bit integers
bull checksum addition (1rsquos complement sum) of segment contents
bull sender puts checksum value into UDP checksum field
Receiverbull compute checksum of
received segmentbull check if computed
checksum equals checksum field valuendash NO - error detectedndash YES - no error detected
But maybe errors nonetheless More later hellip
Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment
Internet Checksum Example
bull Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
wraparound
sum
checksum
bull At receiver add 2 integers and checksum hellip should be all 1rsquos If not bit error (correction next)
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics
bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
(Zoom
next
slid
e)
Reliable Data Transfer Getting Started
sendside
receiveside
rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer
udt_send() called by rdtto transfer packet over unreliable channel to receiver
rdt_rcv() called when packet arrives on rcv-side of channel
deliver_data() called by rdt to deliver data to upper
Reliable Data Transfer Getting Started
Wersquollbull Incrementally develop sender receiver
sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer
ndash but control info will flow on both directions
bull Use finite state machines (FSM) to specify sender receiver
state 1 state 2
event causing state transitionactions taken on state transition
state when in this ldquostaterdquo next state uniquely determined by next
event
eventactions
Rdt10 Reliable Transfer over a Reliable Channel
bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets
bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt_send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
sender receiverEasy
What if Taking a Message over Phone
bull Message is clearbull Message is garbled
What if Taking a Message over Phone
bull Message is clearndash Ok
bull Message is garbledndash Ask to repeatndash May not need whole message
bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission
Rdt20 Channel with Bit Errors (no Loss)
bull Underlying channel may flip bits in packetndash Checksum to detect bit errors
bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells
sender that pkt received OKndash negative acknowledgements (NAKs) receiver
explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK
bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)
rcvrsender
Rdt20 FSM Specification
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Operation with No Errors
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Rdt20 Error Scenario
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Sender sends one packet then waits for receiver response
stop and wait
Rdt20 Has a Fatal Flaw
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate
Handling duplicates bull Sender retransmits
current pkt if ACKNAK garbled
bull Sender adds sequence number to each pktndash Can use 1 bit (for now)
bull receiver discards (doesnrsquot deliver up) duplicate pkt
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Transport Services and Protocols
bull Provide logical communication between app processes running on different hosts
bull Transport protocols run in end systems ndash send side breaks app
messages into segments passes to network layer
ndash receive side reassembles segments into messages passes to app layer
bull More than one transport protocol available to appsndash Internet TCP and UDP
applicationtransportnetworkdata linkphysical
applicationtransportnetworkdata linkphysical
logical end-end transport
Transport vs Network layer
bull network layer logical communication between hosts
bull transport layer logical communication between processes ndash relies on enhances network layer services
Household analogy12 kids sending letters to
12 kidsbull processes = kidsbull app messages = letters
in envelopesbull hosts = housesbull transport protocol = Ann
and Bill (collect mail from siblings)
bull network-layer protocol = postal service
Internet Transport-layer Protocols
bull reliable in-order delivery (TCP)ndash congestion control ndash flow controlndash connection setup
bull unreliable unordered delivery UDPndash no-frills extension of
ldquobest-effortrdquo IP
bull services not available ndash delay guaranteesndash bandwidth guarantees
applicationtransportnetworkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
applicationtransportnetworkdata linkphysical
logical end-end transport
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
UDP User Datagram Protocol [RFC 768]
bull ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol
bull ldquobest effortrdquo service UDP segments may bendash lostndash delivered out of order
to appbull connectionless
ndash no handshaking between UDP sender receiver
ndash each UDP segment handled independently of others
Why is there a UDPbull no connection
establishment (which can add delay)
bull simple no connection state at sender receiver
bull small segment headerbull no congestion control
UDP can blast away as fast as desired
UDP more
bull Often used for streaming (videoaudio) or game appsndash loss tolerantndash rate sensitive
bull other UDP usesndash DNSndash SNMP
bull reliable transfer over UDP add reliability at application layerndash application-specific
error recovery
source port dest port
32 bits
Applicationdata (message)
UDP segment format
length checksumLength in
bytes of UDPsegmentincluding
header
UDP checksum
Senderbull treat segment contents
as sequence of 16-bit integers
bull checksum addition (1rsquos complement sum) of segment contents
bull sender puts checksum value into UDP checksum field
Receiverbull compute checksum of
received segmentbull check if computed
checksum equals checksum field valuendash NO - error detectedndash YES - no error detected
But maybe errors nonetheless More later hellip
Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment
Internet Checksum Example
bull Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
wraparound
sum
checksum
bull At receiver add 2 integers and checksum hellip should be all 1rsquos If not bit error (correction next)
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics
bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
(Zoom
next
slid
e)
Reliable Data Transfer Getting Started
sendside
receiveside
rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer
udt_send() called by rdtto transfer packet over unreliable channel to receiver
rdt_rcv() called when packet arrives on rcv-side of channel
deliver_data() called by rdt to deliver data to upper
Reliable Data Transfer Getting Started
Wersquollbull Incrementally develop sender receiver
sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer
ndash but control info will flow on both directions
bull Use finite state machines (FSM) to specify sender receiver
state 1 state 2
event causing state transitionactions taken on state transition
state when in this ldquostaterdquo next state uniquely determined by next
event
eventactions
Rdt10 Reliable Transfer over a Reliable Channel
bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets
bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt_send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
sender receiverEasy
What if Taking a Message over Phone
bull Message is clearbull Message is garbled
What if Taking a Message over Phone
bull Message is clearndash Ok
bull Message is garbledndash Ask to repeatndash May not need whole message
bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission
Rdt20 Channel with Bit Errors (no Loss)
bull Underlying channel may flip bits in packetndash Checksum to detect bit errors
bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells
sender that pkt received OKndash negative acknowledgements (NAKs) receiver
explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK
bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)
rcvrsender
Rdt20 FSM Specification
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Operation with No Errors
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Rdt20 Error Scenario
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Sender sends one packet then waits for receiver response
stop and wait
Rdt20 Has a Fatal Flaw
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate
Handling duplicates bull Sender retransmits
current pkt if ACKNAK garbled
bull Sender adds sequence number to each pktndash Can use 1 bit (for now)
bull receiver discards (doesnrsquot deliver up) duplicate pkt
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Transport vs Network layer
bull network layer logical communication between hosts
bull transport layer logical communication between processes ndash relies on enhances network layer services
Household analogy12 kids sending letters to
12 kidsbull processes = kidsbull app messages = letters
in envelopesbull hosts = housesbull transport protocol = Ann
and Bill (collect mail from siblings)
bull network-layer protocol = postal service
Internet Transport-layer Protocols
bull reliable in-order delivery (TCP)ndash congestion control ndash flow controlndash connection setup
bull unreliable unordered delivery UDPndash no-frills extension of
ldquobest-effortrdquo IP
bull services not available ndash delay guaranteesndash bandwidth guarantees
applicationtransportnetworkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
applicationtransportnetworkdata linkphysical
logical end-end transport
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
UDP User Datagram Protocol [RFC 768]
bull ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol
bull ldquobest effortrdquo service UDP segments may bendash lostndash delivered out of order
to appbull connectionless
ndash no handshaking between UDP sender receiver
ndash each UDP segment handled independently of others
Why is there a UDPbull no connection
establishment (which can add delay)
bull simple no connection state at sender receiver
bull small segment headerbull no congestion control
UDP can blast away as fast as desired
UDP more
bull Often used for streaming (videoaudio) or game appsndash loss tolerantndash rate sensitive
bull other UDP usesndash DNSndash SNMP
bull reliable transfer over UDP add reliability at application layerndash application-specific
error recovery
source port dest port
32 bits
Applicationdata (message)
UDP segment format
length checksumLength in
bytes of UDPsegmentincluding
header
UDP checksum
Senderbull treat segment contents
as sequence of 16-bit integers
bull checksum addition (1rsquos complement sum) of segment contents
bull sender puts checksum value into UDP checksum field
Receiverbull compute checksum of
received segmentbull check if computed
checksum equals checksum field valuendash NO - error detectedndash YES - no error detected
But maybe errors nonetheless More later hellip
Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment
Internet Checksum Example
bull Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
wraparound
sum
checksum
bull At receiver add 2 integers and checksum hellip should be all 1rsquos If not bit error (correction next)
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics
bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
(Zoom
next
slid
e)
Reliable Data Transfer Getting Started
sendside
receiveside
rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer
udt_send() called by rdtto transfer packet over unreliable channel to receiver
rdt_rcv() called when packet arrives on rcv-side of channel
deliver_data() called by rdt to deliver data to upper
Reliable Data Transfer Getting Started
Wersquollbull Incrementally develop sender receiver
sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer
ndash but control info will flow on both directions
bull Use finite state machines (FSM) to specify sender receiver
state 1 state 2
event causing state transitionactions taken on state transition
state when in this ldquostaterdquo next state uniquely determined by next
event
eventactions
Rdt10 Reliable Transfer over a Reliable Channel
bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets
bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt_send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
sender receiverEasy
What if Taking a Message over Phone
bull Message is clearbull Message is garbled
What if Taking a Message over Phone
bull Message is clearndash Ok
bull Message is garbledndash Ask to repeatndash May not need whole message
bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission
Rdt20 Channel with Bit Errors (no Loss)
bull Underlying channel may flip bits in packetndash Checksum to detect bit errors
bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells
sender that pkt received OKndash negative acknowledgements (NAKs) receiver
explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK
bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)
rcvrsender
Rdt20 FSM Specification
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Operation with No Errors
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Rdt20 Error Scenario
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Sender sends one packet then waits for receiver response
stop and wait
Rdt20 Has a Fatal Flaw
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate
Handling duplicates bull Sender retransmits
current pkt if ACKNAK garbled
bull Sender adds sequence number to each pktndash Can use 1 bit (for now)
bull receiver discards (doesnrsquot deliver up) duplicate pkt
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Internet Transport-layer Protocols
bull reliable in-order delivery (TCP)ndash congestion control ndash flow controlndash connection setup
bull unreliable unordered delivery UDPndash no-frills extension of
ldquobest-effortrdquo IP
bull services not available ndash delay guaranteesndash bandwidth guarantees
applicationtransportnetworkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
applicationtransportnetworkdata linkphysical
logical end-end transport
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
UDP User Datagram Protocol [RFC 768]
bull ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol
bull ldquobest effortrdquo service UDP segments may bendash lostndash delivered out of order
to appbull connectionless
ndash no handshaking between UDP sender receiver
ndash each UDP segment handled independently of others
Why is there a UDPbull no connection
establishment (which can add delay)
bull simple no connection state at sender receiver
bull small segment headerbull no congestion control
UDP can blast away as fast as desired
UDP more
bull Often used for streaming (videoaudio) or game appsndash loss tolerantndash rate sensitive
bull other UDP usesndash DNSndash SNMP
bull reliable transfer over UDP add reliability at application layerndash application-specific
error recovery
source port dest port
32 bits
Applicationdata (message)
UDP segment format
length checksumLength in
bytes of UDPsegmentincluding
header
UDP checksum
Senderbull treat segment contents
as sequence of 16-bit integers
bull checksum addition (1rsquos complement sum) of segment contents
bull sender puts checksum value into UDP checksum field
Receiverbull compute checksum of
received segmentbull check if computed
checksum equals checksum field valuendash NO - error detectedndash YES - no error detected
But maybe errors nonetheless More later hellip
Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment
Internet Checksum Example
bull Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
wraparound
sum
checksum
bull At receiver add 2 integers and checksum hellip should be all 1rsquos If not bit error (correction next)
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics
bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
(Zoom
next
slid
e)
Reliable Data Transfer Getting Started
sendside
receiveside
rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer
udt_send() called by rdtto transfer packet over unreliable channel to receiver
rdt_rcv() called when packet arrives on rcv-side of channel
deliver_data() called by rdt to deliver data to upper
Reliable Data Transfer Getting Started
Wersquollbull Incrementally develop sender receiver
sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer
ndash but control info will flow on both directions
bull Use finite state machines (FSM) to specify sender receiver
state 1 state 2
event causing state transitionactions taken on state transition
state when in this ldquostaterdquo next state uniquely determined by next
event
eventactions
Rdt10 Reliable Transfer over a Reliable Channel
bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets
bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt_send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
sender receiverEasy
What if Taking a Message over Phone
bull Message is clearbull Message is garbled
What if Taking a Message over Phone
bull Message is clearndash Ok
bull Message is garbledndash Ask to repeatndash May not need whole message
bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission
Rdt20 Channel with Bit Errors (no Loss)
bull Underlying channel may flip bits in packetndash Checksum to detect bit errors
bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells
sender that pkt received OKndash negative acknowledgements (NAKs) receiver
explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK
bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)
rcvrsender
Rdt20 FSM Specification
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Operation with No Errors
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Rdt20 Error Scenario
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Sender sends one packet then waits for receiver response
stop and wait
Rdt20 Has a Fatal Flaw
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate
Handling duplicates bull Sender retransmits
current pkt if ACKNAK garbled
bull Sender adds sequence number to each pktndash Can use 1 bit (for now)
bull receiver discards (doesnrsquot deliver up) duplicate pkt
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
UDP User Datagram Protocol [RFC 768]
bull ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol
bull ldquobest effortrdquo service UDP segments may bendash lostndash delivered out of order
to appbull connectionless
ndash no handshaking between UDP sender receiver
ndash each UDP segment handled independently of others
Why is there a UDPbull no connection
establishment (which can add delay)
bull simple no connection state at sender receiver
bull small segment headerbull no congestion control
UDP can blast away as fast as desired
UDP more
bull Often used for streaming (videoaudio) or game appsndash loss tolerantndash rate sensitive
bull other UDP usesndash DNSndash SNMP
bull reliable transfer over UDP add reliability at application layerndash application-specific
error recovery
source port dest port
32 bits
Applicationdata (message)
UDP segment format
length checksumLength in
bytes of UDPsegmentincluding
header
UDP checksum
Senderbull treat segment contents
as sequence of 16-bit integers
bull checksum addition (1rsquos complement sum) of segment contents
bull sender puts checksum value into UDP checksum field
Receiverbull compute checksum of
received segmentbull check if computed
checksum equals checksum field valuendash NO - error detectedndash YES - no error detected
But maybe errors nonetheless More later hellip
Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment
Internet Checksum Example
bull Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
wraparound
sum
checksum
bull At receiver add 2 integers and checksum hellip should be all 1rsquos If not bit error (correction next)
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics
bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
(Zoom
next
slid
e)
Reliable Data Transfer Getting Started
sendside
receiveside
rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer
udt_send() called by rdtto transfer packet over unreliable channel to receiver
rdt_rcv() called when packet arrives on rcv-side of channel
deliver_data() called by rdt to deliver data to upper
Reliable Data Transfer Getting Started
Wersquollbull Incrementally develop sender receiver
sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer
ndash but control info will flow on both directions
bull Use finite state machines (FSM) to specify sender receiver
state 1 state 2
event causing state transitionactions taken on state transition
state when in this ldquostaterdquo next state uniquely determined by next
event
eventactions
Rdt10 Reliable Transfer over a Reliable Channel
bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets
bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt_send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
sender receiverEasy
What if Taking a Message over Phone
bull Message is clearbull Message is garbled
What if Taking a Message over Phone
bull Message is clearndash Ok
bull Message is garbledndash Ask to repeatndash May not need whole message
bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission
Rdt20 Channel with Bit Errors (no Loss)
bull Underlying channel may flip bits in packetndash Checksum to detect bit errors
bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells
sender that pkt received OKndash negative acknowledgements (NAKs) receiver
explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK
bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)
rcvrsender
Rdt20 FSM Specification
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Operation with No Errors
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Rdt20 Error Scenario
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Sender sends one packet then waits for receiver response
stop and wait
Rdt20 Has a Fatal Flaw
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate
Handling duplicates bull Sender retransmits
current pkt if ACKNAK garbled
bull Sender adds sequence number to each pktndash Can use 1 bit (for now)
bull receiver discards (doesnrsquot deliver up) duplicate pkt
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
UDP User Datagram Protocol [RFC 768]
bull ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol
bull ldquobest effortrdquo service UDP segments may bendash lostndash delivered out of order
to appbull connectionless
ndash no handshaking between UDP sender receiver
ndash each UDP segment handled independently of others
Why is there a UDPbull no connection
establishment (which can add delay)
bull simple no connection state at sender receiver
bull small segment headerbull no congestion control
UDP can blast away as fast as desired
UDP more
bull Often used for streaming (videoaudio) or game appsndash loss tolerantndash rate sensitive
bull other UDP usesndash DNSndash SNMP
bull reliable transfer over UDP add reliability at application layerndash application-specific
error recovery
source port dest port
32 bits
Applicationdata (message)
UDP segment format
length checksumLength in
bytes of UDPsegmentincluding
header
UDP checksum
Senderbull treat segment contents
as sequence of 16-bit integers
bull checksum addition (1rsquos complement sum) of segment contents
bull sender puts checksum value into UDP checksum field
Receiverbull compute checksum of
received segmentbull check if computed
checksum equals checksum field valuendash NO - error detectedndash YES - no error detected
But maybe errors nonetheless More later hellip
Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment
Internet Checksum Example
bull Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
wraparound
sum
checksum
bull At receiver add 2 integers and checksum hellip should be all 1rsquos If not bit error (correction next)
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics
bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
(Zoom
next
slid
e)
Reliable Data Transfer Getting Started
sendside
receiveside
rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer
udt_send() called by rdtto transfer packet over unreliable channel to receiver
rdt_rcv() called when packet arrives on rcv-side of channel
deliver_data() called by rdt to deliver data to upper
Reliable Data Transfer Getting Started
Wersquollbull Incrementally develop sender receiver
sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer
ndash but control info will flow on both directions
bull Use finite state machines (FSM) to specify sender receiver
state 1 state 2
event causing state transitionactions taken on state transition
state when in this ldquostaterdquo next state uniquely determined by next
event
eventactions
Rdt10 Reliable Transfer over a Reliable Channel
bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets
bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt_send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
sender receiverEasy
What if Taking a Message over Phone
bull Message is clearbull Message is garbled
What if Taking a Message over Phone
bull Message is clearndash Ok
bull Message is garbledndash Ask to repeatndash May not need whole message
bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission
Rdt20 Channel with Bit Errors (no Loss)
bull Underlying channel may flip bits in packetndash Checksum to detect bit errors
bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells
sender that pkt received OKndash negative acknowledgements (NAKs) receiver
explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK
bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)
rcvrsender
Rdt20 FSM Specification
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Operation with No Errors
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Rdt20 Error Scenario
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Sender sends one packet then waits for receiver response
stop and wait
Rdt20 Has a Fatal Flaw
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate
Handling duplicates bull Sender retransmits
current pkt if ACKNAK garbled
bull Sender adds sequence number to each pktndash Can use 1 bit (for now)
bull receiver discards (doesnrsquot deliver up) duplicate pkt
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
UDP User Datagram Protocol [RFC 768]
bull ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol
bull ldquobest effortrdquo service UDP segments may bendash lostndash delivered out of order
to appbull connectionless
ndash no handshaking between UDP sender receiver
ndash each UDP segment handled independently of others
Why is there a UDPbull no connection
establishment (which can add delay)
bull simple no connection state at sender receiver
bull small segment headerbull no congestion control
UDP can blast away as fast as desired
UDP more
bull Often used for streaming (videoaudio) or game appsndash loss tolerantndash rate sensitive
bull other UDP usesndash DNSndash SNMP
bull reliable transfer over UDP add reliability at application layerndash application-specific
error recovery
source port dest port
32 bits
Applicationdata (message)
UDP segment format
length checksumLength in
bytes of UDPsegmentincluding
header
UDP checksum
Senderbull treat segment contents
as sequence of 16-bit integers
bull checksum addition (1rsquos complement sum) of segment contents
bull sender puts checksum value into UDP checksum field
Receiverbull compute checksum of
received segmentbull check if computed
checksum equals checksum field valuendash NO - error detectedndash YES - no error detected
But maybe errors nonetheless More later hellip
Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment
Internet Checksum Example
bull Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
wraparound
sum
checksum
bull At receiver add 2 integers and checksum hellip should be all 1rsquos If not bit error (correction next)
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics
bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
(Zoom
next
slid
e)
Reliable Data Transfer Getting Started
sendside
receiveside
rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer
udt_send() called by rdtto transfer packet over unreliable channel to receiver
rdt_rcv() called when packet arrives on rcv-side of channel
deliver_data() called by rdt to deliver data to upper
Reliable Data Transfer Getting Started
Wersquollbull Incrementally develop sender receiver
sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer
ndash but control info will flow on both directions
bull Use finite state machines (FSM) to specify sender receiver
state 1 state 2
event causing state transitionactions taken on state transition
state when in this ldquostaterdquo next state uniquely determined by next
event
eventactions
Rdt10 Reliable Transfer over a Reliable Channel
bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets
bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt_send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
sender receiverEasy
What if Taking a Message over Phone
bull Message is clearbull Message is garbled
What if Taking a Message over Phone
bull Message is clearndash Ok
bull Message is garbledndash Ask to repeatndash May not need whole message
bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission
Rdt20 Channel with Bit Errors (no Loss)
bull Underlying channel may flip bits in packetndash Checksum to detect bit errors
bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells
sender that pkt received OKndash negative acknowledgements (NAKs) receiver
explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK
bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)
rcvrsender
Rdt20 FSM Specification
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Operation with No Errors
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Rdt20 Error Scenario
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Sender sends one packet then waits for receiver response
stop and wait
Rdt20 Has a Fatal Flaw
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate
Handling duplicates bull Sender retransmits
current pkt if ACKNAK garbled
bull Sender adds sequence number to each pktndash Can use 1 bit (for now)
bull receiver discards (doesnrsquot deliver up) duplicate pkt
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
UDP more
bull Often used for streaming (videoaudio) or game appsndash loss tolerantndash rate sensitive
bull other UDP usesndash DNSndash SNMP
bull reliable transfer over UDP add reliability at application layerndash application-specific
error recovery
source port dest port
32 bits
Applicationdata (message)
UDP segment format
length checksumLength in
bytes of UDPsegmentincluding
header
UDP checksum
Senderbull treat segment contents
as sequence of 16-bit integers
bull checksum addition (1rsquos complement sum) of segment contents
bull sender puts checksum value into UDP checksum field
Receiverbull compute checksum of
received segmentbull check if computed
checksum equals checksum field valuendash NO - error detectedndash YES - no error detected
But maybe errors nonetheless More later hellip
Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment
Internet Checksum Example
bull Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
wraparound
sum
checksum
bull At receiver add 2 integers and checksum hellip should be all 1rsquos If not bit error (correction next)
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics
bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
(Zoom
next
slid
e)
Reliable Data Transfer Getting Started
sendside
receiveside
rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer
udt_send() called by rdtto transfer packet over unreliable channel to receiver
rdt_rcv() called when packet arrives on rcv-side of channel
deliver_data() called by rdt to deliver data to upper
Reliable Data Transfer Getting Started
Wersquollbull Incrementally develop sender receiver
sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer
ndash but control info will flow on both directions
bull Use finite state machines (FSM) to specify sender receiver
state 1 state 2
event causing state transitionactions taken on state transition
state when in this ldquostaterdquo next state uniquely determined by next
event
eventactions
Rdt10 Reliable Transfer over a Reliable Channel
bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets
bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt_send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
sender receiverEasy
What if Taking a Message over Phone
bull Message is clearbull Message is garbled
What if Taking a Message over Phone
bull Message is clearndash Ok
bull Message is garbledndash Ask to repeatndash May not need whole message
bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission
Rdt20 Channel with Bit Errors (no Loss)
bull Underlying channel may flip bits in packetndash Checksum to detect bit errors
bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells
sender that pkt received OKndash negative acknowledgements (NAKs) receiver
explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK
bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)
rcvrsender
Rdt20 FSM Specification
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Operation with No Errors
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Rdt20 Error Scenario
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Sender sends one packet then waits for receiver response
stop and wait
Rdt20 Has a Fatal Flaw
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate
Handling duplicates bull Sender retransmits
current pkt if ACKNAK garbled
bull Sender adds sequence number to each pktndash Can use 1 bit (for now)
bull receiver discards (doesnrsquot deliver up) duplicate pkt
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
UDP checksum
Senderbull treat segment contents
as sequence of 16-bit integers
bull checksum addition (1rsquos complement sum) of segment contents
bull sender puts checksum value into UDP checksum field
Receiverbull compute checksum of
received segmentbull check if computed
checksum equals checksum field valuendash NO - error detectedndash YES - no error detected
But maybe errors nonetheless More later hellip
Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment
Internet Checksum Example
bull Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
wraparound
sum
checksum
bull At receiver add 2 integers and checksum hellip should be all 1rsquos If not bit error (correction next)
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics
bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
(Zoom
next
slid
e)
Reliable Data Transfer Getting Started
sendside
receiveside
rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer
udt_send() called by rdtto transfer packet over unreliable channel to receiver
rdt_rcv() called when packet arrives on rcv-side of channel
deliver_data() called by rdt to deliver data to upper
Reliable Data Transfer Getting Started
Wersquollbull Incrementally develop sender receiver
sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer
ndash but control info will flow on both directions
bull Use finite state machines (FSM) to specify sender receiver
state 1 state 2
event causing state transitionactions taken on state transition
state when in this ldquostaterdquo next state uniquely determined by next
event
eventactions
Rdt10 Reliable Transfer over a Reliable Channel
bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets
bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt_send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
sender receiverEasy
What if Taking a Message over Phone
bull Message is clearbull Message is garbled
What if Taking a Message over Phone
bull Message is clearndash Ok
bull Message is garbledndash Ask to repeatndash May not need whole message
bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission
Rdt20 Channel with Bit Errors (no Loss)
bull Underlying channel may flip bits in packetndash Checksum to detect bit errors
bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells
sender that pkt received OKndash negative acknowledgements (NAKs) receiver
explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK
bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)
rcvrsender
Rdt20 FSM Specification
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Operation with No Errors
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Rdt20 Error Scenario
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Sender sends one packet then waits for receiver response
stop and wait
Rdt20 Has a Fatal Flaw
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate
Handling duplicates bull Sender retransmits
current pkt if ACKNAK garbled
bull Sender adds sequence number to each pktndash Can use 1 bit (for now)
bull receiver discards (doesnrsquot deliver up) duplicate pkt
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Internet Checksum Example
bull Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
wraparound
sum
checksum
bull At receiver add 2 integers and checksum hellip should be all 1rsquos If not bit error (correction next)
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics
bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
(Zoom
next
slid
e)
Reliable Data Transfer Getting Started
sendside
receiveside
rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer
udt_send() called by rdtto transfer packet over unreliable channel to receiver
rdt_rcv() called when packet arrives on rcv-side of channel
deliver_data() called by rdt to deliver data to upper
Reliable Data Transfer Getting Started
Wersquollbull Incrementally develop sender receiver
sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer
ndash but control info will flow on both directions
bull Use finite state machines (FSM) to specify sender receiver
state 1 state 2
event causing state transitionactions taken on state transition
state when in this ldquostaterdquo next state uniquely determined by next
event
eventactions
Rdt10 Reliable Transfer over a Reliable Channel
bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets
bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt_send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
sender receiverEasy
What if Taking a Message over Phone
bull Message is clearbull Message is garbled
What if Taking a Message over Phone
bull Message is clearndash Ok
bull Message is garbledndash Ask to repeatndash May not need whole message
bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission
Rdt20 Channel with Bit Errors (no Loss)
bull Underlying channel may flip bits in packetndash Checksum to detect bit errors
bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells
sender that pkt received OKndash negative acknowledgements (NAKs) receiver
explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK
bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)
rcvrsender
Rdt20 FSM Specification
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Operation with No Errors
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Rdt20 Error Scenario
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Sender sends one packet then waits for receiver response
stop and wait
Rdt20 Has a Fatal Flaw
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate
Handling duplicates bull Sender retransmits
current pkt if ACKNAK garbled
bull Sender adds sequence number to each pktndash Can use 1 bit (for now)
bull receiver discards (doesnrsquot deliver up) duplicate pkt
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics
bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
(Zoom
next
slid
e)
Reliable Data Transfer Getting Started
sendside
receiveside
rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer
udt_send() called by rdtto transfer packet over unreliable channel to receiver
rdt_rcv() called when packet arrives on rcv-side of channel
deliver_data() called by rdt to deliver data to upper
Reliable Data Transfer Getting Started
Wersquollbull Incrementally develop sender receiver
sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer
ndash but control info will flow on both directions
bull Use finite state machines (FSM) to specify sender receiver
state 1 state 2
event causing state transitionactions taken on state transition
state when in this ldquostaterdquo next state uniquely determined by next
event
eventactions
Rdt10 Reliable Transfer over a Reliable Channel
bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets
bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt_send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
sender receiverEasy
What if Taking a Message over Phone
bull Message is clearbull Message is garbled
What if Taking a Message over Phone
bull Message is clearndash Ok
bull Message is garbledndash Ask to repeatndash May not need whole message
bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission
Rdt20 Channel with Bit Errors (no Loss)
bull Underlying channel may flip bits in packetndash Checksum to detect bit errors
bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells
sender that pkt received OKndash negative acknowledgements (NAKs) receiver
explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK
bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)
rcvrsender
Rdt20 FSM Specification
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Operation with No Errors
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Rdt20 Error Scenario
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Sender sends one packet then waits for receiver response
stop and wait
Rdt20 Has a Fatal Flaw
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate
Handling duplicates bull Sender retransmits
current pkt if ACKNAK garbled
bull Sender adds sequence number to each pktndash Can use 1 bit (for now)
bull receiver discards (doesnrsquot deliver up) duplicate pkt
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics
bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
(Zoom
next
slid
e)
Reliable Data Transfer Getting Started
sendside
receiveside
rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer
udt_send() called by rdtto transfer packet over unreliable channel to receiver
rdt_rcv() called when packet arrives on rcv-side of channel
deliver_data() called by rdt to deliver data to upper
Reliable Data Transfer Getting Started
Wersquollbull Incrementally develop sender receiver
sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer
ndash but control info will flow on both directions
bull Use finite state machines (FSM) to specify sender receiver
state 1 state 2
event causing state transitionactions taken on state transition
state when in this ldquostaterdquo next state uniquely determined by next
event
eventactions
Rdt10 Reliable Transfer over a Reliable Channel
bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets
bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt_send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
sender receiverEasy
What if Taking a Message over Phone
bull Message is clearbull Message is garbled
What if Taking a Message over Phone
bull Message is clearndash Ok
bull Message is garbledndash Ask to repeatndash May not need whole message
bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission
Rdt20 Channel with Bit Errors (no Loss)
bull Underlying channel may flip bits in packetndash Checksum to detect bit errors
bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells
sender that pkt received OKndash negative acknowledgements (NAKs) receiver
explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK
bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)
rcvrsender
Rdt20 FSM Specification
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Operation with No Errors
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Rdt20 Error Scenario
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Sender sends one packet then waits for receiver response
stop and wait
Rdt20 Has a Fatal Flaw
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate
Handling duplicates bull Sender retransmits
current pkt if ACKNAK garbled
bull Sender adds sequence number to each pktndash Can use 1 bit (for now)
bull receiver discards (doesnrsquot deliver up) duplicate pkt
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Principles of Reliable data transferbull important in app transport link layers
bull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics
bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
(Zoom
next
slid
e)
Reliable Data Transfer Getting Started
sendside
receiveside
rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer
udt_send() called by rdtto transfer packet over unreliable channel to receiver
rdt_rcv() called when packet arrives on rcv-side of channel
deliver_data() called by rdt to deliver data to upper
Reliable Data Transfer Getting Started
Wersquollbull Incrementally develop sender receiver
sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer
ndash but control info will flow on both directions
bull Use finite state machines (FSM) to specify sender receiver
state 1 state 2
event causing state transitionactions taken on state transition
state when in this ldquostaterdquo next state uniquely determined by next
event
eventactions
Rdt10 Reliable Transfer over a Reliable Channel
bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets
bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt_send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
sender receiverEasy
What if Taking a Message over Phone
bull Message is clearbull Message is garbled
What if Taking a Message over Phone
bull Message is clearndash Ok
bull Message is garbledndash Ask to repeatndash May not need whole message
bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission
Rdt20 Channel with Bit Errors (no Loss)
bull Underlying channel may flip bits in packetndash Checksum to detect bit errors
bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells
sender that pkt received OKndash negative acknowledgements (NAKs) receiver
explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK
bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)
rcvrsender
Rdt20 FSM Specification
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Operation with No Errors
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Rdt20 Error Scenario
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Sender sends one packet then waits for receiver response
stop and wait
Rdt20 Has a Fatal Flaw
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate
Handling duplicates bull Sender retransmits
current pkt if ACKNAK garbled
bull Sender adds sequence number to each pktndash Can use 1 bit (for now)
bull receiver discards (doesnrsquot deliver up) duplicate pkt
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics
bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
(Zoom
next
slid
e)
Reliable Data Transfer Getting Started
sendside
receiveside
rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer
udt_send() called by rdtto transfer packet over unreliable channel to receiver
rdt_rcv() called when packet arrives on rcv-side of channel
deliver_data() called by rdt to deliver data to upper
Reliable Data Transfer Getting Started
Wersquollbull Incrementally develop sender receiver
sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer
ndash but control info will flow on both directions
bull Use finite state machines (FSM) to specify sender receiver
state 1 state 2
event causing state transitionactions taken on state transition
state when in this ldquostaterdquo next state uniquely determined by next
event
eventactions
Rdt10 Reliable Transfer over a Reliable Channel
bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets
bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt_send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
sender receiverEasy
What if Taking a Message over Phone
bull Message is clearbull Message is garbled
What if Taking a Message over Phone
bull Message is clearndash Ok
bull Message is garbledndash Ask to repeatndash May not need whole message
bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission
Rdt20 Channel with Bit Errors (no Loss)
bull Underlying channel may flip bits in packetndash Checksum to detect bit errors
bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells
sender that pkt received OKndash negative acknowledgements (NAKs) receiver
explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK
bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)
rcvrsender
Rdt20 FSM Specification
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Operation with No Errors
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Rdt20 Error Scenario
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Sender sends one packet then waits for receiver response
stop and wait
Rdt20 Has a Fatal Flaw
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate
Handling duplicates bull Sender retransmits
current pkt if ACKNAK garbled
bull Sender adds sequence number to each pktndash Can use 1 bit (for now)
bull receiver discards (doesnrsquot deliver up) duplicate pkt
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Reliable Data Transfer Getting Started
sendside
receiveside
rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer
udt_send() called by rdtto transfer packet over unreliable channel to receiver
rdt_rcv() called when packet arrives on rcv-side of channel
deliver_data() called by rdt to deliver data to upper
Reliable Data Transfer Getting Started
Wersquollbull Incrementally develop sender receiver
sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer
ndash but control info will flow on both directions
bull Use finite state machines (FSM) to specify sender receiver
state 1 state 2
event causing state transitionactions taken on state transition
state when in this ldquostaterdquo next state uniquely determined by next
event
eventactions
Rdt10 Reliable Transfer over a Reliable Channel
bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets
bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt_send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
sender receiverEasy
What if Taking a Message over Phone
bull Message is clearbull Message is garbled
What if Taking a Message over Phone
bull Message is clearndash Ok
bull Message is garbledndash Ask to repeatndash May not need whole message
bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission
Rdt20 Channel with Bit Errors (no Loss)
bull Underlying channel may flip bits in packetndash Checksum to detect bit errors
bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells
sender that pkt received OKndash negative acknowledgements (NAKs) receiver
explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK
bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)
rcvrsender
Rdt20 FSM Specification
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Operation with No Errors
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Rdt20 Error Scenario
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Sender sends one packet then waits for receiver response
stop and wait
Rdt20 Has a Fatal Flaw
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate
Handling duplicates bull Sender retransmits
current pkt if ACKNAK garbled
bull Sender adds sequence number to each pktndash Can use 1 bit (for now)
bull receiver discards (doesnrsquot deliver up) duplicate pkt
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Reliable Data Transfer Getting Started
Wersquollbull Incrementally develop sender receiver
sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer
ndash but control info will flow on both directions
bull Use finite state machines (FSM) to specify sender receiver
state 1 state 2
event causing state transitionactions taken on state transition
state when in this ldquostaterdquo next state uniquely determined by next
event
eventactions
Rdt10 Reliable Transfer over a Reliable Channel
bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets
bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt_send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
sender receiverEasy
What if Taking a Message over Phone
bull Message is clearbull Message is garbled
What if Taking a Message over Phone
bull Message is clearndash Ok
bull Message is garbledndash Ask to repeatndash May not need whole message
bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission
Rdt20 Channel with Bit Errors (no Loss)
bull Underlying channel may flip bits in packetndash Checksum to detect bit errors
bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells
sender that pkt received OKndash negative acknowledgements (NAKs) receiver
explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK
bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)
rcvrsender
Rdt20 FSM Specification
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Operation with No Errors
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Rdt20 Error Scenario
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Sender sends one packet then waits for receiver response
stop and wait
Rdt20 Has a Fatal Flaw
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate
Handling duplicates bull Sender retransmits
current pkt if ACKNAK garbled
bull Sender adds sequence number to each pktndash Can use 1 bit (for now)
bull receiver discards (doesnrsquot deliver up) duplicate pkt
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Rdt10 Reliable Transfer over a Reliable Channel
bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets
bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt_send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
sender receiverEasy
What if Taking a Message over Phone
bull Message is clearbull Message is garbled
What if Taking a Message over Phone
bull Message is clearndash Ok
bull Message is garbledndash Ask to repeatndash May not need whole message
bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission
Rdt20 Channel with Bit Errors (no Loss)
bull Underlying channel may flip bits in packetndash Checksum to detect bit errors
bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells
sender that pkt received OKndash negative acknowledgements (NAKs) receiver
explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK
bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)
rcvrsender
Rdt20 FSM Specification
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Operation with No Errors
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Rdt20 Error Scenario
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Sender sends one packet then waits for receiver response
stop and wait
Rdt20 Has a Fatal Flaw
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate
Handling duplicates bull Sender retransmits
current pkt if ACKNAK garbled
bull Sender adds sequence number to each pktndash Can use 1 bit (for now)
bull receiver discards (doesnrsquot deliver up) duplicate pkt
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
What if Taking a Message over Phone
bull Message is clearbull Message is garbled
What if Taking a Message over Phone
bull Message is clearndash Ok
bull Message is garbledndash Ask to repeatndash May not need whole message
bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission
Rdt20 Channel with Bit Errors (no Loss)
bull Underlying channel may flip bits in packetndash Checksum to detect bit errors
bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells
sender that pkt received OKndash negative acknowledgements (NAKs) receiver
explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK
bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)
rcvrsender
Rdt20 FSM Specification
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Operation with No Errors
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Rdt20 Error Scenario
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Sender sends one packet then waits for receiver response
stop and wait
Rdt20 Has a Fatal Flaw
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate
Handling duplicates bull Sender retransmits
current pkt if ACKNAK garbled
bull Sender adds sequence number to each pktndash Can use 1 bit (for now)
bull receiver discards (doesnrsquot deliver up) duplicate pkt
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
What if Taking a Message over Phone
bull Message is clearndash Ok
bull Message is garbledndash Ask to repeatndash May not need whole message
bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission
Rdt20 Channel with Bit Errors (no Loss)
bull Underlying channel may flip bits in packetndash Checksum to detect bit errors
bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells
sender that pkt received OKndash negative acknowledgements (NAKs) receiver
explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK
bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)
rcvrsender
Rdt20 FSM Specification
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Operation with No Errors
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Rdt20 Error Scenario
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Sender sends one packet then waits for receiver response
stop and wait
Rdt20 Has a Fatal Flaw
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate
Handling duplicates bull Sender retransmits
current pkt if ACKNAK garbled
bull Sender adds sequence number to each pktndash Can use 1 bit (for now)
bull receiver discards (doesnrsquot deliver up) duplicate pkt
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Rdt20 Channel with Bit Errors (no Loss)
bull Underlying channel may flip bits in packetndash Checksum to detect bit errors
bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells
sender that pkt received OKndash negative acknowledgements (NAKs) receiver
explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK
bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)
rcvrsender
Rdt20 FSM Specification
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Operation with No Errors
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Rdt20 Error Scenario
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Sender sends one packet then waits for receiver response
stop and wait
Rdt20 Has a Fatal Flaw
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate
Handling duplicates bull Sender retransmits
current pkt if ACKNAK garbled
bull Sender adds sequence number to each pktndash Can use 1 bit (for now)
bull receiver discards (doesnrsquot deliver up) duplicate pkt
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Rdt20 FSM Specification
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Operation with No Errors
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Rdt20 Error Scenario
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Sender sends one packet then waits for receiver response
stop and wait
Rdt20 Has a Fatal Flaw
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate
Handling duplicates bull Sender retransmits
current pkt if ACKNAK garbled
bull Sender adds sequence number to each pktndash Can use 1 bit (for now)
bull receiver discards (doesnrsquot deliver up) duplicate pkt
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Rdt20 Operation with No Errors
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Rdt20 Error Scenario
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Sender sends one packet then waits for receiver response
stop and wait
Rdt20 Has a Fatal Flaw
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate
Handling duplicates bull Sender retransmits
current pkt if ACKNAK garbled
bull Sender adds sequence number to each pktndash Can use 1 bit (for now)
bull receiver discards (doesnrsquot deliver up) duplicate pkt
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Rdt20 Error Scenario
Wait for call from above
snkpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from
below
rdt_send(data)
L
sender
receiver
Sender sends one packet then waits for receiver response
stop and wait
Rdt20 Has a Fatal Flaw
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate
Handling duplicates bull Sender retransmits
current pkt if ACKNAK garbled
bull Sender adds sequence number to each pktndash Can use 1 bit (for now)
bull receiver discards (doesnrsquot deliver up) duplicate pkt
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Rdt20 Has a Fatal Flaw
Wait for call from above
sndpkt = make_pkt(data checksum)udt_send(sndpkt)
extract(rcvpktdata)deliver_data(data)udt_send(ACK)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
Wait for ACK or
NAK
Wait for call from below
sender
receiverrdt_send(data)
L
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate
Handling duplicates bull Sender retransmits
current pkt if ACKNAK garbled
bull Sender adds sequence number to each pktndash Can use 1 bit (for now)
bull receiver discards (doesnrsquot deliver up) duplicate pkt
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate
Handling duplicates bull Sender retransmits
current pkt if ACKNAK garbled
bull Sender adds sequence number to each pktndash Can use 1 bit (for now)
bull receiver discards (doesnrsquot deliver up) duplicate pkt
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Rdt20 Has a Fatal Flaw
What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate
Handling duplicates bull Sender retransmits
current pkt if ACKNAK garbled
bull Sender adds sequence number to each pktndash Can use 1 bit (for now)
bull receiver discards (doesnrsquot deliver up) duplicate pkt
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Rdt21 Sender Handles Garbled ACKNAKs
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
Wait for call 1 from
above
Wait for ACK or NAK 1
LL
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Rdt21 Receiver Handles Garbled ACKNAKs
Wait for 0 from below
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Rdt21 Discussion
Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq
bull note receiver can not know if its last ACKNAK received OK at sender
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Rdt22 a NAK-free Protocol
bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last
pkt received OKndash receiver must explicitly include seq of pkt being
ACKed
bull Duplicate ACK at sender results in same action as NAK retransmit current pkt
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))
udt_send(sndpkt)
Rdt22 Sender amp Receiver Fragments
Wait for call 0 from
above
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
Wait for ACK 0
sender FSMfragment
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
receiver FSMfragment
L
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
How to determine if a packet is lost
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Rdt30 Channels with Errors and Loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of
help but not enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull Retransmits if no ACK received in this time
bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be
duplicate but use of seq rsquos already handles this
ndash Receiver must specify seq of pkt being ACKed
bull Requires countdown timer
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Rdt30 Sender
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
Wait for
ACK0
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for call 1 from
above
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)
stop_timerstop_timer
udt_send(sndpkt)start_timer
timeout
udt_send(sndpkt)start_timer
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
above
Wait for
ACK1
Lrdt_rcv(rcvpkt)
LL
L
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Rdt30 in Action
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Rdt30 in Action
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Performance of Rdt30
bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit
packet
U sender utilization ndash fraction of time sender busy sending
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link
bull Network protocol limits use of physical resources
dsmicrosecon8bps10
bits80009
R
Ldtrans
(Picture next slide)
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Rdt30 Stop-and-Wait Operation
first packet bit transmitted t = 0
sender
receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
U sender
= 008
30008 = 000027
microseconds
L R
RTT + L R =
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Pipelined Protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Pipelining Increased Utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
U sender
= 024
30008 = 00008
microseconds
3 L R
RTT + L R =
Increase utilizationby a factor of 3
bull Two generic forms of pipelined protocols go-Back-N selective repeat
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Pipelining Protocols
Go-back-N overviewbull sender up to N
unACKed pkts in pipeline
bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if
therersquos a gap
bull sender has timer for oldest unACKed pktndash if timer expires
retransmit all unACKed packets
Selective Repeat overviewbull sender up to N
unACKed packets in pipeline
bull receiver ACKs individual pkts
bull sender maintains timer for each unACKed pktndash if timer expires
retransmit only unACKed packet
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed
bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in
window
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
GBN Sender Extended FSM
Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)
base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
base=1nextseqnum=1
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
L
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
GBN Receiver Extended FSM
ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq
Wait
udt_send(sndpkt)
default
rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)
L
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
GBN inaction
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
GBN Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Selective Repeatbull Receiver individually acknowledges all
correctly received pktsndash Buffers pkts as needed for eventual in-order
delivery to upper layer
bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt
bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Selective Repeat sender receiver windows
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Selective Repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next
unACKed seq
senderpkt n in [rcvbase rcvbase+N-
1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also
deliver buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1]
bull ACK(n)
otherwise bull ignore
receiver
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Transport Layer 3-52
Selective Repeat in Action
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Selective Repeat
DilemmaExample bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
SR Applet
httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow
in same connectionndash MSS maximum
segment size
bull connection-oriented ndash handshaking
(exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not
overwhelm receiver
bull point-to-pointndash one sender one
receiver
bull reliable in-order byte steamndash no ldquomessage
boundariesrdquo
bull pipelinedndash TCP congestion and
flow control set window size
bull send amp receive buffers
socketdoor
T C Psend buffer
T C Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
TCP Segment Structure
source port dest port
32 bits
applicationdata (variable length)
sequence number
acknowledgement number
Receive window
Urg data pointerchecksumFSRPAU
headlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next
byte expected from other side
ndash cumulative ACKQ how receiver
handles out-of-order segmentsndash A TCP spec
doesnrsquot say up to implementer
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
UsertypeslsquoCrsquo
host ACKsreceipt of echoedlsquoCrsquo
timesimple telnet scenario
host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTT
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull Longer than RTTndash but RTT varies
bull Too short premature timeoutndash unnecessary
retransmissionsbull Too long slow
reaction to segment loss
Q how to estimate RTTbull SampleRTT measured time
from segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
bull Exponential weighted moving averagebull influence of past sample decreases exponentially
fastbull typical value = 18th (or 0125)
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety margin
bull First estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
TCP reliable data transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs
bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
TCP Sender EventsData rcvd from appbull Create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull Start timer if not already running (think of timer as for oldest unACKed segment)
bull Expiration interval TimeOutInterval
Timeoutbull retransmit segment
that caused timeoutbull restart timer ACK rcvdbull If acknowledges
previously unACKed segmentsndash update what is known
to be ACKedndash start timer if there are
outstanding segments
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
TCP Sender
(simplified)
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71
y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
TCP Retransmission Scenarios
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq
=92
tim
eout
ACK=120
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq
=92
tim
eout
SendBase= 100
SendBase= 120
SendBase= 120
Sendbase= 100
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
TCP Retransmission Scenarios (more)
Host ASeq=92 8 bytes data
ACK=100
losstim
eout
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Fast Retransmit
bull Time-out period often relatively longndash Long delay before
resending lost packet
bull Detect lost segments via duplicate ACKsndash Sender often sends
many segments back-to-back
ndash If segment lost there will likely be many duplicate ACKs for that segment
bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Host A
timeo
ut
Host B
time
X
resend seq X2
seq x1seq x2seq x3seq x4seq x5
ACK x1
ACK x1ACK x1ACK x1
tripleduplicate
ACKs
Fast
Retra
nsm
it
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
TCP Flow Control
bull Receive side of TCP connection has a receive buffer
bull speed-matching service matching send rate to receiving applicationrsquos drain rate
bull App process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast
flow control
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
TCP Flow Control How it Works
(suppose TCP receiver discards out-of-order segments)
bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Receiver advertises unused buffer space by including rwnd value in segment header
bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos
buffer doesnrsquot overflow
IPdatagrams
TCP data(in buffer)
(currently)unused bufferspace
applicationprocess
rwndRcvBuffer
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control
info (eg RcvWindow)bull client connection
initiator Socket clientSocket = new
Socket(ldquohostnamerdquo port) bull server contacted by
client Socket connectionSocket =
welcomeSocketaccept()
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial
seq Step 3 client receives
SYNACK replies with ACK segment which may contain data
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
tim
ed w
ait
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
tim
ed w
ait
closed
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Transport Layer 3-81
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Principles of Congestion Control
Congestionbull Informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull Different from flow controlbull Manifestations
ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)
bull A ldquotop-10rdquo problem
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Causescosts of Congestion Scenario 1
bull Two senders two receivers
bull One router infinite buffers
bull No retransmission
bull Large delays when congested
bull Maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Causescosts of Congestion Scenario 2
bull One router finite buffers bull Sender retransmission of lost packet
finite shared output link
buffers
Host A
lin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Causescosts of congestion Scenario 2
bull Always (goodput)
bull ldquoPerfectrdquo retransmission only when loss
bull Retransmission of delayed (not lost) packet makes
larger (than perfect case) for same
lin
lout
=
lin
lout
gtl
inl
out
ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of
pkt
R2
R2lin
l ou
t
b
R2
R2lin
l ou
t
a
R2
R2lin
l ou
t
c
R4
R3
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Causescosts of Congestion Scenario 3
bull Four sendersbull Multihop pathsbull Timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Causescosts of Congestion Scenario 3
Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Host A
Host B
lo
u
t
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Approaches towards congestion control
End-end congestion control
bull No explicit feedback from network
bull Congestion inferred from end-system observed loss delay
bull Approach taken by TCP
Network-assisted congestion control
bull Routers provide feedback to end systemsndash Single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash Explicit rate sender should send at
Broadly
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Chapter 3 outline
bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer
bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection
management
bull 36 Principles of congestion control
bull 37 TCP congestion control
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
TCP Congestion Control
bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level
bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)
network not congested so increase sending rate
- lost segment assume loss due to congested network so decrease sending rate
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
TCP Congestion Control Bandwidth Probing
bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since
available bandwidth is changing depending on other connections in network)
ACKs being received so increase rate
X
X
XX
X loss so decrease rate
send
ing
rate
time
bull Q how fast to increasedecrease- details to follow
TCPrsquosldquosawtoothrdquobehavior
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
TCP Congestion Control detailsbull sender limits rate by limiting
number of unACKed bytes ldquoin pipelinerdquo
ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)
bull roughly
bull cwnd is dynamic function of perceived network congestion
rate = cwnd
RTT bytessec
LastByteSent-LastByteAcked cwnd
cwndbytes
RTT
ACK(s)
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
TCP Congestion Control more details
segment loss event reducing cwnd
bull timeout no response from receiverndash cut cwnd to 1
bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less
aggressively than on timeout
ACK received increase cwnd
bull slowstart phase - increase exponentially
fast (despite name) at connection start or following timeout
bull congestion avoidance - increase linearly
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
TCP Slow Start
bull when connection begins cwnd = 1 MSSndash example MSS = 500
bytes amp RTT = 200 msecndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp
up to respectable ratebull increase rate exponentially
until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received
Host Aone segment
RT
T
Host B
time
two segments
four segments
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Transitioning intoout of slowstart
ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2
ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to
congestion avoidance phase
slow start timeout
ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed
new ACKdupACKcount++
duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion
avoidance
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
TCP Congestion Avoidance
bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1
MSS per RTT ndash approach possible
congestion slower than in slowstart
ndash implementation cwnd = cwnd + MSScwnd for each ACK received
bull ACKs increase cwnd by 1 MSS per RTT additive increase
bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease
AIMD
AIMD Additive IncreaseMultiplicative Decrease
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
TCP Congestion Control FSM overview
slow start
congestionavoidance
fastrecovery
cwnd gt ssthresh
losstimeout
losstimeout
new ACK loss3dupACK
loss3dupACK
losstimeout
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Popular ldquoflavorsrdquo of TCP
ssthresh
ssthresh
TCP Tahoe
TCP Reno
Transmission round
cwnd w
ind
ow
siz
e
(in
segm
en
ts)
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Summary TCP Congestion Control
bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially
bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly
bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh
bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
TCP throughput
bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start
bull Let W be window size when loss occursndash when window is W throughput is
WRTTndash just after loss window drops to W2
throughput to W2RTT ndash average throughout 75 WRTT
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull throughput in terms of loss rate
bull L = 210-10 Wowbull New versions of TCP for high-speed
LRTT
MSS221
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP connection 2
TCP Fairness
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Why is TCP fair
Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
Con
nect
i on
2 th
r ou g
h pu t
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Fairness (more)
Fairness and UDPbull Multimedia apps
often do not use TCPndash do not want rate
throttled by congestion control
bull Instead use UDPndash pump audiovideo at
constant rate tolerate packet loss
Fairness and Parallel TCP Connections
bull Nothing prevents app from opening parallel connections between 2 hosts
bull Web browsers do this bull Example link of rate R
supporting 9 connections ndash new app asks for 1 TCP
gets rate R10ndash new app asks for 11 TCPs
gets R2
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo
Chapter 3 Summarybull Principles behind
transport layer servicesndash multiplexing
demultiplexingndash reliable data
transferndash flow controlndash congestion control
bull Instantiation and implementation in the Internetndash UDPndash TCP
Nextbull leaving the
network ldquoedgerdquo (application transport layers)
bull into the network ldquocorerdquo