Courtesy of Nitin Vaidya, UIUC, or Kevin Lai, UC Berkeley, or Jim Kurose,
UMass
TCP: congestion control and error control
Revisit IPv6.ppt + web passwd + posting period
Problem
• At what rate do you send data?
  – What is max useful sending rate for different apps?
• two components
  – flow control
    • make sure that the receiver can receive
    • sliding-window based flow control:
      – receiver reports window size to sender
      – higher window → higher throughput
      – throughput = wnd/RTT
  – congestion control
    • make sure that the network can deliver
Goals
• Robust
  – latency: 50 us (LAN), 133 ms (min, anywhere on Earth, wired), 1 s (satellite), 260 s (ave Mars)
    • 10^4 to 10^6 difference
  – bandwidth: 9.6 Kb/s (then modem, now cellular), 10 Tb/s
    • 10^9 difference
  – 0-100% packet loss
  – path may change in middle of session (why?)
  – network may/may not support explicit congestion signaling
• Distributed control (survivability)
Non-decreasing Efficiency under Load
• Efficiency = useful_work/time
• critical property of system design
  – network technology, protocol or application
• otherwise, the system collapses exactly when demand for its operation is highest
• trade lower overall efficiency for this?
[Figure: efficiency vs. load, with the knee and the cliff marked; curve regions labeled "ok?" and "good"]
Congestion Collapse and Efficiency
• knee – point after which
  – throughput increases slowly
  – delay increases quickly
• cliff – point after which
  – throughput decreases quickly to zero (congestion collapse)
  – delay goes to infinity
• Congestion avoidance
  – stay at knee
• Congestion control
  – stay left of (but usually close to) cliff
[Figure: throughput vs. load and delay vs. load, with knee and cliff marked; regions labeled under utilization, over utilization, saturation, and congestion collapse]
Transport Layer: Congestion Collapse Solutions
• Reduce loss by increasing buffer size. Why not?
• if congestion, then send slower; else if sending at lower than fair rate, then send faster
  – congestion control and avoidance (finally)
  – how to detect network congestion?
  – how to communicate allocation to sources?
  – how to determine efficient allocation?
  – how to determine fair allocation?
Metrics for congestion control
• Efficiency
  – ratio of aggregate throughput to capacity
• Fairness
  – degree to which everyone is getting equal share
• Convergence time (responsiveness)
  – How long to get to fairness, efficiency
• Size of oscillation (smoothness)
  – dynamic system → oscillations around optimal point
Detecting Congestion
• Explicit network signal
  – Send packet back to source (e.g. ICMP Source Quench)
    • control traffic → congestion collapse
  – Set bit in header (e.g. DECbit [CJ89], ECN)
    • can be subverted by selfish receiver [SEW01]
  – Unless on every router, still need end-to-end signal
• Implicit network signal
  – Loss (e.g. TCP Tahoe, Reno, New Reno, SACK)
    • + relatively robust, - no avoidance
  – Delay (e.g. TCP Vegas)
    • + avoidance, - difficult to make robust
  – Easily deployable
  – Robust enough? Wireless?
Communicating Allocation to Sources
• Explicit
  – Send packet back to source or set in packet header
    • control traffic → congestion collapse
    • trust receiver
  – Need to keep per flow state (anti-Internet architecture)
    • what happens if router fails, route changes, mobility
  – Unless on every router, still need end-to-end signal
  – Efficient, fair, responsive, smooth
• Implicit: Chiu and Jain 1988
  – Can converge to efficiency and fairness without explicit signal of fair rate
  – Easily deployable
  – Good enough?
Efficient Allocation
• Too slow
  – fail to take advantage of available bandwidth → underload
• Too fast
  – overshoot knee → overload, high delay, loss
• Everyone’s doing it
  – may all under/over shoot → large oscillations
• Optimal: $\sum_i x_i = X_{goal}$
• Efficiency = 1 - distance from efficiency line
[Figure: 2 user example; axes User 1: x1 and User 2: x2; the efficiency line separates the underload and overload regions]
Fair Allocation
• Maxmin fairness
  – flows which share the same bottleneck get the same amount of bandwidth
• Assumes no knowledge of priorities
• Fairness = 1 - distance from fairness line
$$F(x) = \frac{\left(\sum_i x_i\right)^2}{n \sum_i x_i^2}$$
[Figure: 2 user example; axes User 1: x1 and User 2: x2; the fairness line separates the "1 getting too much" and "2 getting too much" regions]
Control System Model [CJ89]
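[Figure: User 1, User 2, ..., User n send at rates x1, x2, ..., xn; the network tests $\sum_i x_i > X_{goal}$ and returns the feedback signal y to the users]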
Possible Choices
$$x_i(t+1) = \begin{cases} a_I + b_I\, x_i(t) & \text{increase} \\ a_D + b_D\, x_i(t) & \text{decrease} \end{cases}$$
• multiplicative increase, additive decrease
  – aI=0, bI>1, aD<0, bD=1
• additive increase, additive decrease
  – aI>0, bI=1, aD<0, bD=1
• multiplicative increase, multiplicative decrease
  – aI=0, bI>1, aD=0, 0<bD<1
• additive increase, multiplicative decrease
  – aI>0, bI=1, aD=0, 0<bD<1
• Which one?
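To get a feel for "which one?", the linear control x_i(t+1) = a + b·x_i(t) can be simulated for two users. The sketch below is only illustrative: the capacity, starting rates, and parameter values are assumptions chosen for the example, and every user is assumed to see the same overload signal in each round.

```python
def step(rates, capacity, a_i, b_i, a_d, b_d):
    """Apply one round of the linear control x(t+1) = a + b*x(t) to every user."""
    overloaded = sum(rates) > capacity      # shared feedback: total load above the goal?
    a, b = (a_d, b_d) if overloaded else (a_i, b_i)
    return [max(0.0, a + b * x) for x in rates]


def fairness(rates):
    """Fairness index (sum x)^2 / (n * sum x^2); 1.0 means equal shares."""
    total = sum(rates)
    return total * total / (len(rates) * sum(x * x for x in rates))


def run(policy, rounds=200, start=(1.0, 9.0), capacity=10.0):
    """Iterate the control from an unfair starting point (assumed values)."""
    rates = list(start)
    for _ in range(rounds):
        rates = step(rates, capacity, *policy)
    return rates


# Policies as (aI, bI, aD, bD); the numbers are illustrative assumptions.
AIMD = (1.0, 1.0, 0.0, 0.5)    # additive increase, multiplicative decrease
AIAD = (1.0, 1.0, -1.0, 1.0)   # additive increase, additive decrease

if __name__ == "__main__":
    for name, policy in (("AIMD", AIMD), ("AIAD", AIAD)):
        print(name, round(fairness(run(policy)), 3))
```

With these numbers the AIMD run ends with nearly equal rates, while the AIAD run keeps the initial gap of 8 between the two users, since adding the same constant to both never changes their difference.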
Multiplicative Increase, Additive Decrease
[Figure: axes User 1: x1 and User 2: x2, with fairness line and efficiency line; trajectory (x1h, x2h) → (x1h+aD, x2h+aD) → (bI(x1h+aD), bI(x2h+aD))]
• Does not converge to fairness
  – Not stable at all
• Does not converge to efficiency
  – stable iff $x_{1h} = x_{2h} = \frac{a_D b_I}{1 - b_I}$
Additive Increase, Additive Decrease
[Figure: axes User 1: x1 and User 2: x2, with fairness line and efficiency line; trajectory (x1h, x2h) → (x1h+aD, x2h+aD) → (x1h+aD+aI, x2h+aD+aI)]
• Does not converge to fairness
• Does not converge to efficiency
  – stable iff $a_D + a_I = 0$
Multiplicative Increase, Multiplicative Decrease
[Figure: axes User 1: x1 and User 2: x2, with fairness line and efficiency line; trajectory (x1h, x2h) → (bDx1h, bDx2h) → (bIbDx1h, bIbDx2h)]
• Does not converge to fairness
• Converges to efficiency iff
  – $b_I \ge 1$
  – $0 \le b_D < 1$
Additive (and Multiplicative) Increase, Multiplicative Decrease
[Figure: axes User 1: x1 and User 2: x2, with fairness line and efficiency line; trajectory (x1h, x2h) → (bDx1h, bDx2h) → (bIbDx1h+aI, bIbDx2h+aI)]
• Converges to fairness
• Converges to efficiency iff
  – bI>=1
• Increments smaller as fairness increases
  – effect on metrics?
• Additive Increase is better
  – why?
Significance
• Characteristics
  – converges to efficiency, fairness
  – easily deployable
  – fully distributed
  – no need to know full state of system (e.g. number of users, bandwidth of links)
• Theory that enabled the Internet to grow beyond 1989
  – key milestone in Internet development
  – fully distributed network architecture requires fully distributed congestion control
  – basis for TCP
Modeling
• Critical to understanding complex systems
  – [CJ89] model relevant for 13 years, 10^6 increase of bandwidth, 1000x increase in number of users
• Criteria for good models
  – realistic
  – simple
    • easy to work with
    • easy for others to understand
  – realistic, complex model → useless
  – unrealistic, simple model → can teach something about best case, worst case, etc.
TCP Congestion Control
• [CJ89] provides theoretical basis
  – still many issues to be resolved
• How to start?
• Implicit congestion signal
  – loss
  – need to send packets to detect congestion
  – must reconcile with AIMD
• How to maintain equilibrium?
  – use ACK: send a new packet only after you receive ACK. Why?
  – maintain number of packets in network “constant”
TCP Congestion Control
• Maintains three variables:
  – cwnd – congestion window
  – flow_win – flow window: receiver advertised window
  – ssthresh – threshold size (used to update cwnd)
• For sending use: win = min(flow_win, cwnd)
TCP: Slow Start
• Goal: discover congestion quickly
• How?
  – quickly increase cwnd until the network is congested → get a rough estimate of the optimal cwnd
  – Whenever starting traffic on a new connection, or whenever increasing traffic after congestion was experienced:
    • Set cwnd = 1
    • Each time a segment is acknowledged, increment cwnd by one (cwnd++).
• Slow Start is not actually slow
  – cwnd increases exponentially
Slow Start Example
• The congestion window size grows very rapidly
• TCP slows down the increase of cwnd when cwnd >= ssthresh
[Figure: sender/receiver timeline]
cwnd = 1: send segment 1; receive ACK for segment 1
cwnd = 2: send segments 2 and 3; receive ACK for segments 2 + 3
cwnd = 4: send segments 4, 5, 6 and 7; receive ACK for segments 4+5+6+7
cwnd = 8
Congestion Avoidance
• Slow down “Slow Start”
• If cwnd >= ssthresh then each time a segment is acknowledged increment cwnd by 1/cwnd (cwnd += 1/cwnd).
• So cwnd is increased by one only after a full window of segments has been acknowledged.
• (more about ssthresh later)
Slow Start/Congestion Avoidance Example
• Assume that ssthresh = 8
[Figure: cwnd (in segments) vs. roundtrip times, with ssthresh marked; cwnd = 1, 2, 4, 8, then 9, 10]
Putting Everything Together: TCP Pseudocode
Initially:
    cwnd = 1;
    ssthresh = infinite;

New ack received:
    if (cwnd < ssthresh)
        /* Slow Start */
        cwnd = cwnd + 1;
    else
        /* Congestion Avoidance */
        cwnd = cwnd + 1/cwnd;

Timeout:
    /* Multiplicative decrease */
    ssthresh = win/2;
    cwnd = 1;

while (next < unack + win)
    transmit next packet;

where win = min(cwnd, flow_win);

[Figure: seq # axis showing the window win between unack and next]
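The window rules in the pseudocode can be traced round by round. This is a rough sketch, not a TCP implementation: it assumes one ACK per segment, that a full window is acknowledged every RTT, and it uses a hypothetical `loss_at` parameter to inject a timeout at a chosen RTT.

```python
def on_ack(cwnd, ssthresh):
    """Window growth for one new ACK, following the slide's pseudocode."""
    if cwnd < ssthresh:
        return cwnd + 1          # slow start
    return cwnd + 1 / cwnd       # congestion avoidance


def on_timeout(cwnd, flow_win):
    """Multiplicative decrease; returns (new_cwnd, new_ssthresh)."""
    win = min(cwnd, flow_win)
    return 1, win / 2


def simulate(rtts, ssthresh=8, flow_win=64, loss_at=None):
    """Return cwnd at the start of each RTT.

    loss_at is a hypothetical RTT index at which a timeout is injected.
    """
    cwnd, trace = 1.0, []
    for rtt in range(rtts):
        trace.append(cwnd)
        if rtt == loss_at:
            cwnd, ssthresh = on_timeout(cwnd, flow_win)
            continue
        # One ACK arrives for every segment sent in this round.
        for _ in range(int(min(cwnd, flow_win))):
            cwnd = on_ack(cwnd, ssthresh)
    return trace


if __name__ == "__main__":
    print(simulate(6))
    print(simulate(6, loss_at=3))
```

Note that per-ACK increments of 1/cwnd add up to slightly less than one segment per RTT, so the trace after slow start is close to, but just under, the whole-number steps usually drawn on a cwnd plot.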
The big picture
[Figure: cwnd vs. time; slow start, then congestion avoidance, until a timeout]
Recall knee-point and cliff-point!
Fast Retransmit
• Don’t wait for window to drain
• Resend a segment after 3 duplicate ACKs
  – remember a duplicate ACK means that an out-of-sequence segment was received
• Notes:
  – duplicate ACKs due to packet reordering or loss
  – window may be too small to get duplicate ACKs
[Figure: sender/receiver timeline as cwnd grows from 1 to 2 to 4 (segments 1 to 7); 3 duplicate ACKs (ACK 4) trigger the retransmission]
Fast Recovery
• After a fast retransmit, halve the window: set ssthresh to cwnd/2 and cwnd to ssthresh
  – i.e., don’t reset cwnd to 1
• Fast Retransmit and Fast Recovery are implemented by TCP Reno, the most widely used version of TCP today
Fast Retransmit and Fast Recovery
• Retransmit after 3 duplicated acks
  – prevent expensive timeouts
• No need to slow start again
• At steady state, cwnd oscillates around the optimal window size.
[Figure: cwnd vs. time; slow start, then congestion avoidance]
Reflections on TCP
• assumes that all sources cooperate
• assumes that congestion occurs on time scales greater than 1 RTT
• only useful for reliable, in order delivery, non-real time applications
• vulnerable to non-congestion related loss (e.g. wireless)
• can be unfair to long RTT flows
Principles of Reliable data transfer
• important in app., transport, link layers
• characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Reliable data transfer: getting started
[Figure: send side and receive side of the reliable data transfer service]
rdt_send(): called from above (e.g., by app.). Passed data to deliver to receiver upper layer
udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
rdt_rcv(): called when packet arrives on rcv-side of channel
deliver_data(): called by rdt to deliver data to upper layer
Reliable data transfer: getting started
We’ll:
• incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
• consider only unidirectional data transfer
  – but control info will flow in both directions!
• use finite state machines (FSM) to specify sender, receiver
[Figure: FSM notation; the arrow from state 1 to state 2 is labeled with the event causing the state transition, above the actions taken on the state transition]
state: when in this “state”, the next state is uniquely determined by the next event
Rdt1.0: reliable transfer over a reliable channel
• underlying channel perfectly reliable
  – no bit errors
  – no loss of packets
• separate FSMs for sender, receiver:
  – sender sends data into underlying channel
  – receiver reads data from underlying channel
Rdt2.0: channel with bit errors (no loss)
• underlying channel may flip bits in packet
  – recall: UDP checksum to detect bit errors
• the question: how to recover from errors:
  – acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  – negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  – sender retransmits pkt on receipt of NAK
• new mechanisms in rdt2.0 (beyond rdt1.0):
  – error detection
  – receiver feedback: control msgs (ACK,NAK) rcvr->sender
rdt2.0: FSM specification
[Figure: sender FSM and receiver FSM]
rdt2.0: in action (no errors)
[Figure: sender FSM and receiver FSM]
rdt2.0: in action (error scenario)
[Figure: sender FSM and receiver FSM]
rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
• sender doesn’t know what happened at receiver!
• can’t just retransmit: possible duplicate
What to do?
• sender ACKs/NAKs receiver’s ACK/NAK? What if sender ACK/NAK lost?
• retransmit, but this might cause retransmission of correctly received pkt!
Handling duplicates:
• sender adds sequence number to each pkt
• sender retransmits current pkt if ACK/NAK garbled
• receiver discards (doesn’t deliver up) duplicate pkt
stop and wait: Sender sends one packet, then waits for receiver response
rdt2.1: sender, handles garbled ACK/NAKs
[Figure: rdt2.1 sender FSM]
rdt2.1: receiver, handles garbled ACK/NAKs
[Figure: receiver FSM with states "Wait for 0" and "Wait for 1"]
Wait for 0:
• rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt): Extract(rcvpkt,data), deliver_data(data), udt_send(ACK[0]); go to Wait for 1
• rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NACK[0])
• rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt): udt_send(ACK[1])
Wait for 1:
• rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt): Extract(rcvpkt,data), deliver_data(data), udt_send(ACK[1]); go to Wait for 0
• rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NACK[1])
• rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt): udt_send(ACK[0])
rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq. #’s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  – state must “remember” whether “current” pkt has 0 or 1 seq. #
Receiver:
• must check if received packet is duplicate
  – state indicates whether 0 or 1 is expected pkt seq #
• note: receiver can not know if its last ACK/NAK received OK at sender
rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
  – receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt
[Figure: sender FSM]
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
  – checksum, seq. #, ACKs, retransmissions will be of help, but not enough
Q: how to deal with loss?
  – sender waits until certain data or ACK lost, then retransmits
  – yuck: drawbacks?
Approach: sender waits “reasonable” amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  – retransmission will be duplicate, but use of seq. #’s already handles this
  – receiver must specify seq # of pkt being ACKed
• requires countdown timer
rdt3.0 sender
[Figure: sender FSM; annotations: "(no need to resend)" and "stop timer"]
Receiver FSM
[Figure: receiver FSM with states "Wait for 0" and "Wait for 1"]
Wait for 0:
• rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt): Extract(rcvpkt,data), deliver_data(data), udt_send(ACK[0]); go to Wait for 1
• rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(ACK[1])
• rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt): udt_send(ACK[1])
Wait for 1:
• rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt): Extract(rcvpkt,data), deliver_data(data), udt_send(ACK[1]); go to Wait for 0
• rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(ACK[0])
• rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt): udt_send(ACK[0])
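A NAK-free receiver with the two states "Wait for 0" and "Wait for 1" can be sketched in a few lines. The class and method names below are hypothetical, and packet corruption is modeled with a simple `corrupt` flag instead of a real checksum.

```python
class AckOnlyReceiver:
    """Alternating-bit receiver: waits for seq 0, then seq 1, then seq 0, ..."""

    def __init__(self):
        self.expected = 0      # 0 means state "Wait for 0", 1 means "Wait for 1"
        self.delivered = []    # data handed to the upper layer

    def rdt_rcv(self, seq, data, corrupt=False):
        """Process one packet and return the sequence number carried by the ACK."""
        if not corrupt and seq == self.expected:
            self.delivered.append(data)          # deliver_data(data)
            ack = self.expected
            self.expected = 1 - self.expected    # switch to the other state
            return ack
        # Corrupt or duplicate packet: re-ACK the last packet received OK.
        return 1 - self.expected


if __name__ == "__main__":
    rcv = AckOnlyReceiver()
    print(rcv.rdt_rcv(0, "a"), rcv.rdt_rcv(0, "a"), rcv.rdt_rcv(1, "b"))
    print(rcv.delivered)
```

The duplicate of packet 0 is acknowledged again but not delivered twice, which is the behavior the sequence number exists to provide.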
rdt3.0 in action
Performance of rdt3.0
• rdt3.0 works, but performance stinks
• example: 1 Gbps link, 15 ms e-e prop. delay, 1KB packet:
$$T_{transmit} = \frac{8\ \text{kb/pkt}}{10^9\ \text{b/sec}} = 8\ \text{microsec}$$
$$U = \text{fraction of time sender busy sending} = \frac{8\ \text{microsec}}{30.016\ \text{msec}} \approx 0.00027$$
  – 1KB pkt every 30 ms -> 33kB/s thruput over 1 Gbps link
  – network protocol limits use of physical resources!
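The arithmetic can be checked with a short calculation. This sketch assumes the usual stop-and-wait model U = (L/R) / (RTT + L/R), with RTT taken as twice the 15 ms propagation delay and the ACK transmission time ignored; the function names are illustrative.

```python
def stop_and_wait_utilization(packet_bits, link_bps, rtt_s):
    """Fraction of time the sender is busy when it sends one packet per RTT."""
    t_transmit = packet_bits / link_bps
    return t_transmit / (rtt_s + t_transmit)


def stop_and_wait_throughput(packet_bytes, link_bps, rtt_s):
    """Bytes per second delivered when one packet is sent per RTT."""
    t_transmit = 8 * packet_bytes / link_bps
    return packet_bytes / (rtt_s + t_transmit)


if __name__ == "__main__":
    u = stop_and_wait_utilization(packet_bits=8000, link_bps=1e9, rtt_s=0.030)
    thr = stop_and_wait_throughput(packet_bytes=1000, link_bps=1e9, rtt_s=0.030)
    print(f"utilization = {u:.5f}, throughput = {thr / 1000:.1f} kB/s")
```

Under these assumptions the sender is busy well under a tenth of a percent of the time, which is the motivation for pipelining.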
Pipelined protocols
Pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged pkts
  – range of sequence numbers must be increased
  – buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
Go-Back-N (GBN)
Sender:
• k-bit seq # in pkt header
• “window” of up to N consecutive unack’ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n - “cumulative ACK”
  – may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
GBN: sender extended FSM
GBN: receiver extended FSM
receiver simple:
• ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  – may generate duplicate ACKs
  – need only remember expectedseqnum
• out-of-order pkt:
  – discard (don’t buffer) -> no receiver buffering!
  – ACK pkt with highest in-order seq #
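The receiver rules above fit in one small class. This is a simplified sketch: sequence numbers are plain integers that never wrap (a real k-bit field would), corruption is a boolean flag, and the class name is hypothetical.

```python
class GbnReceiver:
    """Go-Back-N receiver: remembers only expectedseqnum, never buffers."""

    def __init__(self):
        self.expectedseqnum = 0
        self.delivered = []    # data handed to the upper layer, in order

    def rdt_rcv(self, seq, data, corrupt=False):
        """Return the cumulative ACK number, or None if nothing is in order yet."""
        if not corrupt and seq == self.expectedseqnum:
            self.delivered.append(data)
            self.expectedseqnum += 1
        # Anything else (corrupt or out-of-order) is discarded; in every case
        # the ACK names the highest in-order packet received so far.
        last_in_order = self.expectedseqnum - 1
        return last_in_order if last_in_order >= 0 else None


if __name__ == "__main__":
    rcv = GbnReceiver()
    # Packet 1 is delayed: packet 2 arrives first and produces a duplicate ACK 0.
    print([rcv.rdt_rcv(0, "a"), rcv.rdt_rcv(2, "c"), rcv.rdt_rcv(1, "b")])
```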
GBN in action
When is GBN not that bad?
• Error rate or distribution?
  – Long-term or short-term fading
  – Window size
• Loss rate or distribution?
• RTT?
• Link bandwidth?
Complexity?
Selective Repeat
• receiver individually acknowledges all correctly received pkts
  – buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  – sender timer for each unACKed pkt
• sender window
  – N consecutive seq #’s
  – again limits seq #s of sent, unACKed pkts
Selective repeat: sender, receiver windows
Selective repeat
sender
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase,sendbase+N-1]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
pkt n in [rcvbase, rcvbase+N-1]
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N,rcvbase-1]
• ACK(n)
otherwise:
• ignore
Selective repeat in action
Selective repeat: dilemma
Example:
• seq #’s: 0, 1, 2, 3
• window size=3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
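One way to explore the question is to compare the sender's old window with the receiver's new window in the worst case. The sketch below assumes that worst case is: the receiver got a full window and advanced, but every ACK was lost, so the sender retransmits from its old window. The function name is hypothetical.

```python
def windows_collide(seq_space, window):
    """True if a retransmitted old packet can be mistaken for a new one."""
    # Sender still holds the old window; receiver has moved one full window on.
    old_sender_window = {n % seq_space for n in range(0, window)}
    new_receiver_window = {n % seq_space for n in range(window, 2 * window)}
    return bool(old_sender_window & new_receiver_window)


if __name__ == "__main__":
    print(windows_collide(seq_space=4, window=3))   # the slide's example
    print(windows_collide(seq_space=4, window=2))
```

In this model the two windows overlap exactly when twice the window size exceeds the number of sequence numbers, which suggests keeping the window at no more than half the sequence number space.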
TCP on Mobile Ad Hoc Networks
Overview of TCP/IP
Internet Protocol (IP)
• Packets may be delivered out-of-order
• Packets may be lost
• Packets may be duplicated
Transmission Control Protocol (TCP)
• Reliable ordered delivery
• Implements congestion avoidance and control
• Reliability achieved by means of retransmissions if necessary
• End-to-end semantics
  – Acknowledgements sent to TCP sender to confirm delivery of data received by TCP receiver
  – Ack for data sent only after data has reached receiver
TCP Basics
• Cumulative acknowledgements
• An acknowledgement ack’s all contiguously received data
• TCP assigns byte sequence numbers
• For simplicity, we will assign packet sequence numbers
• Also, we use slightly different syntax for acks than normal TCP syntax
  – In our notation, ack i acknowledges receipt of packets through packet i
Cumulative Acknowledgements
• A new cumulative acknowledgement is generated only on receipt of a new in-sequence packet
[Figure: data packets (37 to 40, then 38 to 41) travel from src to dest while cumulative acks return; legend: "i" is a data packet, "ack i" is an ack]
Duplicate Acknowledgements
• A dupack is generated whenever an out-of-order segment arrives at the receiver
[Figure: packets 37 to 40 in flight with acks 34 and 36 returning; on receipt of 38, the receiver sends a dupack for 36. The example assumes delayed acks.]
Window Based Flow Control
• Sliding window protocol
• Window size is the minimum of
  – receiver’s advertised window - determined by available buffer space at the receiver
  – congestion window - determined by the sender, based on feedback from the network
[Figure: packets 1 to 13 with the sender’s window marked; packets before the window have acks received, packets after it are not transmitted]
Window Based Flow Control
[Figure: the sender’s window over packets 1 to 13, before and after receiving Ack 5; the window slides forward]
Window Based Flow Control
• Congestion window size (W) bounds the amount of data that can be sent per round-trip time
• Throughput <= W / RTT
Ideal Window Size
• Ideal size = delay * bandwidth
  – delay-bandwidth product
• What if window size < delay*bw ?
  – Inefficiency (wasted bandwidth)
• What if > delay*bw ?
  – Queuing at intermediate routers
    • increased RTT due to queuing delays
  – Potentially, packet loss
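A quick calculation makes the delay-bandwidth product concrete. The link speed, RTT, and packet size used here are illustrative assumptions, and "delay" is taken to mean the round-trip time.

```python
def ideal_window_packets(bandwidth_bps, rtt_s, packet_bytes):
    """Delay-bandwidth product expressed in packets."""
    return bandwidth_bps * rtt_s / (8 * packet_bytes)


def max_throughput_bps(window_packets, rtt_s, packet_bytes):
    """Upper bound W / RTT on throughput, in bits per second."""
    return window_packets * packet_bytes * 8 / rtt_s


if __name__ == "__main__":
    # Assumed example: 2 Mbps link, 100 ms RTT, 1000-byte packets.
    w = ideal_window_packets(2e6, 0.1, 1000)
    print(f"ideal window = {w:.0f} packets")
    # A window of only 10 packets leaves most of the link idle.
    print(f"throughput with W=10: {max_throughput_bps(10, 0.1, 1000) / 1e6:.1f} Mbps")
```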
How does TCP detect a packet loss?
• Retransmission timeout (RTO)
• Duplicate acknowledgements
Detecting Packet Loss Using Retransmission Timeout (RTO)
• At any time, TCP sender sets retransmission timer for only one packet
• If acknowledgement for the timed packet is not received before timer goes off, the packet is assumed to be lost
• RTO dynamically calculated
Retransmission Timeout (RTO) calculation
• RTO = mean + 4 mean deviation
  – Standard deviation: $\sigma^2$ = average of (sample – mean)$^2$
  – Mean deviation = average of |sample – mean|
  – Mean deviation easier to calculate than standard deviation
  – Mean deviation is more conservative
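The rule RTO = mean + 4 × mean deviation can be sketched with running averages. The smoothing gains (1/8 for the mean, 1/4 for the deviation) and the initial deviation of half the first sample are assumptions taken from common TCP practice, not from these slides; the class name is hypothetical.

```python
class RtoEstimator:
    """Running estimate of RTO = mean + 4 * mean deviation of RTT samples."""

    def __init__(self, first_sample, gain_mean=1 / 8, gain_dev=1 / 4):
        self.mean = first_sample
        self.dev = first_sample / 2   # assumed starting value for the deviation
        self.gain_mean = gain_mean
        self.gain_dev = gain_dev

    def rto(self):
        return self.mean + 4 * self.dev

    def update(self, sample):
        """Fold one new RTT sample into the averages and return the new RTO."""
        err = sample - self.mean
        self.mean += self.gain_mean * err
        self.dev += self.gain_dev * (abs(err) - self.dev)
        return self.rto()


if __name__ == "__main__":
    est = RtoEstimator(first_sample=0.100)
    for rtt in (0.100, 0.120, 0.300, 0.110):
        print(f"sample={rtt:.3f}s  rto={est.update(rtt):.3f}s")
```

A single large sample raises the deviation term quickly, so the timeout grows well before the mean itself has moved much.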
Exponential Backoff
• Double RTO on each timeout
[Figure: timeline; a packet is transmitted with timeout interval T1; the time-out occurs before the ack is received, so the packet is retransmitted and the timeout interval is doubled: T2 = 2 * T1]
Windows: initially 3s, max. 240s
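The doubling rule with a cap is easy to tabulate. The 3 s start and 240 s cap below are the values quoted on the slide; the function name and the number of timeouts shown are illustrative.

```python
def backoff_schedule(initial_rto=3.0, max_rto=240.0, timeouts=8):
    """Timeout interval used for each successive retransmission attempt."""
    rto, schedule = initial_rto, []
    for _ in range(timeouts):
        schedule.append(rto)
        rto = min(2 * rto, max_rto)   # double on each timeout, up to the cap
    return schedule


if __name__ == "__main__":
    print(backoff_schedule())
```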
Fast Retransmission
• Timeouts can take too long– how to initiate retransmission sooner?
• Fast retransmit
Detecting Packet Loss Using Dupacks: Fast Retransmit Mechanism
• Dupacks may be generated due to
  – packet loss, or
  – out-of-order packet delivery
• TCP sender assumes that a packet loss has occurred if it receives three dupacks consecutively
[Figure: packets 7 to 12 in flight, with packet 8 lost]
Receipt of packets 9, 10 and 11 will each generate a dupack from the receiver. The sender, on getting these dupacks, will retransmit packet 8.
Congestion Avoidance and Control
• Slow Start: cwnd grows exponentially with time during slow start
• When cwnd reaches slow-start threshold, congestion avoidance is performed
• Congestion avoidance: cwnd increases linearly with time during congestion avoidance
  – Rate of increase could be lower if sender does not always have data to send
[Figure: congestion window size (segments) vs. time (round trips); slow start up to the slow start threshold, then congestion avoidance. Example assumes that acks are not delayed.]
Congestion Control
• On detecting a packet loss, TCP sender assumes that network congestion has occurred
• On detecting packet loss, TCP sender drastically reduces the congestion window
• Reducing congestion window reduces amount of data that can be sent per RTT
Congestion Control -- Timeout
• On a timeout, the congestion window is reduced to the initial value of 1 MSS
• The slow start threshold is set to half the window size before packet loss
  – more precisely, ssthresh = max(min(cwnd, receiver’s advertised window)/2, 2 MSS)
• Slow start is initiated
[Figure: congestion window (segments) vs. time (round trips); ssthresh = 8 at first; a timeout occurs at cwnd = 20; after timeout, ssthresh = 10]
Congestion Control - Fast retransmit
• Fast retransmit occurs when multiple (>= 3) dupacks come back
• Fast recovery follows fast retransmit
• Different from timeout: slow start follows timeout
  – timeout occurs when no more packets are getting across
  – fast retransmit occurs when a packet is lost, but later packets get through
  – ack clock is still there when fast retransmit occurs
  – no need to slow start
Fast Recovery
• ssthresh = min(cwnd, receiver’s adv. window)/2
  – (at least 2 MSS)
• retransmit the missing segment (fast retransmit)
• cwnd = ssthresh + number of dupacks
  – Temporary inflation
• when a new ack comes: cwnd = ssthresh
  – enter congestion avoidance
Congestion window cut in half
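The two loss responses can be put side by side in code. This is a sketch of the window arithmetic only, with windows counted in segments (so 1 MSS = 1); the function names are hypothetical.

```python
MSS = 1  # windows are counted in segments in this sketch


def loss_ssthresh(cwnd, adv_window):
    """ssthresh after a loss: half the effective window, at least 2 MSS."""
    return max(min(cwnd, adv_window) / 2, 2 * MSS)


def enter_fast_recovery(cwnd, adv_window, dupacks=3):
    """Return (inflated cwnd, ssthresh) right after a fast retransmit."""
    ssthresh = loss_ssthresh(cwnd, adv_window)
    return ssthresh + dupacks, ssthresh


def exit_fast_recovery(ssthresh):
    """A new ack arrives: deflate cwnd and continue in congestion avoidance."""
    return ssthresh


def on_timeout(cwnd, adv_window):
    """Return (cwnd, ssthresh) after a timeout: back to 1 MSS and slow start."""
    return 1 * MSS, loss_ssthresh(cwnd, adv_window)


if __name__ == "__main__":
    print(enter_fast_recovery(cwnd=20, adv_window=64))
    print(on_timeout(cwnd=20, adv_window=64))
```

Starting from the same window of 20 segments, fast recovery resumes at 10 segments while a timeout restarts from 1, which is why avoiding timeouts matters for throughput.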
[Figure: window size (segments) vs. time (round trips), with the receiver’s advertised window marked and the point "after fast recovery" indicated]
After fast retransmit and fast recovery, window size is reduced in half.
TCP Performance in Mobile Ad Hoc Networks (MANETs)
Performance of TCP
Several factors affect TCP performance in MANET:
• Wireless transmission errors
• Multi-hop routes on shared wireless medium
  – For instance, adjacent hops typically cannot transmit simultaneously
• Route failures due to mobility
Random Errors
• If number of bit errors is small, they may be corrected by an error correcting code
• Excessive bit errors result in a packet being discarded, possibly before it reaches the transport layer
Random Errors May Cause Fast Retransmit
[Figure sequence: packets 37 to 44 flow from sender to receiver over successive frames; one packet is lost to a random error, each later arrival returns a duplicate ack for 36, and 3 duplicate acks trigger fast retransmit at sender. Example assumes delayed ack (every other packet ack’d); duplicate acks are not delayed.]
• Fast retransmit results in
  – retransmission of lost packet
  – reduction in congestion window
• Reducing congestion window in response to errors is unnecessary
• Reduction in congestion window reduces the throughput
Sometimes Congestion Response May be Appropriate in Response to Errors
• On a CDMA channel, errors occur due to interference from other users, and due to noise [Karn99pilc]
  – Interference due to other users is an indication of congestion. If such interference causes transmission errors, it is appropriate to reduce congestion window
  – If noise causes errors, it is not appropriate to reduce window
• When a channel is in a bad state for a long duration, it might be better to let TCP back off, so that it does not unnecessarily attempt retransmissions while the channel remains in the bad state [Padmanabhan99pilc]
pilc: IETF Performance Implications of Link Characteristics
Impact of Random Errors [Vaidya99]
[Figure: throughput (bits/sec, axis from 0 to 1,600,000) vs. 1/error rate (in bytes; ticks at 16384, 32768, 65536, 1E+05). Exponential error model, 2 Mbps wireless full duplex link, no congestion losses.]
Burst Errors May Cause Timeouts
• If wireless link remains unavailable for extended duration, a window worth of data may be lost
– E.g., driving through a tunnel
• Timeout results in slow start
• Slow start reduces congestion window to 1 MSS, reducing throughput
• Reduction in window in response to errors unnecessary
Random Errors May Also Cause Timeout
• Multiple packet losses in a window can result in timeout when using TCP-Reno (and to a lesser extent when using SACK)
Impact of Transmission Errors
• TCP cannot distinguish between packet losses due to congestion and transmission errors
• Unnecessarily reduces congestion window
• Throughput suffers
Mobile Ad Hoc Networks
• May need to traverse multiple links to reach a destination
Mobile Ad Hoc Networks
• Mobility causes route changes
Throughput over Multi-Hop Wireless Paths
• Connections over multiple hops are at a disadvantage compared to shorter connections, because they have to contend for wireless access at each hop
Impact of Multi-Hop Wireless Paths
[Figure: TCP throughput (Kbps, axis from 0 to 1600) versus number of hops.]
TCP Throughput using 2 Mbps 802.11 MAC
Throughput Degradations with Increasing Number of Hops
• Packet transmission can occur on at most one hop among three consecutive hops
– Increasing the number of hops from 1 to 2, 3 results in increased delay, and decreased throughput
• Increasing number of hops beyond 3 allows simultaneous transmissions on more than one link. However, degradation continues due to contention between TCP Data and Acks traveling in opposite directions
• When number of hops is large enough, the throughput stabilizes due to effective pipelining
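A back-of-the-envelope bound follows from the first bullet alone. The sketch below assumes only that one hop out of any three consecutive hops can transmit at a time; it ignores MAC overhead and the data/ack contention mentioned above, so measured throughput will fall below it. The function name and the 2000 Kbps figure are illustrative assumptions.

```python
def pipeline_bound_kbps(link_rate_kbps, hops):
    """Upper bound on end-to-end throughput under the one-in-three-hops rule.

    With 1 to 3 hops, every packet occupies the channel once per hop and no
    two hops can overlap, so the bound is rate / hops. Beyond 3 hops,
    transmissions can be pipelined and the bound levels off at rate / 3.
    """
    if hops < 1:
        raise ValueError("a path needs at least one hop")
    return link_rate_kbps / min(hops, 3)


for hops in range(1, 7):
    print(hops, round(pipeline_bound_kbps(2000, hops), 1))
```

The bound flattens after three hops, which is consistent with the stabilization described in the last bullet, although it does not model the continued degradation between 3 hops and the plateau.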
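Performance Metric
• Expected throughput

$$\text{expected throughput} = \sum_{i \ge 1} f_i T_i$$

$f_i$ = fraction of time TCP source and receiver are i hops away
$T_i$ = TCP throughput across an i-hop network
[Figure: minimum path length (hops, 0 to 3) versus time (seconds, 0 to 40). In the example shown, expected throughput = (10/40) $T_1$ + (30/40) $T_2$.]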
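The metric is a weighted average, which a short Python sketch makes concrete. The fractions 10/40 and 30/40 echo the slide's 40-second example (the pairing with 1 and 2 hops is an assumption), and the per-hop throughput values are hypothetical placeholders, not measurements.

```python
def expected_throughput(fractions, throughputs):
    """Sum of f_i * T_i over hop counts i.

    fractions:   {hops: fraction of time the endpoints are that many hops apart}
    throughputs: {hops: TCP throughput across a path of that many hops}
    """
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("fractions of time must sum to 1")
    return sum(f * throughputs[i] for i, f in fractions.items())


fractions = {1: 10 / 40, 2: 30 / 40}   # assumed pairing of the slide's fractions
throughputs = {1: 1400.0, 2: 700.0}    # hypothetical Kbps values
print(expected_throughput(fractions, throughputs))
```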
Impact of Mobility: TCP Throughput
[Figure: actual throughput versus ideal throughput (Kbps), in four panels for speeds of 2 m/s, 10 m/s, 20 m/s, and 30 m/s.]
Throughput generally degrades with increasing speed …
[Figure: average throughput over 50 runs versus speed (m/s), with curves for ideal and actual throughput.]
But not always
Why Does Throughput Degrade?
[Timeline figure, case 1, events in order:]
• Mobility causes link breakage, resulting in route failure
• TCP data and acks en route discarded
• Route is repaired
• TCP sender times out. Starts sending packets again
• No throughput while the route is broken, and no throughput despite route repair until the sender times out
[Timeline figure, case 2, with timeout intervals t and 2t, events in order:]
• Mobility causes link breakage, resulting in route failure
• TCP data and acks en route discarded
• TCP sender times out. Backs off timer.
• Route is repaired
• TCP sender times out. Resumes sending
• No throughput while the route is broken, and no throughput despite route repair
• Larger route repair delays especially harmful
Why Does Throughput Improve? Low Speed Scenario
[Figure: three snapshots of nodes A, B, C, and D; 1.5 second route failure.]
Route from A to D is broken for ~1.5 seconds.
When TCP sender times out after 1 second, route is still broken.
TCP times out after another 2 seconds, and only then resumes.
Why Does Throughput Improve? Higher (double) Speed Scenario
[Figure: three snapshots of nodes A, B, C, and D; 0.75 second route failure.]
Route from A to D is broken for ~0.75 second.
When TCP sender times out after 1 second, route is repaired.
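The low-speed and higher-speed scenarios can be replayed with a small Python sketch. It assumes the retransmission timer starts at the moment the route breaks, the first timeout is 1 second, and each further timeout doubles the interval; the function name is hypothetical.

```python
def resume_time(route_down_for, initial_rto=1.0):
    """Seconds after the route break at which a retransmission first succeeds.

    Assumes the timer starts when the route breaks and that the timeout
    interval doubles after every unsuccessful retransmission.
    """
    elapsed, rto = 0.0, initial_rto
    while True:
        elapsed += rto
        if elapsed >= route_down_for:
            return elapsed  # route is back, this retransmission gets through
        rto *= 2            # route still broken: back off and wait again


for outage in (1.5, 0.75):
    resumed = resume_time(outage)
    print(f"outage {outage} s: resumes at {resumed} s, "
          f"idle after repair {resumed - outage} s")
```

Under these assumptions the 1.5 second outage costs 3 seconds in total, while the 0.75 second outage costs only 1 second, so the faster node loses less time waiting on the timer.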
Why Does Throughput Improve? General Principle
• The previous two slides show a plausible cause for improved throughput
• TCP timeout interval somewhat (not entirely) independent of speed
• Network state at higher speed, when timeout occurs, may be more favorable than at lower speed
• Network state
– Link/route status
– Route caches
– Congestion
How to Improve Throughput (Bring Closer to Ideal)
• Network feedback
• Inform TCP of route failure by explicit message
• Let TCP know when route is repaired
– Probing
– Explicit notification
• Reduces repeated TCP timeouts and backoff
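One way to picture the effect of such feedback is a sender that freezes its timer state while the route is known to be down. The Python sketch below is a hypothetical illustration, not the mechanism evaluated in the slides; the class, method names, and return strings are invented for the example.

```python
class NotifiedSender:
    """Toy sender that freezes on a route-failure notice (hypothetical)."""

    def __init__(self, rto):
        self.rto = rto
        self.frozen = False

    def on_route_failure(self):
        # Explicit message from the network: stop sending, keep RTO as is.
        self.frozen = True

    def on_timer_expiry(self):
        if self.frozen:
            return "ignored"      # no retransmission, no backoff
        self.rto *= 2             # ordinary timeout: exponential backoff
        return "retransmit"

    def on_route_repaired(self):
        # Explicit notification (or a successful probe): resume at once.
        self.frozen = False
        return "resume"


s = NotifiedSender(rto=1.0)
s.on_route_failure()
print(s.on_timer_expiry(), s.rto)   # timer fires while route is down
print(s.on_route_repaired(), s.rto)
```

Because the timer is not backed off during the outage, the sender can resume as soon as it learns of the repair and does not sit idle until a lengthened timeout expires.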
Performance with Explicit Notification
[Figure: throughput as a fraction of ideal (axis from 0 to 1) versus mean speed (2, 10, 20, 30 m/s), comparing base TCP with TCP with explicit notification.]
Issues: Network Feedback
• Network knows best (why packets are lost)
+ Network feedback beneficial
- Need to modify transport & network layer to receive/send feedback
• Need mechanisms for information exchange between layers
Impact of Caching
• Route caching has been suggested as a mechanism to reduce route discovery overhead [Broch98]
• Each node may cache one or more routes to a given destination
• When a route from S to D is detected as broken, node S may:
– Use another cached route from local cache, or
– Obtain a new route using cached route at another node
To Cache or Not to Cache
[Figure: actual throughput (as fraction of expected throughput) versus average speed (m/s).]
Why Performance Degrades With Caching
• When a route is broken, route discovery returns a cached route from local cache or from a nearby node
• After a time-out, TCP sender transmits a packet on the new route. However, the cached route has also broken after it was cached
• Another route discovery, and TCP time-out interval
• Process repeats until a good route is found
[Timeline figure: timeout due to route failure; timeout, cached route is broken; timeout, second cached route also broken]
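The cost of repeatedly trying stale cached routes can be estimated with a short Python sketch. It assumes each stale route costs one full timeout, the timeout interval doubles each time, and route discovery itself takes no time; the function name and the 1 second initial timeout are illustrative assumptions.

```python
def stall_time(stale_routes_tried, initial_rto=1.0):
    """Total time spent waiting on timeouts before a good route is used.

    One timeout for the original route failure, plus one more for every
    stale cached route the sender tries, with the interval doubling each time.
    """
    total, rto = 0.0, initial_rto
    for _ in range(1 + stale_routes_tried):
        total += rto
        rto *= 2
    return total


for stale in range(4):
    print(stale, "stale cached routes ->", stall_time(stale), "s stalled")
```

Because of the doubling, each additional stale route roughly doubles the stall, which is why a cache with low accuracy can hurt TCP even when it speeds up route discovery.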
Issues: To Cache or Not to Cache
• Caching can result in faster route “repair”
• Faster does not necessarily mean correct
• If incorrect repairs occur often enough, caching performs poorly
• Need mechanisms for determining when cached routes are stale
Caching and TCP performance
• Caching can reduce overhead of route discovery even if cache accuracy is not very high
• But if cache accuracy is not high enough, gains in routing overhead may be offset by loss of TCP performance due to multiple time-outs
TCP Performance
Two factors result in degraded throughput in presence of mobility:
• Loss of throughput that occurs while waiting for TCP sender to timeout (as seen earlier)
– This factor can be mitigated by using explicit notifications and better route caching mechanisms
• Poor choice of congestion window and RTO values after a new route has been found
– How to choose cwnd and RTO after a route change?
Issues: Window Size After Route Repair
• Same as before route break: may be too optimistic
• Same as startup: may be too conservative
• Better be conservative than overly optimistic
– Reset window to small value after route repair
– Let TCP figure out the suitable window size
– Impact low on paths with small delay-bw product
Issues: RTO After Route Repair
• Same as before route break
– If new route long, this RTO may be too small, leading to timeouts
• Same as TCP start-up
– May be too large
– May result in slow response to next packet loss
• Another plausible approach: new RTO = function of old RTO, old route length, and new route length
– Example: new RTO = old RTO * new route length / old route length
– Not evaluated yet
– Pitfall: RTT is not just a function of route length
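The example formula is easy to write down in Python. The sketch below adds a clamp to a minimum and maximum value, which is an assumption not present in the slide, and the bounds chosen are arbitrary placeholders. As the slide warns, the approach is unevaluated and RTT depends on more than route length.

```python
def scaled_rto(old_rto, old_hops, new_hops, min_rto=0.2, max_rto=60.0):
    """new RTO = old RTO * new route length / old route length, then clamped.

    min_rto and max_rto are hypothetical safety bounds added for illustration.
    """
    if old_hops < 1 or new_hops < 1:
        raise ValueError("route lengths must be at least one hop")
    rto = old_rto * new_hops / old_hops
    return min(max(rto, min_rto), max_rto)


print(scaled_rto(1.0, old_hops=2, new_hops=4))   # longer route: RTO grows
print(scaled_rto(1.0, old_hops=4, new_hops=2))   # shorter route: RTO shrinks
print(scaled_rto(1.0, old_hops=10, new_hops=1))  # clamped at min_rto
```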
Out-of-Order Packet Delivery
• Out-of-order (OOO) delivery may occur due to:
– Route changes
– Link layer retransmission schemes that deliver OOO
• Significant OOO delivery confuses TCP, triggering fast retransmit
• Potential solutions:
– Deterministically prefer one route over others, even if multiple routes are known
– Reduce OOO delivery by re-ordering received packets
  • can result in unnecessary delay in presence of packet loss
– Turn off fast retransmit
  • can result in poor performance in presence of congestion
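To see why reordering alone can trigger fast retransmit, the Python sketch below replays packet arrivals at a receiver and counts consecutive duplicate acks. It assumes one cumulative ack per arriving packet (no delayed acks) and that ack n means "everything through packet n has arrived"; the function name and packet numbers are illustrative.

```python
def max_consecutive_dupacks(arrivals, next_expected):
    """Largest run of duplicate acks produced by the given arrival order."""
    received = set()
    last_ack = next_expected - 1      # ack already sent for in-order data
    dupacks = max_dupacks = 0
    for seq in arrivals:
        received.add(seq)
        while next_expected in received:
            next_expected += 1        # advance past everything now in order
        ack = next_expected - 1
        dupacks = dupacks + 1 if ack == last_ack else 0
        max_dupacks = max(max_dupacks, dupacks)
        last_ack = ack
    return max_dupacks


# Nothing is lost in either case; packet 37 merely arrives late.
print(max_consecutive_dupacks([38, 39, 40, 37], next_expected=37))  # badly reordered
print(max_consecutive_dupacks([38, 37, 39, 40], next_expected=37))  # mildly reordered
```

When a packet is overtaken by three or more later packets, the sender sees three dupacks and retransmits it needlessly, also reducing its congestion window.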
Impact of Acknowledgements
• TCP Acks (and link layer acks) share the wireless bandwidth with TCP data packets
• Data and Acks travel in opposite directions
– In addition to bandwidth usage, acks require additional receive-send turnarounds, which also incur time penalty
– To reduce frequency of send-receive turnaround and contention between acks and data
Impact of Acks: Mitigation [Balakrishnan97]
• Piggybacking link layer acks with data
• Sending fewer TCP acks - ack every d-th packet (d may be chosen dynamically)
– but need to use rate control at sender to reduce burstiness (for large d)
• Ack filtering - Gateway may drop an older ack in the queue, if a new ack arrives
– reduces number of acks that need to be delivered to the sender
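Ack filtering at a gateway queue can be sketched as follows. This is a minimal illustration that assumes cumulative acks, so a newer ack makes older acks of the same flow redundant; the tuple layout `(kind, flow, number)` and the function name are hypothetical.

```python
from collections import deque


def enqueue_with_ack_filter(queue, packet):
    """Append packet; a new ack first evicts older queued acks of its flow.

    packet is a (kind, flow, number) tuple, where kind is "ack" or "data".
    Assumes cumulative acks: an ack numbered n covers every smaller ack.
    """
    kind, flow, number = packet
    if kind == "ack":
        kept = [p for p in queue
                if not (p[0] == "ack" and p[1] == flow and p[2] <= number)]
        queue.clear()
        queue.extend(kept)
    queue.append(packet)


queue = deque()
for pkt in [("data", "A", 1), ("ack", "A", 36), ("ack", "A", 38),
            ("ack", "B", 10), ("ack", "A", 40)]:
    enqueue_with_ack_filter(queue, pkt)
print(list(queue))
```

Only the newest ack of each flow remains queued, so fewer acks compete with data for the wireless channel; data packets and acks of other flows are left untouched.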