TCP: congestion control and error control

Courtesy of Nitin Vaidya, UIUC, or Kevin Lai, UC Berkeley, or Jim Kurose, UMass

Revisit IPv6.ppt + web passwd + posting period



Problem
• At what rate do you send data?
  – What is the max useful sending rate for different apps?
• two components
  – flow control
    • make sure that the receiver can receive
    • sliding-window based flow control:
      – receiver reports window size to sender
      – higher window → higher throughput
      – throughput = wnd/RTT
  – congestion control
    • make sure that the network can deliver
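The relation throughput = wnd/RTT can be checked with a few lines of Python. This is a minimal sketch: the function name and the 64 KiB / 100 ms figures are illustrative assumptions, not values from the slides.

```python
def window_limited_throughput(window_bytes, rtt_seconds):
    """Upper bound on throughput (bytes/s) when one window is sent per RTT."""
    if rtt_seconds <= 0:
        raise ValueError("RTT must be positive")
    return window_bytes / rtt_seconds

# Assumed example values: a 64 KiB window over a 100 ms round trip.
rate = window_limited_throughput(64 * 1024, 0.100)
print(f"{rate * 8 / 1e6:.2f} Mb/s")
```

Doubling the window doubles the bound, which is the "higher window, higher throughput" point on the slide.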


Goals
• Robust
  – latency: 50 us (LAN), 133 ms (min, anywhere on Earth, wired), 1 s (satellite), 260 s (ave Mars)
    • 10^4 to 10^6 difference
  – bandwidth: 9.6 Kb/s (then modem, now cellular), 10 Tb/s
    • 10^9 difference
  – 0-100% packet loss
  – path may change in middle of session (why?)
  – network may/may not support explicit congestion signaling
• Distributed control (survivability)


Non-decreasing Efficiency under Load
• Efficiency = useful_work/time
• critical property of system design
  – network technology, protocol or application
• otherwise, system collapses exactly when demand for its operation is highest
• trade lower overall efficiency for this?
[Figure: efficiency vs. load, with knee and cliff marked; annotations "ok?" and "good"]


Congestion Collapse and Efficiency
• knee: point after which
  – throughput increases slowly
  – delay increases quickly
• cliff: point after which
  – throughput decreases quickly to zero (congestion collapse)
  – delay goes to infinity
• Congestion avoidance
  – stay at knee
• Congestion control
  – stay left of (but usually close to) cliff
[Figure: throughput vs. load and delay vs. load, with knee and cliff marked; regions labelled under utilization, over utilization, saturation, and congestion collapse]


Transport Layer: Congestion Collapse Solutions
• Reduce loss by increasing buffer size. Why not?
• if congestion, then send slower; else if sending at lower than fair rate, then send faster
  – congestion control and avoidance (finally)
  – how to detect network congestion?
  – how to communicate allocation to sources?
  – how to determine efficient allocation?
  – how to determine fair allocation?


Metrics for congestion control
• Efficiency
  – ratio of aggregate throughput to capacity
• Fairness
  – degree to which everyone is getting equal share
• Convergence time (responsiveness)
  – How long to get to fairness, efficiency
• Size of oscillation (smoothness)
  – dynamic system → oscillations around optimal point


Detecting Congestion
• Explicit network signal
  – Send packet back to source (e.g. ICMP Source Quench)
    • control traffic → congestion collapse
  – Set bit in header (e.g. DECbit [CJ89], ECN)
    • can be subverted by selfish receiver [SEW01]
  – Unless on every router, still need end-to-end signal
• Implicit network signal
  – Loss (e.g. TCP Tahoe, Reno, New Reno, SACK)
    • + relatively robust, - no avoidance
  – Delay (e.g. TCP Vegas)
    • + avoidance, - difficult to make robust
  – Easily deployable
  – Robust enough? Wireless?


Communicating Allocation to Sources
• Explicit
  – Send packet back to source or set in packet header
    • control traffic → congestion collapse
    • trust receiver
  – Need to keep per flow state (anti-Internet architecture)
    • what happens if router fails, route changes, mobility
  – Unless on every router, still need end-to-end signal
  – Efficient, fair, responsive, smooth
• Implicit: Chiu and Jain 1988
  – Can converge to efficiency and fairness without explicit signal of fair rate
  – Easily deployable
  – Good enough?


Efficient Allocation
• Too slow
  – fail to take advantage of available bandwidth → underload
• Too fast
  – overshoot knee → overload, high delay, loss
• Everyone’s doing it
  – may all under/over shoot → large oscillations
• Optimal: $\sum_i x_i = X_{goal}$
• Efficiency = 1 - distance from efficiency line
[Figure: 2 user example; axes User 1: x1 and User 2: x2; the efficiency line separates underload from overload]


Fair Allocation
• Maxmin fairness
  – flows which share the same bottleneck get the same amount of bandwidth
• Assumes no knowledge of priorities
• Fairness = 1 - distance from fairness line
• Fairness index: $F(x) = \frac{\left(\sum_i x_i\right)^2}{n \sum_i x_i^2}$
[Figure: 2 user example; axes User 1: x1 and User 2: x2; fairness line, with regions "1 getting too much" and "2 getting too much"]
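A fairness index of the form (sum of x_i)^2 / (n * sum of x_i^2) is easy to evaluate numerically. The sketch below assumes that form (commonly called Jain's fairness index); the function name and sample allocations are illustrative.

```python
def fairness_index(rates):
    """(sum x_i)^2 / (n * sum x_i^2): 1.0 for equal shares, 1/n if one user takes all."""
    n = len(rates)
    sum_sq = sum(x * x for x in rates)
    if n == 0 or sum_sq == 0:
        raise ValueError("need at least one non-zero rate")
    total = sum(rates)
    return total * total / (n * sum_sq)

print(fairness_index([5, 5]))    # equal shares
print(fairness_index([10, 0]))   # user 1 gets everything
```

The index depends only on the ratio between allocations, so scaling every rate by the same factor leaves it unchanged.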


Control System Model [CJ89]
[Figure: users 1 to n send at rates x1, x2, ..., xn; the network tests whether $\sum_i x_i > X_{goal}$ and feeds the result y back to the users]


Possible Choices
• multiplicative increase, additive decrease
  – aI=0, bI>1, aD<0, bD=1
• additive increase, additive decrease
  – aI>0, bI=1, aD<0, bD=1
• multiplicative increase, multiplicative decrease
  – aI=0, bI>1, aD=0, 0<bD<1
• additive increase, multiplicative decrease
  – aI>0, bI=1, aD=0, 0<bD<1
• Which one?

$$x_i(t+1) = \begin{cases} a_I + b_I\, x_i(t) & \text{increase} \\ a_D + b_D\, x_i(t) & \text{decrease} \end{cases}$$
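To get a feel for "which one?", the linear control rule can be simulated for two users. This is a rough sketch under stated assumptions: synchronous binary feedback (decrease when the total exceeds X_goal, otherwise increase), rates clamped at zero, and hypothetical parameter values and starting point.

```python
def simulate(start, a_i, b_i, a_d, b_d, x_goal, steps):
    """Apply x(t+1) = a + b*x(t) to every user, choosing increase or decrease
    from one shared binary feedback signal (assumed: sum(x) > x_goal)."""
    x = list(start)
    for _ in range(steps):
        if sum(x) > x_goal:
            x = [max(0.0, a_d + b_d * xi) for xi in x]   # decrease
        else:
            x = [a_i + b_i * xi for xi in x]             # increase
    return x

start = [1.0, 9.0]   # deliberately unfair starting point (assumed)
aimd = simulate(start, a_i=1, b_i=1, a_d=0, b_d=0.5, x_goal=10, steps=200)
aiad = simulate(start, a_i=1, b_i=1, a_d=-1, b_d=1, x_goal=10, steps=200)
print("AIMD gap:", abs(aimd[0] - aimd[1]))
print("AIAD gap:", abs(aiad[0] - aiad[1]))
```

Additive steps leave the difference between the two users unchanged, while each multiplicative decrease halves it, so only the AIMD run drifts toward the fairness line.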


Multiplicative Increase, Additive Decrease
• Does not converge to fairness
  – Not stable at all
• Does not converge to efficiency
  – stable iff $x_{1h} = x_{2h} = \frac{b_I a_D}{1 - b_I}$
[Figure: axes User 1: x1 and User 2: x2, with fairness line and efficiency line; trajectory (x1h, x2h) → (x1h+aD, x2h+aD) → (bI(x1h+aD), bI(x2h+aD))]


Additive Increase, Additive Decrease
• Does not converge to fairness
• Does not converge to efficiency
  – stable iff $a_D + a_I = 0$
[Figure: axes User 1: x1 and User 2: x2, with fairness line and efficiency line; trajectory (x1h, x2h) → (x1h+aD, x2h+aD) → (x1h+aD+aI, x2h+aD+aI)]


Multiplicative Increase, Multiplicative Decrease
• Does not converge to fairness
• Converges to efficiency iff $b_I \ge 1$ and $0 \le b_D < 1$
[Figure: axes User 1: x1 and User 2: x2, with fairness line and efficiency line; trajectory (x1h, x2h) → (bDx1h, bDx2h) → (bIbDx1h, bIbDx2h)]


Additive (and Multiplicative) Increase, Multiplicative Decrease
[Figure: axes User 1: x1 and User 2: x2, with fairness line and efficiency line; trajectory (x1h, x2h) → (bDx1h, bDx2h) → (bIbDx1h+aI, bIbDx2h+aI)]

• Converges to fairness

• Converges to efficiency iff

– bI>=1

• Increments smaller as fairness increases

– effect on metrics?

• Additive Increase is better

– why?


Significance
• Characteristics
  – converges to efficiency, fairness
  – easily deployable
  – fully distributed
  – no need to know full state of system (e.g. number of users, bandwidth of links)
• Theory that enabled the Internet to grow beyond 1989
  – key milestone in Internet development
  – fully distributed network architecture requires fully distributed congestion control
  – basis for TCP


Modeling
• Critical to understanding complex systems
  – [CJ89] model relevant for 13 years, 10^6 increase of bandwidth, 1000x increase in number of users
• Criteria for good models
  – realistic
  – simple
    • easy to work with
    • easy for others to understand
  – realistic, complex model → useless
  – unrealistic, simple model → can teach something about best case, worst case, etc.


TCP Congestion Control
• [CJ89] provides theoretical basis
  – still many issues to be resolved
• How to start?
• Implicit congestion signal
  – loss
  – need to send packets to detect congestion
  – must reconcile with AIMD
• How to maintain equilibrium?
  – use ACK: send a new packet only after you receive ACK. Why?
  – maintain number of packets in network “constant”

TCP Congestion Control
• Maintains three variables:
  – cwnd: congestion window
  – flow_win: flow window (receiver advertised window)
  – ssthresh: threshold size (used to update cwnd)
• For sending use: win = min(flow_win, cwnd)

TCP: Slow Start
• Goal: discover congestion quickly
• How?
  – quickly increase cwnd until the network is congested, to get a rough estimate of the optimal cwnd
  – Whenever starting traffic on a new connection, or whenever increasing traffic after congestion was experienced:
    • Set cwnd = 1
    • Each time a segment is acknowledged increment cwnd by one (cwnd++).
• Slow Start is not actually slow
  – cwnd increases exponentially


Slow Start Example
• The congestion window size grows very rapidly
• TCP slows down the increase of cwnd when cwnd >= ssthresh
[Figure: sender/receiver timeline. cwnd = 1: segment 1 sent, ACK for segment 1; cwnd = 2: segments 2 and 3 sent, ACK for segments 2 + 3; cwnd = 4: segments 4 to 7 sent, ACK for segments 4+5+6+7; cwnd = 8]


Congestion Avoidance
• Slow down “Slow Start”
• If cwnd >= ssthresh then each time a segment is acknowledged increment cwnd by 1/cwnd (cwnd += 1/cwnd).
• So cwnd is increased by one only once a full window of segments has been acknowledged.
• (more about ssthresh later)


Slow Start/Congestion Avoidance Example
• Assume that ssthresh = 8
[Figure: cwnd (in segments) vs. roundtrip times, with ssthresh marked; cwnd = 1, 2, 4, 8, 9, 10 on successive round trips]
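The cwnd values in this example can be reproduced with a per-round-trip sketch. It assumes an idealized round: every segment in the window is acknowledged, so cwnd doubles in slow start (capped at ssthresh, a simplification) and grows by one segment in congestion avoidance. The function name is illustrative.

```python
def cwnd_per_round_trip(ssthresh, round_trips):
    """Return cwnd (in segments) at the start of each round trip."""
    cwnd = 1
    trace = [cwnd]
    for _ in range(round_trips):
        if cwnd < ssthresh:
            cwnd = min(2 * cwnd, ssthresh)   # slow start: +1 per ACK, so x2 per RTT
        else:
            cwnd += 1                        # congestion avoidance: +1 per RTT
        trace.append(cwnd)
    return trace

print(cwnd_per_round_trip(8, 5))  # [1, 2, 4, 8, 9, 10]
```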

Putting Everything Together: TCP Pseudocode

Initially:
  cwnd = 1;
  ssthresh = infinite;

New ack received:
  if (cwnd < ssthresh)
    /* Slow Start */
    cwnd = cwnd + 1;
  else
    /* Congestion Avoidance */
    cwnd = cwnd + 1/cwnd;

Timeout:
  /* Multiplicative decrease */
  ssthresh = win/2;
  cwnd = 1;

while (next < unack + win)
  transmit next packet;

where win = min(cwnd, flow_win);

[Figure: sequence-number line (seq #) showing unack, next, and the window win]
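The pseudocode above can be turned into a small runnable sketch. Several details are assumptions made only for illustration: windows are counted in whole segments, each new ACK acknowledges exactly one segment, and a timeout restarts transmission from the oldest unacknowledged segment. The class and method names are hypothetical.

```python
class TahoeLikeSender:
    """Window bookkeeping following the slide pseudocode (segments, not bytes)."""

    def __init__(self, flow_win):
        self.cwnd = 1.0
        self.ssthresh = float("inf")
        self.flow_win = flow_win   # receiver advertised window
        self.unack = 0             # oldest unacknowledged sequence number
        self.next = 0              # next sequence number to transmit

    @property
    def win(self):
        return min(self.cwnd, self.flow_win)

    def on_new_ack(self):
        self.unack += 1                      # assumption: one segment per ACK
        if self.cwnd < self.ssthresh:
            self.cwnd += 1                   # slow start
        else:
            self.cwnd += 1 / self.cwnd       # congestion avoidance

    def on_timeout(self):
        self.ssthresh = self.win / 2         # multiplicative decrease
        self.cwnd = 1.0
        self.next = self.unack               # assumption: resend from oldest unacked

    def transmit(self):
        """Send every packet the current window allows; return their seq numbers."""
        sent = []
        while self.next < self.unack + int(self.win):
            sent.append(self.next)
            self.next += 1
        return sent

sender = TahoeLikeSender(flow_win=16)
print(sender.transmit())   # first window: one segment
sender.on_new_ack()
print(sender.transmit())   # window has grown to two segments
```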

The big picture
[Figure: cwnd vs. time, showing Slow Start, Congestion Avoidance, and a Timeout]
Recall knee-point and cliff-point!


Fast Retransmit
• Don’t wait for window to drain
• Resend a segment after 3 duplicate ACKs
  – remember a duplicate ACK means that an out-of-sequence segment was received
• Notes:
  – duplicate ACKs due to packet reordering or loss
  – window may be too small to get duplicate ACKs
[Figure: sender/receiver timeline. cwnd = 1: segment 1; cwnd = 2: segments 2 and 3; cwnd = 4: segments 4 to 7. ACKs shown: ACK 1, ACK 3, then ACK 4 repeated (3 duplicate ACKs)]
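Counting duplicate ACKs takes only a little state at the sender. The sketch below is illustrative: it assumes the notation in which ACK n acknowledges everything through segment n, so the segment to resend is n + 1, and the class name is hypothetical.

```python
class DupAckDetector:
    """Signal a fast retransmit on the third duplicate ACK."""

    THRESHOLD = 3

    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_no):
        """Return the segment number to retransmit, or None."""
        if ack_no == self.last_ack:
            self.dup_count += 1
            if self.dup_count == self.THRESHOLD:
                return ack_no + 1     # assumed notation: ACK n covers through n
        else:
            self.last_ack = ack_no    # new ACK: reset the duplicate counter
            self.dup_count = 0
        return None

detector = DupAckDetector()
for ack in [1, 2, 3, 3, 3, 3]:
    print(ack, detector.on_ack(ack))
```

Only the third duplicate triggers a retransmission; further duplicates of the same ACK are ignored by this sketch.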


Fast Recovery
• After a fast retransmit, set ssthresh to cwnd/2 and cwnd to the new ssthresh
  – i.e., don’t reset cwnd to 1
• Fast Retransmit and Fast Recovery are implemented by TCP Reno, the most widely used version of TCP today

Fast Retransmit and Fast Recovery
• Retransmit after 3 duplicated acks
  – prevent expensive timeouts
• No need to slow start again
• At steady state, cwnd oscillates around the optimal window size.
[Figure: cwnd vs. time, showing Slow Start followed by Congestion Avoidance]


Reflections on TCP
• assumes that all sources cooperate
• assumes that congestion occurs on time scales greater than 1 RTT
• only useful for reliable, in order delivery, non-real time applications
• vulnerable to non-congestion related loss (e.g. wireless)
• can be unfair to long RTT flows



Principles of Reliable data transfer

• important in app., transport, link layers

• characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)


Reliable data transfer: getting started

rdt_send(): called from above (e.g., by app.). Passed data to deliver to receiver upper layer

udt_send(): called by rdt, to transfer packet over unreliable channel to receiver

rdt_rcv(): called when packet arrives on rcv-side of channel

deliver_data(): called by rdt to deliver data to upper layer

[Figure: send side and receive side of the rdt protocol]


Reliable data transfer: getting started
We’ll:
• incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
• consider only unidirectional data transfer
  – but control info will flow in both directions!
• use finite state machines (FSM) to specify sender, receiver
[Figure: FSM notation. The arrow from state1 to state2 is labelled with the event causing the state transition and the actions taken on the state transition. State: when in this “state”, the next state is uniquely determined by the next event]


Rdt1.0: reliable transfer over a reliable channel
• underlying channel perfectly reliable
  – no bit errors
  – no loss of packets
• separate FSMs for sender, receiver:
  – sender sends data into underlying channel
  – receiver reads data from underlying channel


Rdt2.0: channel with bit errors (no loss)
• underlying channel may flip bits in packet
  – recall: UDP checksum to detect bit errors
• the question: how to recover from errors:
  – acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  – negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  – sender retransmits pkt on receipt of NAK
• new mechanisms in rdt2.0 (beyond rdt1.0):
  – error detection
  – receiver feedback: control msgs (ACK, NAK) rcvr->sender


rdt2.0: FSM specification
[Figure: sender FSM and receiver FSM]

rdt2.0: in action (no errors)
[Figure: sender FSM and receiver FSM]

rdt2.0: in action (error scenario)
[Figure: sender FSM and receiver FSM]


rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
• sender doesn’t know what happened at receiver!
• can’t just retransmit: possible duplicate

What to do?
• sender ACKs/NAKs receiver’s ACK/NAK? What if sender ACK/NAK lost?
• retransmit, but this might cause retransmission of correctly received pkt!

Handling duplicates:
• sender adds sequence number to each pkt
• sender retransmits current pkt if ACK/NAK garbled
• receiver discards (doesn’t deliver up) duplicate pkt

Stop and wait: sender sends one packet, then waits for receiver response


rdt2.1: sender, handles garbled ACK/NAKs
[Figure: sender FSM; the only recovered labels are the conditions "&& has_seq0(rcvpkt)" and "&& has_seq1(rcvpkt)"]

rdt2.1: receiver, handles garbled ACK/NAKs
[Figure: receiver FSM with states "Wait for 0" and "Wait for 1". Recovered events: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt); rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt); rdt_rcv(rcvpkt) && corrupt(rcvpkt). Recovered actions: Extract(rcvpkt,data), deliver_data(data), udt_send(ACK[0]), udt_send(ACK[1]), udt_send(NACK[0]), udt_send(NACK[1])]


rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq. #’s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  – state must “remember” whether “current” pkt has 0 or 1 seq. #

Receiver:
• must check if received packet is duplicate
  – state indicates whether 0 or 1 is expected pkt seq #
• note: receiver can not know if its last ACK/NAK received OK at sender


rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
  – receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt
[Figure: sender FSM]


rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
  – checksum, seq. #, ACKs, retransmissions will be of help, but not enough

Q: how to deal with loss?
  – sender waits until certain data or ACK lost, then retransmits
  – yuck: drawbacks?

Approach: sender waits “reasonable” amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  – retransmission will be duplicate, but use of seq. #’s already handles this
  – receiver must specify seq # of pkt being ACKed
• requires countdown timer


rdt3.0 sender
[Figure: sender FSM; recovered annotations: "(no need to resend)" and "stop timer"]

[Figure: Receiver FSM with states "Wait for 0" and "Wait for 1"; recovered event: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt); recovered actions: udt_send(ACK[0]) and udt_send(ACK[1])]


rdt3.0 in action
[Figure: two slides of diagrams, not recovered]

Performance of rdt3.0
• rdt3.0 works, but performance stinks
• example: 1 Gbps link, 15 ms e-e prop. delay, 1KB packet:

$T_{transmit} = \frac{8\ \text{kb/pkt}}{10^9\ \text{b/sec}} = 8\ \text{microsec}$

$U = \text{fraction of time sender busy sending} = \frac{8\ \text{microsec}}{30.008\ \text{msec}} \approx 0.00027$

  – 1KB pkt every 30 ms -> 33kB/s thruput over 1 Gbps link
  – network protocol limits use of physical resources!
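The stop-and-wait arithmetic can be reproduced directly. This sketch takes utilization to be transmit time divided by (RTT + transmit time), treats 1KB as 8000 bits as in the transmit-time calculation, and ignores ACK transmission time; the function names are illustrative.

```python
def stop_and_wait(packet_bits, link_bps, rtt_s):
    """Return (utilization, throughput in bytes/s) for one packet per round trip."""
    t_transmit = packet_bits / link_bps
    cycle = rtt_s + t_transmit            # time from first bit sent to ACK received
    utilization = t_transmit / cycle
    throughput = (packet_bits / 8) / cycle
    return utilization, throughput

# 1 Gbps link, 15 ms one-way propagation delay (30 ms RTT), 8000-bit packet.
u, thr = stop_and_wait(8000, 1e9, 0.030)
print(f"utilization = {u:.5f}, throughput = {thr / 1000:.1f} kB/s")
```

The sender is idle for almost the whole round trip, which is the motivation for the pipelined protocols that follow.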


Pipelined protocols
Pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged pkts
  – range of sequence numbers must be increased
  – buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat


Go-Back-N (GBN)
Sender:
• k-bit seq # in pkt header
• “window” of up to N consecutive unack’ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n - “cumulative ACK”
  – may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
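The sender rules above can be sketched as a small state holder. For brevity this illustration uses unbounded integer sequence numbers instead of a wrapping k-bit field, and it only tracks window state (no timers, no real I/O); the class and method names are hypothetical.

```python
class GoBackNSender:
    """Go-Back-N window bookkeeping with cumulative ACKs."""

    def __init__(self, window_size):
        self.N = window_size
        self.base = 0        # oldest unacknowledged sequence number
        self.next_seq = 0    # next sequence number to use

    def send(self):
        """Return the seq number just sent, or None if the window is full."""
        if self.next_seq >= self.base + self.N:
            return None
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def on_ack(self, n):
        """Cumulative ACK: everything up to and including n is acknowledged."""
        if self.base <= n < self.next_seq:
            self.base = n + 1
        # otherwise it is a duplicate or stale ACK and is ignored

    def on_timeout(self, n):
        """Retransmit pkt n and all higher-numbered pkts already sent."""
        return list(range(max(n, self.base), self.next_seq))

s = GoBackNSender(window_size=4)
print([s.send() for _ in range(5)])   # the fifth send is refused: window full
s.on_ack(1)                           # acknowledges 0 and 1
print(s.send(), s.on_timeout(2))
```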


GBN: sender extended FSM

GBN: receiver extended FSM
receiver simple:
• ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  – may generate duplicate ACKs
  – need only remember expectedseqnum
• out-of-order pkt:
  – discard (don’t buffer) -> no receiver buffering!
  – ACK pkt with highest in-order seq #


GBN in action

When is GBN not that bad?
• Error rate or distribution?
  – Long-term or short-term fading
  – Window size
• Loss rate or distribution?
• RTT?
• Link bandwidth?
• Complexity?


Selective Repeat
• receiver individually acknowledges all correctly received pkts
  – buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  – sender timer for each unACKed pkt
• sender window
  – N consecutive seq #’s
  – again limits seq #s of sent, unACKed pkts


Selective repeat: sender, receiver windows

Selective repeat

sender
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N-1]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #

receiver
pkt n in [rcvbase, rcvbase+N-1]
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]
• ACK(n)
otherwise:
• ignore
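The receiver rules can be sketched as follows. This illustration assumes unbounded integer sequence numbers (no wrap-around), so it does not exhibit the sequence-number ambiguity of a small seq # space; the class name and the `delivered` list standing in for the upper layer are hypothetical.

```python
class SelectiveRepeatReceiver:
    """Selective-repeat receiver: ACK individually, buffer, deliver in order."""

    def __init__(self, window_size):
        self.N = window_size
        self.rcv_base = 0      # lowest seq number not yet received
        self.buffer = {}       # out-of-order packets awaiting delivery
        self.delivered = []    # stand-in for the upper layer

    def on_packet(self, n, data):
        """Return the seq number to ACK, or None if the packet is ignored."""
        if self.rcv_base <= n <= self.rcv_base + self.N - 1:
            self.buffer[n] = data
            # Deliver the in-order prefix and slide the window forward.
            while self.rcv_base in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcv_base))
                self.rcv_base += 1
            return n
        if self.rcv_base - self.N <= n <= self.rcv_base - 1:
            return n           # already delivered: re-ACK so the sender can advance
        return None            # otherwise: ignore

r = SelectiveRepeatReceiver(window_size=4)
print(r.on_packet(1, "b"), r.delivered)   # buffered, nothing delivered yet
print(r.on_packet(0, "a"), r.delivered)   # gap filled, both delivered in order
```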


Selective repeat in action

Selective repeat: dilemma
Example:
• seq #’s: 0, 1, 2, 3
• window size=3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?


TCP on Mobile Ad Hoc Networks

Overview of TCP/IP

Internet Protocol (IP)

• Packets may be delivered out-of-order

• Packets may be lost

• Packets may be duplicated


Transmission Control Protocol (TCP)
• Reliable ordered delivery
• Implements congestion avoidance and control
• Reliability achieved by means of retransmissions if necessary
• End-to-end semantics
  – Acknowledgements sent to TCP sender to confirm delivery of data received by TCP receiver
  – Ack for data sent only after data has reached receiver


TCP Basics
• Cumulative acknowledgements
• An acknowledgement ack’s all contiguously received data
• TCP assigns byte sequence numbers
• For simplicity, we will assign packet sequence numbers
• Also, we use slightly different syntax for acks than normal TCP syntax
  – In our notation, ack i acknowledges receipt of packets through packet i


Cumulative Acknowledgements
• A new cumulative acknowledgement is generated only on receipt of a new in-sequence packet
[Figure: data packets (37 to 41) in flight from src to dest, with cumulative acks (33 to 37) returning; the legend distinguishes data packet i from ack i]


Duplicate Acknowledgements
• A dupack is generated whenever an out-of-order segment arrives at the receiver
[Figure: data packets 37 to 40, then 39 to 42, in flight, with acks 34 and 36 returning; on receipt of 38 the receiver sends a second ack 36 (dupack). The example assumes delayed acks]


Window Based Flow Control
• Sliding window protocol
• Window size minimum of
  – receiver’s advertised window - determined by available buffer space at the receiver
  – congestion window - determined by the sender, based on feedback from the network
[Figure: packets 1 to 13 with the sender’s window marked; packets before the window are labelled "Acks received" and packets after it "Not transmitted". When receiving Ack 5, the sender’s window slides forward]




• Congestion window size (W) bounds the amount of data that can be sent per round-trip time

• Throughput <= W / RTT


Ideal Window Size
• Ideal size = delay * bandwidth
  – delay-bandwidth product
• What if window size < delay*bw ?
  – Inefficiency (wasted bandwidth)
• What if > delay*bw ?
  – Queuing at intermediate routers
    • increased RTT due to queuing delays
  – Potentially, packet loss
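The delay-bandwidth product is a one-line calculation. The sketch below uses assumed example values (a 10 Mb/s path, 100 ms round-trip delay, 1460-byte segments); none of these numbers come from the slides.

```python
import math

def ideal_window_bytes(bandwidth_bps, rtt_s):
    """Delay-bandwidth product, in bytes."""
    return bandwidth_bps * rtt_s / 8

bdp = ideal_window_bytes(10e6, 0.100)     # assumed path: 10 Mb/s, 100 ms RTT
segments = math.ceil(bdp / 1460)          # assumed segment size: 1460 bytes
print(f"{bdp:.0f} bytes, about {segments} segments")
```

A window smaller than this leaves the link idle for part of every round trip; a larger one only adds queueing at the bottleneck.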



How does TCP detect a packet loss?

• Retransmission timeout (RTO)

• Duplicate acknowledgements


Detecting Packet Loss Using Retransmission Timeout (RTO)

• At any time, TCP sender sets retransmission timer for only one packet

• If acknowledgement for the timed packet is not received before timer goes off, the packet is assumed to be lost

• RTO dynamically calculated


Retransmission Timeout (RTO) calculation
• RTO = mean + 4 mean deviation
  – Standard deviation s: s^2 = average of (sample – mean)^2
  – Mean deviation = average of |sample – mean|
  – Mean deviation easier to calculate than standard deviation
  – Mean deviation is more conservative
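A running estimate of "mean + 4 mean deviation" can be kept with two exponentially weighted averages. The sketch below is illustrative: the gains 1/8 and 1/4 and the initial deviation of half the first sample follow common TCP practice (RFC 6298) and are assumptions not stated on the slide; the class name is hypothetical.

```python
class RtoEstimator:
    """RTO = smoothed mean RTT + 4 * smoothed mean deviation."""

    def __init__(self, first_sample, mean_gain=1 / 8, dev_gain=1 / 4):
        self.mean = first_sample
        self.dev = first_sample / 2    # assumed initial deviation
        self.mean_gain = mean_gain     # assumed gain for the mean
        self.dev_gain = dev_gain       # assumed gain for the deviation

    @property
    def rto(self):
        return self.mean + 4 * self.dev

    def update(self, sample):
        """Fold in one RTT sample (seconds) and return the new RTO."""
        err = sample - self.mean
        self.mean += self.mean_gain * err
        self.dev += self.dev_gain * (abs(err) - self.dev)
        return self.rto

est = RtoEstimator(first_sample=0.100)
for rtt in [0.100, 0.120, 0.300, 0.110]:
    print(f"sample={rtt:.3f}s  rto={est.update(rtt):.3f}s")
```

Only an absolute value is needed per sample, with no squares or square roots, which is the "easier to calculate" point above.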


Exponential Backoff
• Double RTO on each timeout
[Figure: timeline. Packet transmitted; time-out occurs before ack received, packet retransmitted; timeout interval doubled: T1, then T2 = 2 * T1]
• Windows: initially 3 s, max. 240 s
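The doubling rule with an upper bound can be written in a few lines. The sketch uses the 3 s initial value and 240 s maximum quoted above as example inputs; the function name is illustrative.

```python
def backoff_schedule(initial_rto, max_rto, timeouts):
    """Return the RTO used for each successive timeout, doubling up to max_rto."""
    rto = initial_rto
    schedule = []
    for _ in range(timeouts):
        schedule.append(rto)
        rto = min(2 * rto, max_rto)
    return schedule

print(backoff_schedule(3, 240, 8))  # [3, 6, 12, 24, 48, 96, 192, 240]
```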



Fast Retransmission

• Timeouts can take too long– how to initiate retransmission sooner?

• Fast retransmit


Detecting Packet Loss Using Dupacks: Fast Retransmit Mechanism
• Dupacks may be generated due to
  – packet loss, or
  – out-of-order packet delivery
• TCP sender assumes that a packet loss has occurred if it receives three dupacks consecutively
[Figure: packets 7 to 12 in flight, with packet 8 lost]
Receipt of packets 9, 10 and 11 will each generate a dupack from the receiver. The sender, on getting these dupacks, will retransmit packet 8.


Congestion Avoidance and Control
• Slow Start: cwnd grows exponentially with time during slow start
• When cwnd reaches slow-start threshold, congestion avoidance is performed
• Congestion avoidance: cwnd increases linearly with time during congestion avoidance
  – Rate of increase could be lower if sender does not always have data to send


[Figure: congestion window size (segments) vs. time (round trips, 0 to 8); slow start up to the slow start threshold, then congestion avoidance. Example assumes that acks are not delayed]



Congestion Control

• On detecting a packet loss, TCP sender assumes that network congestion has occurred

• On detecting packet loss, TCP sender drastically reduces the congestion window

• Reducing congestion window reduces amount of data that can be sent per RTT


Congestion Control -- Timeout
• On a timeout, the congestion window is reduced to the initial value of 1 MSS
• The slow start threshold is set to half the window size before packet loss
  – more precisely, ssthresh = maximum of min(cwnd, receiver’s advertised window)/2 and 2 MSS
• Slow start is initiated


[Figure: congestion window (segments) vs. time (round trips, 0 to 25); ssthresh = 8 initially, cwnd = 20 when the timeout occurs, ssthresh = 10 after timeout]


Congestion Control - Fast retransmit
• Fast retransmit occurs when multiple (>= 3) dupacks come back
• Fast recovery follows fast retransmit
• Different from timeout: slow start follows timeout
  – timeout occurs when no more packets are getting across
  – fast retransmit occurs when a packet is lost, but later packets get through
  – ack clock is still there when fast retransmit occurs
  – no need to slow start


Fast Recovery
• ssthresh = min(cwnd, receiver’s adv. window)/2
  – (at least 2 MSS)
• retransmit the missing segment (fast retransmit)
• cwnd = ssthresh + number of dupacks
  – Temporary inflation
• when a new ack comes: cwnd = ssthresh
  – enter congestion avoidance
Congestion window cut into half
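The two loss responses differ only in what happens to cwnd. The sketch below follows the rules on these slides, counting windows in segments (mss=1) for readability; the function names are illustrative and the example numbers are assumptions.

```python
def loss_threshold(cwnd, rwnd, mss=1):
    """ssthresh = max(min(cwnd, receiver window) / 2, 2 MSS)."""
    return max(min(cwnd, rwnd) / 2, 2 * mss)

def on_timeout(cwnd, rwnd, mss=1):
    """Timeout: new ssthresh, cwnd back to 1 MSS, then slow start."""
    return loss_threshold(cwnd, rwnd, mss), mss

def on_fast_retransmit(cwnd, rwnd, dupacks=3, mss=1):
    """Fast recovery entry: new ssthresh, cwnd temporarily inflated by the dupacks."""
    ssthresh = loss_threshold(cwnd, rwnd, mss)
    return ssthresh, ssthresh + dupacks * mss

def on_recovery_ack(ssthresh):
    """First new ACK: deflate cwnd to ssthresh and enter congestion avoidance."""
    return ssthresh

print(on_timeout(20, 64))           # (ssthresh, cwnd) after a timeout
ssthresh, inflated = on_fast_retransmit(20, 64)
print(ssthresh, inflated, on_recovery_ack(ssthresh))
```

After a timeout the sender restarts from one segment, whereas after fast recovery it continues from half the previous window.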


[Figure: window size (segments) vs. time (round trips), with the receiver’s advertised window marked. After fast retransmit and fast recovery, window size is reduced in half]


TCP Performance in Mobile Ad Hoc Networks (MANETs)


Performance of TCP
Several factors affect TCP performance in MANET:
• Wireless transmission errors
• Multi-hop routes on shared wireless medium
  – For instance, adjacent hops typically cannot transmit simultaneously
• Route failures due to mobility



Random Errors

• If number of bit errors is small, they may be corrected by an error correcting code

• Excessive bit errors result in a packet being discarded, possibly before it reaches the transport layer


Random Errors May Cause Fast Retransmit
[Figure: animation over several slides. Packets 37 to 44 are in flight from sender to receiver and one packet is lost to a random error. The receiver first returns acks 34 and 36 (example assumes delayed ack, every other packet ack’d), then a duplicate ack 36 for each later packet (duplicate acks are not delayed). 3 duplicate acks trigger fast retransmit at sender]


• Fast retransmit results in
  – retransmission of lost packet
  – reduction in congestion window
• Reducing congestion window in response to errors is unnecessary
• Reduction in congestion window reduces the throughput


Sometimes Congestion Response May be Appropriate in Response to Errors
• On a CDMA channel, errors occur due to interference from other users, and due to noise [Karn99pilc]
  – Interference due to other users is an indication of congestion. If such interference causes transmission errors, it is appropriate to reduce congestion window
  – If noise causes errors, it is not appropriate to reduce window
• When a channel is in a bad state for a long duration, it might be better to let TCP back off, so that it does not unnecessarily attempt retransmissions while the channel remains in the bad state [Padmanabhan99pilc]

pilc: IETF Performance Implications of Link Characteristics


Impact of Random Errors [Vaidya99]
[Figure: throughput in bits/sec (0 to 1600000) vs. 1/error rate in bytes (16384, 32768, 65536, 1E+05). Exponential error model, 2 Mbps wireless full duplex link, no congestion losses]


Burst Errors May Cause Timeouts
• If wireless link remains unavailable for extended duration, a window worth of data may be lost
  – E.g., driving through a tunnel
• Timeout results in slow start
• Slow start reduces congestion window to 1 MSS, reducing throughput
• Reduction in window in response to errors unnecessary



Random Errors May Also Cause Timeout

• Multiple packet losses in a window can result in timeout when using TCP-Reno (and to a lesser extent when using SACK)



Impact of Transmission Errors

• TCP cannot distinguish between packet losses due to congestion and transmission errors

• Unnecessarily reduces congestion window

• Throughput suffers


Mobile Ad Hoc Networks
• May need to traverse multiple links to reach a destination



Mobile Ad Hoc Networks

• Mobility causes route changes



Throughput over Multi-Hop Wireless Paths

• Connections over multiple hops are at a disadvantage compared to shorter connections, because they have to contend for wireless access at each hop


Impact of Multi-Hop Wireless Paths
[Figure: TCP throughput (Kbps, 0 to 1600) vs. number of hops; TCP throughput using 2 Mbps 802.11 MAC]


Throughput Degradations with Increasing Number of Hops
• Packet transmission can occur on at most one hop among three consecutive hops
  – Increasing the number of hops from 1 to 2, 3 results in increased delay, and decreased throughput
• Increasing number of hops beyond 3 allows simultaneous transmissions on more than one link; however, degradation continues due to contention between TCP data and acks traveling in opposite directions
• When number of hops is large enough, the throughput stabilizes due to effective pipelining


Performance Metric
• Expected throughput $= \sum_{i=1}^{\infty} f_i \times T_i$
  – fi = fraction of time TCP source and receiver are i hops away
  – Ti = TCP throughput across an i-hop network
[Figure: min path length (hops, 0 to 3) vs. time (seconds, 0 to 40)]
• Example from the figure: expected throughput = (10/40) T1 + (30/40) T2
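The expected-throughput metric is a weighted sum, which the sketch below evaluates for the 10/40 and 30/40 split in the example. The per-hop throughput values T1 and T2 are hypothetical placeholders, since the slides do not give them here, and the function name is illustrative.

```python
def expected_throughput(fractions, throughputs):
    """Sum over i of f_i * T_i, with both arguments keyed by hop count i."""
    return sum(f * throughputs[hops] for hops, f in fractions.items())

fractions = {1: 10 / 40, 2: 30 / 40}   # fraction of time at 1 hop and at 2 hops
throughputs = {1: 1500.0, 2: 700.0}    # hypothetical T1 and T2 in Kbps
print(expected_throughput(fractions, throughputs))
```

Measured throughput under mobility can then be reported as a fraction of this ideal value.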


Impact of Mobility: TCP Throughput
[Figure: actual throughput vs. ideal throughput (Kbps); panels for 2 m/s and 10 m/s]

Impact of Mobility
[Figure: actual throughput vs. ideal throughput; panels for 20 m/s and 30 m/s]

Throughput generally degrades with increasing speed …
[Figure: average throughput over 50 runs vs. speed (m/s); curves Ideal and Actual]

But not always


Why Does Throughput Degrade?
• mobility causes link breakage, resulting in route failure
• TCP data and acks en route discarded
• TCP sender times out. Starts sending packets again
• Route is repaired
[Figure: timeline with intervals labelled "No throughput" and "No throughput despite route repair"]

Why Does Throughput Degrade?
• mobility causes link breakage, resulting in route failure
• TCP data and acks en route discarded
• TCP sender times out. Backs off timer.
• Route is repaired
• TCP sender times out. Resumes sending
• Larger route repair delays especially harmful
[Figure: timeline with timeout intervals t and 2t; intervals labelled "No throughput" and "No throughput despite route repair"]


Why Does Throughput Improve? Low Speed Scenario
[Figure: three snapshots of nodes A, B, C, D; 1.5 second route failure]
• Route from A to D is broken for ~1.5 second.
• When TCP sender times out after 1 second, route still broken.
• TCP times out after another 2 seconds, and only then resumes.

Why Does Throughput Improve? Higher (double) Speed Scenario
[Figure: three snapshots of nodes A, B, C, D; 0.75 second route failure]
• Route from A to D is broken for ~0.75 second.
• When TCP sender times out after 1 second, route is repaired.

Why Does Throughput Improve? General Principle
• The previous two slides show a plausible cause for improved throughput
• TCP timeout interval somewhat (not entirely) independent of speed
• Network state at higher speed, when timeout occurs, may be more favorable than at lower speed
• Network state
  – Link/route status
  – Route caches
  – Congestion
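The two scenarios can be replayed with a small timing sketch. It assumes the sender's first timeout fires 1 second after the route breaks and that each further timeout doubles the interval, as in the scenarios above; the function name is illustrative.

```python
def resume_time(route_down_for, initial_rto=1.0):
    """Seconds after the route break at which a retransmission first
    finds the route repaired (RTO doubles after every failed attempt)."""
    elapsed = 0.0
    rto = initial_rto
    while True:
        elapsed += rto
        if elapsed >= route_down_for:
            return elapsed
        rto *= 2              # exponential backoff

for outage in (1.5, 0.75):
    t = resume_time(outage)
    print(f"outage {outage}s: resumes at {t}s, idle after repair {t - outage}s")
```

The shorter outage ends before the first timeout, so the sender resumes after 1 second instead of 3; the time wasted after repair depends on where the repair falls relative to the timeout schedule, not on speed itself.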


How to Improve Throughput (Bring Closer to Ideal)
• Network feedback
• Inform TCP of route failure by explicit message
• Let TCP know when route is repaired
  – Probing
  – Explicit notification
• Reduces repeated TCP timeouts and backoff


Performance with Explicit Notification
[Figure: throughput as a fraction of ideal (0 to 1) vs. mean speed (2, 10, 20, 30 m/s); series: Base TCP and With explicit notification]


Issues: Network Feedback
• Network knows best (why packets are lost)
  + Network feedback beneficial
  - Need to modify transport & network layer to receive/send feedback
• Need mechanisms for information exchange between layers


Impact of Caching
• Route caching has been suggested as a mechanism to reduce route discovery overhead [Broch98]
• Each node may cache one or more routes to a given destination
• When a route from S to D is detected as broken, node S may:
  – Use another cached route from local cache, or
  – Obtain a new route using cached route at another node


To Cache or Not to Cache
[Figure: actual throughput (as fraction of expected throughput) vs. average speed (m/s)]


Why Performance Degrades With Caching
• When a route is broken, route discovery returns a cached route from local cache or from a nearby node
• After a time-out, TCP sender transmits a packet on the new route. However, the cached route has also broken after it was cached
• Another route discovery, and TCP time-out interval
• Process repeats until a good route is found
[Figure: timeline of successive timeouts: timeout due to route failure; timeout, cached route is broken; timeout, second cached route also broken]



Issues: To Cache or Not to Cache

• Caching can result in faster route “repair”

• Faster does not necessarily mean correct

• If incorrect repairs occur often enough, caching performs poorly

• Need mechanisms for determining when cached routes are stale



Caching and TCP performance

• Caching can reduce overhead of route discovery even if cache accuracy is not very high

• But if cache accuracy is not high enough, gains in routing overhead may be offset by loss of TCP performance due to multiple time-outs


TCP Performance
Two factors result in degraded throughput in presence of mobility:
• Loss of throughput that occurs while waiting for TCP sender to timeout (as seen earlier)
  – This factor can be mitigated by using explicit notifications and better route caching mechanisms
• Poor choice of congestion window and RTO values after a new route has been found
  – How to choose cwnd and RTO after a route change?


Issues: Window Size After Route Repair
• Same as before route break: may be too optimistic
• Same as startup: may be too conservative
• Better be conservative than overly optimistic
  – Reset window to small value after route repair
  – Let TCP figure out the suitable window size
  – Impact low on paths with small delay-bw product


Issues: RTO After Route Repair
• Same as before route break
  – If new route long, this RTO may be too small, leading to timeouts
• Same as TCP start-up
  – May be too large
  – May result in slow response to next packet loss
• Another plausible approach: new RTO = function of old RTO, old route length, and new route length
  – Example: new RTO = old RTO * new route length / old route length
  – Not evaluated yet
  – Pitfall: RTT is not just a function of route length


Out-of-Order Packet Delivery
• Out-of-order (OOO) delivery may occur due to:
  – Route changes
  – Link layer retransmission schemes that deliver OOO
• Significant OOO delivery confuses TCP, triggering fast retransmit
• Potential solutions:
  – Deterministically prefer one route over others, even if multiple routes are known
  – Reduce OOO delivery by re-ordering received packets
    • can result in unnecessary delay in presence of packet loss
  – Turn off fast retransmit
    • can result in poor performance in presence of congestion


Impact of Acknowledgements
• TCP Acks (and link layer acks) share the wireless bandwidth with TCP data packets
• Data and Acks travel in opposite directions
  – In addition to bandwidth usage, acks require additional receive-send turnarounds, which also incur time penalty
  – Goal: reduce frequency of send-receive turnaround and contention between acks and data


Impact of Acks: Mitigation [Balakrishnan97]
• Piggybacking link layer acks with data
• Sending fewer TCP acks - ack every d-th packet (d may be chosen dynamically)
  – but need to use rate control at sender to reduce burstiness (for large d)
• Ack filtering - Gateway may drop an older ack in the queue, if a new ack arrives
  – reduces number of acks that need to be delivered to the sender
