Congestion Control
Internet Traffic Engineering
Measurement: for a reality check
Experiment: for implementation issues
Analysis:
  brings fundamental understanding of systems
  may lose important facts because of simplification
Simulation:
  complementary to analysis: correctness, exploring complicated models
  may share a similar model with analysis
What is congestion?
The aggregate demand for bandwidth exceeds the available capacity of a link.
What will occur? Performance degradation:
• multiple packet losses
• low link utilization (low throughput)
• high queueing delay
• congestion collapse
What is congestion? - 2
Congestion Control
Open-loop control
  mainly used in circuit-switched networks (GMPLS)
Closed-loop control
  mainly used in packet-switched networks
  uses feedback information: global & local
Implicit feedback control
  end-to-end congestion control
  examples: TCP Tahoe, TCP Reno, TCP Vegas, etc.
Explicit feedback control
  network-assisted congestion control
  examples: IBM SNA, DECbit, ATM ABR, ICMP source quench, RED, ECN
Congestion Control and Avoidance
Two approaches to handling congestion
Congestion control (reactive)
• acts after the network is overloaded
Congestion avoidance (proactive)
• acts before the network becomes overloaded
Open-loop control --- congestion avoidance
the source establishes a traffic descriptor with the network describing its needs
the network typically reserves resources and performs enforcement:
  admission control for new connections
  shaping or policing at the edges for data
challenges: choosing the traffic descriptor, choosing the scheduling discipline at routers, performing admission control
Implicit vs. Explicit feedback
Implicit feedback congestion control
The network drops packets when congestion occurs
The source infers congestion implicitly
• time-outs, duplicate ACKs, etc.
Example: end-to-end TCP congestion control
Simple to implement but inaccurate
• implemented only at the transport layer (e.g., TCP)
Implicit vs. Explicit feedback - 2
Explicit feedback congestion control
A network component (e.g., a router) provides congestion indication explicitly to sources
• using packet marking, or RM cells (in ATM ABR control)
Examples: DECbit, ECN, ATM ABR CC, etc.
Provides more accurate information to sources
But is more complicated to implement
• need to change both source and network algorithms
• need cooperation between sources and network components
TCP Congestion Control
Uses end-to-end congestion control
Uses implicit feedback
• e.g., time-outs, triple duplicate ACKs, etc.
Uses window-based flow control
• cwnd = min (pipe size, rwnd)
• self-clocking (ACKs pace transmission)
• slow start and congestion avoidance
Examples:
• TCP Tahoe, TCP Reno, TCP Vegas, etc.
Congestion
Routers receive packets at a rate faster than they can process them
Newly arriving packets are dropped: the network is congested
If a packet is lost, the source retransmits; all sources do the same, causing even more congestion (congestion collapse!)
Solution: slow down the sources!
How do we know when to slow down? By how much?
Congestion
knee – point after which:
  throughput increases slowly
  delay increases fast
cliff – point after which:
  throughput starts to decrease fast to zero
  delay approaches infinity
[Figure: throughput and delay vs. load — congestion avoidance operates around the knee; congestion control operates around the cliff, beyond which packet loss and congestion collapse set in.]
Goals
The sender operates near the knee point
The source should not put a new packet into the network until another packet leaves
How? Use ACKs! I.e., send a new packet only after receiving an ACK (self-clocking)
This keeps the number of packets in the network constant
Self-clocking
[Figure: self-clocking pipe diagram (sender, bottleneck, receiver) — the spacing of returning ACKs mirrors the packet spacing at the bottleneck, so ACKs pace the sender's transmissions.]
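The self-clocking idea above can be sketched as a toy simulation. This is an illustrative sketch, not real TCP: the function name and the FIFO model of the network are assumptions; the point is only that "send one new packet per ACK" keeps the number of packets in flight constant at the window size.

```python
# Minimal sketch of self-clocking: a new packet enters the network only
# when an ACK leaves it, so packets in flight never exceed the window.
from collections import deque

def self_clocked_send(total_packets, window):
    in_flight = deque()
    sent = 0
    max_in_flight = 0
    # Initially, fill the window.
    while sent < min(window, total_packets):
        in_flight.append(sent)
        sent += 1
    acked = 0
    while in_flight:
        max_in_flight = max(max_in_flight, len(in_flight))
        in_flight.popleft()           # an ACK arrives for the oldest packet
        acked += 1
        if sent < total_packets:      # the ACK clocks out exactly one new packet
            in_flight.append(sent)
            sent += 1
    return acked, max_in_flight

print(self_clocked_send(20, 4))       # all 20 packets acked, never more than 4 in flight
```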
TCP Congestion Control
The TCP sender maintains three variables:
  cwnd – congestion window
  rcv_win – receiver-advertised window
  ssthresh – slow start threshold (used to update cwnd; intuitively, ssthresh is a rough estimate of the knee point)
send_win = min (rcv_win, cwnd)
TCP Tahoe implements:
  slow start
  congestion avoidance
  the fast retransmit algorithm
Slow Start (Simplified)
(initially) cwnd = 1*Max Segment Size (MSS)
each time an ACK is received for a segment: cwnd += 1*MSS (exponential growth of cwnd per RTT!)
if loss (i.e., timeout): cwnd = 1*MSS again
Congestion Avoidance (Simplified)
for each ACK received: cwnd += (MSS*MSS / cwnd)
  an approximation of increasing cwnd by 1*MSS per RTT (additive increase)
if loss (i.e., timeout): cut the window in half (multiplicative decrease)
Slow Start & Congestion Avoidance
ssthresh
• initially:
  cwnd = 1*MSS, ssthresh = very high
• if a new ACK arrives:
  - if cwnd < ssthresh, update cwnd according to slow start
  - if cwnd >= ssthresh, update cwnd according to congestion avoidance
• if timeout (i.e., loss):
  - ssthresh = send_win/2
  - cwnd = 1*MSS
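The rules above can be sketched in a few lines. This is a simplified model, not a full TCP implementation: the function names are illustrative, cwnd is tracked in units of MSS, and rounding stands in for segment granularity. It reproduces the per-RTT cwnd sequence 1, 2, 4, 8, 9, 10, 11 used in the worked example on the next slide (initial ssthresh = 8*MSS).

```python
# Sketch of the Tahoe cwnd update rules from the slides (units of MSS).
MSS = 1

def on_ack(cwnd, ssthresh):
    """Per-ACK update: slow start below ssthresh, else congestion avoidance."""
    if cwnd < ssthresh:
        return cwnd + MSS                  # doubles cwnd over one RTT
    return cwnd + MSS * MSS / cwnd         # ~ +1 MSS per RTT (additive increase)

def on_timeout(send_win):
    """Timeout: halve the threshold, restart from one segment."""
    return send_win // 2, 1 * MSS          # (new ssthresh, new cwnd)

# Per-RTT trajectory with initial ssthresh = 8*MSS: slow start gives
# 1, 2, 4, 8; congestion avoidance then gives 9, 10, 11.
cwnd, ssthresh = 1, 8
trace = []
for _ in range(7):
    trace.append(round(cwnd))
    for _ in range(round(cwnd)):           # one ACK per segment in the window
        cwnd = on_ack(cwnd, ssthresh)
print(trace)                               # [1, 2, 4, 8, 9, 10, 11]
```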
[Figure: cwnd vs. time — slow start (green) grows exponentially up to the initial ssthresh; congestion avoidance (blue) then grows linearly; on timeout (loss), cwnd drops back to 1*MSS.]

Example: Slow Start/Congestion Avoidance
assume (initial) ssthresh = 8*MSS
  cwnd = 1 → 2 → 4 → 8 (slow start: cwnd doubles each RTT as the ACKs return)
  eight TCP-PDUs, eight ACKs → cwnd = 9 (congestion avoidance begins)
  nine TCP-PDUs, nine ACKs → cwnd = 10
  ten TCP-PDUs, ten ACKs → cwnd = 11
[Figure: congestion window size (in MSS) vs. transmission number 1–7: 1, 2, 4, 8, 9, 10, 11, with ssthresh marked at 8.]
Fast Retransmit
The sender receives 3 dupACKs
The sender infers that the segment is lost
The sender doesn't wait for a timeout
The sender re-sends the segment immediately!
[Figure: fast-retransmit timeline — segments 1–7 are sent as cwnd grows 1, 2, 4; segment 4 is lost (X); the receiver keeps repeating ACK 3; after 3 duplicate ACK 3s, the sender fast-retransmits segment 4 without waiting for a timeout.]
TCP Versions: Tahoe
after a fast retransmit:
  ssthresh = send_win/2
  cwnd = 1*MSS
i.e., the sender goes back to slow start!
TCP Reno implements:
  slow start
  congestion avoidance
  the fast retransmit algorithm & fast recovery
Fast Recovery
Intuition: the receipt of dupACKs tells the sender that the receiver is still getting new segments, i.e., data is still flowing between sender and receiver. So why should the sender go back to slow start after a fast retransmit?
[Figure: cwnd vs. time for Reno — slow start up to the initial ssthresh, then congestion avoidance; fast retransmits "inflate" cwnd with dupACKs and "deflate" it on a new ACK; a timeout still drops cwnd back to slow start.]
The sender does the following after receiving 3 dupACKs:
1. sets ssthresh = send_win/2
2. retransmits the lost segment
3. sets cwnd = ssthresh + 3*MSS
4. for each dupACK received: cwnd += 1*MSS ("inflating" cwnd)
5. if a new ACK arrives: cwnd = ssthresh (value in step 1) ("deflating" cwnd), and exit fast recovery
Remember: if the sender times out, ssthresh = send_win/2 and cwnd = 1 (that is, go back to slow start again!)
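Steps 1–5 can be sketched as three small handlers. This is a sketch with illustrative names (not real kernel code); units are MSS, and `send_win` stands for min(rcv_win, cwnd) at the moment the third dupACK arrives.

```python
# Sketch of Reno fast retransmit / fast recovery, following steps 1-5 above.

def enter_fast_recovery(send_win):
    """Steps 1-3: halve ssthresh, (retransmit the lost segment), inflate cwnd."""
    ssthresh = send_win // 2
    cwnd = ssthresh + 3            # the 3 dupACKs mean 3 segments left the network
    return cwnd, ssthresh

def on_dup_ack(cwnd):
    """Step 4: each further dupACK inflates cwnd by one MSS."""
    return cwnd + 1

def on_new_ack(ssthresh):
    """Step 5: a new ACK deflates cwnd back to ssthresh; exit fast recovery."""
    return ssthresh

cwnd, ssthresh = enter_fast_recovery(16)   # e.g. send_win = 16 MSS
print(cwnd, ssthresh)                      # 11 8
cwnd = on_dup_ack(cwnd)                    # one more dupACK inflates to 12
cwnd = on_new_ack(ssthresh)                # a new ACK deflates back to 8
print(cwnd)                                # 8
```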
Fast Re-transmit & Fast Recovery
TCP New Reno implements:
  slow start
  congestion avoidance
  fast retransmit & modified fast recovery
Motivation: fast recovery (as in Reno) cannot efficiently recover from multiple losses within the same window.
Modified Fast Recovery
[Figure: TCP Reno with multiple losses (X) within the same window — after one fast retransmit, the remaining losses can only be recovered by a timeout.]
[Figure: NewReno with the same losses — each partial ACK triggers another retransmission, so recovery continues without a timeout.]
Modifications to fast recovery
Partial ACKs (i.e., ACKs that acknowledge some but not all of the packets that were outstanding at the start of fast recovery) are indications of multiple losses.
If a partial ACK is received, re-transmit the next lost segment immediately (whereas in Reno, partial ACKs take TCP out of fast recovery).
The sender remains in fast recovery until all data outstanding when fast recovery was initiated is acked.
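The partial-ACK rule above can be sketched as a single decision function. This is an illustrative sketch: the names (`recover` for the highest sequence number outstanding when recovery began) and the action strings are assumptions, not NewReno's actual variable names.

```python
# Sketch of NewReno's modified fast recovery: stay in recovery and
# retransmit on each partial ACK until everything outstanding at the
# start of recovery has been acknowledged.

def newreno_on_ack(ack, recover, in_recovery):
    """Return (action, still_in_recovery)."""
    if not in_recovery:
        return "normal", False
    if ack < recover:
        # Partial ACK: some, but not all, outstanding data is acked;
        # retransmit the next lost segment immediately.
        return "retransmit", True
    # Full ACK: all data outstanding at the start of recovery is acked.
    return "exit_recovery", False

# Three segments lost out of a window whose highest outstanding byte is 100:
print(newreno_on_ack(40, 100, True))    # ('retransmit', True)
print(newreno_on_ack(70, 100, True))    # ('retransmit', True)
print(newreno_on_ack(100, 100, True))   # ('exit_recovery', False)
```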
Explicit Congestion Notification (ECN)
Current congestion indication:
  uses packet drops to indicate congestion
  sources infer congestion implicitly from timeouts or triple duplicate ACKs
ECN [IETF RFC 2481, 1999]: fewer packet drops and better performance
  uses packet marking rather than dropping
  reduces long timeouts and retransmissions
Needs cooperation between sources and the network:
  sources must indicate that they are ECN-capable
  sources and receivers must agree to use ECN
  the receiver must inform sources of ECN marks
  sources must react to marks just as they do to losses
ECN - 2
Needs additional flags in the TCP header and IP header
In the IP header: ECT and CE
  ECN Capable Transport (ECT): set by sources on all packets to indicate ECN capability
  Congestion Experienced (CE): set by routers as a congestion marking (instead of dropping)
In the TCP header: ECE and CWR
  Echo Congestion Experienced (ECE): when a receiver sees CE, it sets ECE on all packets until CWR is received
  Congestion Window Reduced (CWR): set by a source to indicate that ECE was received and the window size was adjusted (reduced)
ECN - 3
[Figure: source–router–destination exchange — (1) the source sends with ECT=1, CE=0 in the IP header and CWR=0 in the TCP header; (2) a congested router sets CE=1 instead of dropping; (3) the destination returns an ACK with ECN-Echo set in the TCP header; (4) the source reduces its window and sends with CWR=1.]
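The four-step exchange can be walked through with a toy model. This uses plain flag dictionaries rather than real IP/TCP headers, and the function names are illustrative; only the flag semantics (ECT/CE in IP, ECE/CWR in TCP) follow the slides.

```python
# Toy walk-through of the ECN exchange: ECT/CE live in the IP header,
# ECE/CWR in the TCP header.

def source_send(cwr=0):
    return {"ip": {"ECT": 1, "CE": 0}, "tcp": {"CWR": cwr}}

def router_forward(pkt, congested):
    if congested and pkt["ip"]["ECT"]:
        pkt["ip"]["CE"] = 1                 # mark instead of dropping
    return pkt

def receiver_ack(pkt, seen_ce):
    seen_ce = seen_ce or pkt["ip"]["CE"] == 1
    if pkt["tcp"]["CWR"]:                   # source has reduced its window
        seen_ce = False                     # stop echoing congestion
    return {"tcp": {"ECE": 1 if seen_ce else 0}}, seen_ce

pkt = router_forward(source_send(), congested=True)        # steps 1-2
ack, seen_ce = receiver_ack(pkt, False)                    # step 3
print(ack["tcp"]["ECE"])                                   # 1: source must cut cwnd
pkt2 = router_forward(source_send(cwr=1), congested=False) # step 4: CWR set
ack2, seen_ce = receiver_ack(pkt2, seen_ce)
print(ack2["tcp"]["ECE"])                                  # 0 after CWR received
```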
Active Queue Management (AQM) - 1
Performance degradation under current TCP congestion control:
  multiple packet losses
  low link utilization
  congestion collapse
The role of the router becomes important:
  control congestion effectively in the network
  allocate bandwidth fairly
AQM - 2
Problems with the current router algorithm:
  uses FIFO-based tail-drop (TD) queue management
  two drawbacks of TD: lock-out and full-queue
    Lock-out: a small number of flows monopolize the buffer capacity
    Full-queue: the buffer is always full (high queueing delay)
Possible solution: AQM
  Definition: a group of FIFO-based queue management mechanisms that support end-to-end congestion control in the Internet
AQM - 3
Goals of AQM:
  reducing the average queue length: decreasing end-to-end delay
  reducing packet losses: more efficient resource allocation
Methods:
  drop packets before the buffer becomes full
  use the (exponentially weighted) average queue length as a congestion indicator
Examples: RED, BLUE, ARED, SRED, FRED, …
RED-Introduction
Main idea: provide congestion control at the router for TCP flows.
RED algorithm goals:
  The primary goal is to provide congestion avoidance by controlling the average queue size such that the router stays in a region of low delay and high throughput.
  To avoid global synchronization (e.g., in Tahoe TCP).
  To control misbehaving users (this is from a fairness context).
  To seek a mechanism that is not biased against bursty traffic.
RED-Definitions
congestion avoidance – when impending congestion is indicated, take action to avoid congestion.
incipient congestion – congestion that is beginning to be apparent.
The router needs to notify connections of congestion, either by marking the packet [ECN] or by dropping it. {This assumes a drop is an implied signal to the source host.}
RED-Previous Work
  Drop Tail
  Random Drop
  Early Random Drop
  Source Quench messages
  DECbit scheme
RED-Drop Tail Router
• FIFO queueing mechanism that drops packets when the queue overflows.
• Introduces global synchronization when packets are dropped from several connections.
RED-Random Drop Router
• When a packet arrives and the queue is full, randomly choose a packet from the queue to drop.
RED-Early Random Drop Router
• If the queue length exceeds a drop level, then the router drops each arriving packet with a fixed drop probability.
• Reduces global synchronization
• Does not control misbehaving users (UDP)
RED-Source Quench messages
Router sends source quench messages back to source before queue reaches capacity.
Complex solution that gets router involved in end-to-end protocol.
RED-DECbit scheme
Uses a congestion-indication bit in packet header to provide feedback about congestion.
Average queue length is calculated for last (busy + idle) period plus current busy period.
When average queue length exceeds one, set congestion-indicator bit in arriving packet’s header.
If at least half of packets in source’s last window have the bit set, decrease the congestion window exponentially.
RED Algorithm
for each packet arrival:
    calculate the average queue size avg
    if minth <= avg < maxth:
        calculate the probability pa
        with probability pa: mark the arriving packet
    else if maxth <= avg:
        mark the arriving packet
RED drop probability (pa)
  pb = maxp x (avg - minth) / (maxth - minth)   [1]
  pa = pb / (1 - count x pb)                    [2]
Note: this calculation assumes the queue size is measured in packets. If the queue is in bytes, we add [1.a] between [1] and [2]:
  pb = pb x PacketSize/MaxPacketSize            [1.a]
avg – average queue length
  avg = (1 - wq) x avg + wq x q
where q is the newly measured queue length.
This exponentially weighted moving average is designed so that short-term increases in queue size from bursty traffic or transient congestion do not significantly increase the average queue size.
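The EWMA and equations [1]–[2] can be combined into a short sketch. Parameter values follow the slides' suggestions (wq = 0.002, maxp = 0.1); the minth/maxth values and function names are illustrative assumptions.

```python
# Sketch of RED's marking decision: EWMA of the queue length plus the
# marking probability from equations [1] and [2].

wq, minth, maxth, maxp = 0.002, 5.0, 15.0, 0.1

def update_avg(avg, q):
    """Exponentially weighted moving average of the queue length."""
    return (1 - wq) * avg + wq * q

def mark_probability(avg, count):
    """count = unmarked packets arrived since the last marked packet."""
    if avg < minth:
        return 0.0
    if avg >= maxth:
        return 1.0
    pb = maxp * (avg - minth) / (maxth - minth)   # [1]
    return pb / max(1e-9, 1 - count * pb)         # [2], guarded against /0

print(mark_probability(4.0, 0))        # 0.0  (below minth: never mark)
print(mark_probability(20.0, 0))       # 1.0  (above maxth: always mark)
print(mark_probability(10.0, 0))       # 0.05 (halfway: pb = maxp/2)
print(mark_probability(10.0, 10))      # 0.1  (pa grows with count: uniform spread)
```

Note how [2] raises the marking probability as `count` grows, which is exactly the uniform-random-variable method discussed below: the longer the run of unmarked packets, the more likely the next one is marked.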
RED/ECN Router Mechanism
[Figure: dropping/marking probability vs. average queue length — 0 below Minth, rising linearly to maxp at Maxth, then 1 (every packet marked/dropped) beyond Maxth.]
RED parameter settings
wq: suggest 0.001 <= wq <= 0.0042; the authors use wq = 0.002 for simulations
minth, maxth: depend on the desired average queue size
  for bursty traffic, increase minth to maintain link utilization
  maxth depends on the maximum average delay allowed
RED is most effective when the average queue size is larger than the typical increase in the calculated queue size over one round-trip time.
"Parameter-setting rule of thumb": maxth at least twice minth. However, maxth = 3 times minth is used in some of the experiments shown.
Packet-marking probability
Goal: uniformly spread out the marked packets. This reduces global synchronization.
Method 1: geometric random variable – each packet is marked with probability pb
Method 2: uniform random variable – marking probability is pb / (1 - count x pb), where count is the number of unmarked packets that have arrived since the last marked packet
[Figure: marked-packet patterns for method 1 (geometric, p = 0.02) vs. method 2 (uniform).]
Result: marked packets are more clustered under method 1; uniform is better at eliminating "bursty drops".
Setting maxp
"RED performs best when the packet-marking probability changes fairly slowly as the average queue size changes." This is a stability argument: the claim is that RED with a small maxp will reduce oscillations in avg and in the actual marking probability.
The authors recommend that maxp never be greater than 0.1. {This is not a robust recommendation.}
Evaluation of RED meeting design goals
congestion avoidance
  If RED drops packets, this guarantees that the calculated average queue size does not exceed the max threshold. If wq is set properly, RED controls the actual average queue size.
  If RED marks packets, the router relies on source cooperation to control the average queue size.
Evaluation of RED meeting design goals
appropriate time scales
  the detection time scale roughly matches the time scale of the response to congestion
  RED does not notify connections during transient congestion at the router
Evaluation of RED meeting design goals
no global synchronization
  avoids global synchronization by marking at as low a rate as possible, with the distribution spread out
simplicity
  detailed argument about how to implement cheaply in terms of adds and shifts
  {Historically, this argument has been strongly refuted because RED has too many parameters to be robust.}
Evaluation of RED meeting design goals
maximizing global power
  power is defined as the ratio of throughput to delay
fairness
  the authors claim fairness is not well defined
  {This is an obvious side-step of the issue.}
  [Later this becomes a big deal – see the FRED paper.]
Conclusions
RED is an effective mechanism for congestion avoidance at the router, in cooperation with TCP.
Claim: the probability that RED chooses a particular connection to notify during congestion is roughly proportional to that connection's share of the bandwidth.
BLUE
Concept: avoid the drawbacks of RED
  the parameter tuning problem
  fluctuation of the actual queue length
Decouples congestion control from queue length
  uses only loss and link-idle events as indicators
  maintains a single drop probability, pm
Drawback: cannot avoid some degree of multiple packet loss and/or low utilization
BLUE’s Algorithm (I)
Upon a packet loss (or Qlen > L) event:
  if (now - last_update) > freeze_time then
    pm = pm + d1
    last_update = now
Upon a link idle event:
  if (now - last_update) > freeze_time then
    pm = pm - d2
    last_update = now
BLUE’s Algorithm (II)
Update trigger events:
  packet loss – increase the dropping probability
  link idle – decrease the dropping probability
Parameters:
  freeze_time: the update frequency; could be randomized to avoid global synchronization
BLUE’s Algorithm (III)
Parameters:
  d1: increment step
  d2: decrement step
  d1 is significantly larger than d2 – back off more aggressively
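BLUE's update rule is small enough to sketch directly. Note that a link-idle event decreases pm (pm = pm - d2), matching the trigger-event list above. The parameter values (freeze_time, d1, d2) are illustrative assumptions, not the paper's recommendations.

```python
# Sketch of BLUE's single-probability update: increase pm on loss,
# decrease on link idle, rate-limited by freeze_time.

freeze_time = 0.1          # seconds between allowed updates (illustrative)
d1, d2 = 0.02, 0.002       # d1 significantly larger than d2

def blue_update(pm, last_update, now, event):
    """event is 'loss' (or Qlen > L) or 'idle'; returns (pm, last_update)."""
    if now - last_update <= freeze_time:
        return pm, last_update              # too soon: skip the update
    if event == "loss":
        pm = min(1.0, pm + d1)              # congested: back off aggressively
    elif event == "idle":
        pm = max(0.0, pm - d2)              # link underused: drop less
    return pm, now

pm, last = 0.0, 0.0
pm, last = blue_update(pm, last, 0.2, "loss")
print(round(pm, 3))                         # 0.02
pm, last = blue_update(pm, last, 0.25, "loss")   # within freeze_time: unchanged
print(round(pm, 3))                         # 0.02
pm, last = blue_update(pm, last, 0.5, "idle")
print(round(pm, 3))                         # 0.018
```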
Discussions concerning AQM
Problems with existing AQM proposals:
  mismatch between the macroscopic and microscopic behavior of the queue length
  insensitivity to changes in the input traffic load
  configuration (parameter-setting) problems
Reasons:
  queue-length averaging
  use of an inappropriate congestion indicator
  use of an inappropriate control function