1
End-to-End Detection of Shared Bottlenecks
Sridhar Machiraju and Weidong Cui
Sahara Winter Retreat 2003
2
Problem Statement
• Given 2 end-to-end flows f1 and f2, do they share a bottleneck (a congested link, i.e., a link with packet drops)?
(OR)
• Given 2 routes R1 and R2 on the Internet, do they share a bottleneck link?
3
Why is this hard?
• No information from the network
• Only information available – delay and drops
• Lots of noise – delay from intermediate links and drops on other links
• Bottlenecks may change over time
4
Why solve this problem?
• Overlays
– RON – Decide if rerouting flows bypasses congestion points or not
– RON – Does such rerouting affect existing flows? Which ones?
– Cooperative overlays – overlay does not want to share bottleneck with a “friendly overlay”
– OverQoS – Useful to cluster together overlay links based on shared bottlenecks
5
Why solve this problem (cont.)?
• Other applications
– Massive backups of data from different servers – do them in parallel?
– Content distribution – is the use of multipath going to improve performance?
– Kazaa – parallel downloads from peers
– Multihomed ASs can evaluate the “orthogonality” in terms other than fault-tolerance
6
Related Work
• Past work done only with Y or Inverted-Y topologies using Poisson probes, packet pairs and inter-arrival times.
[Figure: Y and Inverted-Y topologies, with Senders and Receivers labeled]
7
Goals
• Provide a general solution for double-Y topology
• Work with multiple bottlenecks and provide an indicator of shared congestion
• Be able to use active probe flows and also passively observed (TCP) flows
• Complexity issues for clustering flows
8
Motivation of Our Techniques
• Droptail queues + TCP – queues alternate between bursty loss periods and periods with no losses
• Queues build up until bursty losses occur, then shrink before growing again
• Provides motivation for correlating periods of drops and delays (proportional to queue sizes)
• But…
9
Synchronization Lag
[Figure: timelines of numbered probe packets (0 to 8) from Sender 1 (Flow 1) and Sender 2 (Flow 2), sent at interval T along a time axis, with delays labeled d1 and d2]
Synchronization Lag = 3T
Note: is bounded by RTTmax/2
10
Overview of Our Techniques
• We propose 2 techniques
– Probability Distribution (PD) technique
– Cross-Correlation (CC) technique
• PD is based on finding the peak of the discrete probability distribution of the minimum time between a drop of one flow and a drop of the other
• CC is based on getting the maximum cross-correlation assuming various synch. lags
11
PD Technique
• For each dropped packet of a flow, take the minimum of the time differences between its sending time and the sending times of the dropped packets of the other flow; plot the PD of these minima
• If there is a shared bottleneck, we expect (ideally) a peak of 1 at d2- d1+ ; not every flow sees drops during the same burst, so use a threshold < 1 for the peak
• We may see more than 1 drop in a burst; cluster drops into bursts and use time differences between starts of bursts
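The PD steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the burst_gap and bin_width parameters, and the choice of a signed nearest-burst difference are all assumptions made for the example. Times are the sending times of dropped packets, in milliseconds.

```python
from collections import Counter

def cluster_bursts(drop_times, burst_gap):
    """Group drop send-times into bursts and return each burst's start time.
    A new burst starts when the gap to the previous drop exceeds burst_gap
    (burst_gap is a hypothetical tuning parameter)."""
    starts = []
    last = None
    for t in sorted(drop_times):
        if last is None or t - last > burst_gap:
            starts.append(t)
        last = t
    return starts

def pd_peak(drops_a, drops_b, bin_width, burst_gap):
    """Return (peak probability, time difference at the peak).
    For each burst of flow A, take the signed difference to the nearest
    burst of flow B, histogram the differences, and report the tallest bin."""
    starts_a = cluster_bursts(drops_a, burst_gap)
    starts_b = cluster_bursts(drops_b, burst_gap)
    if not starts_a or not starts_b:
        return 0.0, None
    counts = Counter()
    for ta in starts_a:
        diff = min((tb - ta for tb in starts_b), key=abs)
        counts[round(diff / bin_width)] += 1
    bin_index, count = counts.most_common(1)[0]
    return count / len(starts_a), bin_index * bin_width

# Toy data: every burst of flow B starts 60 ms after a burst of flow A.
peak, lag = pd_peak([1000, 1010, 5000, 9000, 9020],
                    [1060, 5060, 5070, 9060],
                    bin_width=10, burst_gap=500)
shared = peak > 0.1  # 0.1 is the threshold the slides report using
```

In this toy input all three bursts line up at the same offset, so the peak is the ideal value of 1 and its position serves as the lag estimate.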
12
PD technique (contd.)
• Robustness issues: synch. lag must be smaller than the time difference between consecutive drops of a flow
[Figure: Delay1 and Delay2 time series with packet losses marked]
13
Cross-Correlation (CC) Technique
• Key ideas
– Two “back-to-back” packets from two different flows will experience similar packet drop/delay at the bottleneck
– If we can generate two sequences of “back-to-back” packets from two different flows, then we can calculate their cross-correlation coefficient of losses or delays to measure their “similarity”.
– If the cross-correlation coefficient is greater than some threshold, then the two flows share a bottleneck.
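As a rough illustration of the threshold test above, the sketch below computes a Pearson-style correlation coefficient over two aligned sequences (0/1 loss indicators or per-packet delays). The slides do not give the exact formula, so treating the "cross-correlation coefficient" as the Pearson coefficient is an assumption, as are the function names.

```python
import math

def cc_coefficient(x, y):
    """Pearson correlation of two aligned sequences (assumed definition).
    Returns 0.0 when either sequence is constant or empty."""
    n = min(len(x), len(y))
    if n == 0:
        return 0.0
    x, y = x[:n], y[:n]
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def share_bottleneck(x, y, threshold=0.1):
    """Declare a shared bottleneck when the coefficient exceeds the threshold."""
    return cc_coefficient(x, y) > threshold

losses_1 = [0, 0, 1, 1, 0, 0, 1, 0]   # 1 = probe lost
losses_2 = [0, 0, 1, 1, 0, 0, 1, 0]
print(share_bottleneck(losses_1, losses_2))
```

The same function works on delay samples; only the input sequences change.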
14
Questions about the CC Technique
• How to generate two sequences of “back-to-back” packets?
– UDP probes with a constant interval T
  • average interval <= T/2
– Shift the sequence to overcome the synch. lag
• How long should the two sequences be to get a significant result?
– When the CC coefficient becomes relatively stable
– But no less than a minimum period of time
• What should the threshold be?
– Use 0.1 in the experiments
– Why 0.1?
15
Overcome the Synchronization Problem
[Figure: Delay1 and Delay2 time series with packet losses marked; one sequence is shifted by 2 packets to align them]
• Find the max cross-correlation by shifting one of the two sequences within some range
• The value of the optimal shift is an estimate of the synchronization lag.
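The shift search can be sketched as below. This is an illustrative version only: the function names, the max_shift search range, and the use of a Pearson coefficient are assumptions, and a shift s >= 0 is taken to mean that element i + s of the first sequence is compared with element i of the second.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def best_shift(x, y, max_shift):
    """Try every shift in [-max_shift, max_shift] and return
    (maximum coefficient, shift in packets that achieves it)."""
    best = (-1.0, 0)
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            a, b = x[s:], y[:len(y) - s]
        else:
            a, b = x[:len(x) + s], y[-s:]
        n = min(len(a), len(b))
        if n < 2:
            continue
        coeff = pearson(a[:n], b[:n])
        if coeff > best[0]:
            best = (coeff, s)
    return best

# Toy data: x is y delayed by 2 packets.
y = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0]
x = [0, 0] + y[:-2]
coeff, shift = best_shift(x, y, max_shift=4)
lag_ms = shift * 10  # with probes every 10 ms, 2 packets correspond to 20 ms
```

Multiplying the best shift by the probe interval T converts it from packets into a time estimate of the synchronization lag.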
16
Wide-Area Experiments
• Challenges
– Access to hosts distributed globally?
– How to verify our experimental results?
• Solutions
– PlanetLab (http://www.planet-lab.org)
– Set up an overlay network with double-Y topology
– Application-level routers monitor losses and delays
17
Topology with Shared Bottleneck (I)
[Figure: overlay topology; nodes: Vancouver, Seattle, Wisc, Atlanta, Bologna, Sydney]
18
Topology without Shared Bottleneck (II)
[Figure: overlay topology; nodes: Vancouver, Seattle, Wisc, Atlanta, Bologna, Sydney]
19
Experimental Setup
• Active Probing
– 40 bytes per packet
– Every 10ms
• Log packet arrival times on every node
– Also can get information of losses from these logs
• Traces from 10mins to 60mins
• Threshold = 0.1 for the PD and CC techniques
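One way loss information could be recovered from the arrival logs is sketched below. It assumes each probe carries a sequence number that the receiving node logs; the slides do not state this, and the function names are hypothetical. The probe size and interval are the values given above.

```python
PROBE_INTERVAL_MS = 10   # from the setup above: one probe every 10 ms
PROBE_SIZE_BYTES = 40    # from the setup above: 40 bytes per packet

def loss_sequence(received_seqs, num_sent):
    """Return a 0/1 list with 1 where a sequence number never arrived.
    Assumes probes are numbered 0..num_sent-1 (an assumption of this sketch)."""
    got = set(received_seqs)
    return [0 if i in got else 1 for i in range(num_sent)]

def drop_send_times_ms(received_seqs, num_sent):
    """Sending times (ms since the first probe) of the lost probes."""
    losses = loss_sequence(received_seqs, num_sent)
    return [i * PROBE_INTERVAL_MS for i, lost in enumerate(losses) if lost]

def probe_rate_kbps():
    """Probe load per flow, counting only the 40 bytes stated per packet."""
    return PROBE_SIZE_BYTES * 8 / PROBE_INTERVAL_MS  # bits per ms equals kbit/s

print(loss_sequence([0, 1, 3, 4, 7], 8))
print(drop_send_times_ms([0, 1, 3, 4, 7], 8))
print(probe_rate_kbps())
```

The 0/1 sequence feeds a loss cross-correlation, while the drop sending times feed a PD-style histogram.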
20
Overall Results

Exp #        Packet Drops          PD Technique           Loss CC Technique      Delay CC Technique
(duration)   Shared   Non-shared   Peak Value  Est. Lag   CC Coeff.  Est. Lag    CC Coeff.  Est. Lag
1 (20mins)   3        2096         < 0.1       -          < 0.1      -           < 0.1      -
2 (10mins)   6772     165          0.21        60ms       0.22       50ms        0.12       50ms
3 (10mins)   2070     32           0.45        100ms      0.81       80ms        < 0.1      -
4 (10mins)   81       2252         < 0.1       -          0.38       -1.17s      0.99       -1.17s
5 (30mins)   0        5565         < 0.1       -          < 0.1      -           < 0.1      -
6 (60mins)   10272    1127         < 0.1       -          0.23       6s          < 0.1      -
7 (10mins)   1592     57           < 0.1       -          0.75       -1.15       < 0.1      -
8 (10mins)   1895     112          0.11        180ms      0.55       300ms       < 0.1      -
Failed Cases
21
Why does the Delay CC Technique fail?
• Delay spikes at the non-shared part.
22
Why does the PD Technique fail?
• Large synchronization lag
• Few drops at the bottleneck
23
Open Issues
• Parameter Selection
– What should the thresholds be?
• Active vs. Passive Probing
– Active probing: wastes network resources
– Passive probing: cannot control the size/rate of the probing sequences
• Multiple Bottlenecks
– Our techniques are not limited to the case of a single bottleneck
– But need more quantitative evaluations
• Probability of sharing a bottleneck
– How often should we generate probing sequences to detect if two flows share a bottleneck?
– Can we give a probability rather than a 0-1 decision?
24
Conclusions
• Problem
– Detect if 2 end-to-end flows share a bottleneck
• Challenge
– Synchronization lag in double-Y topology
• Techniques
– The Probability Distribution Technique
– The Loss/Delay Cross-Correlation Technique
• Experimental Results
– The Loss CC technique succeeds with all experiments
– The Delay CC technique fails in some experiments due to delay spikes at the non-shared part
– The PD technique fails in some experiments due to large synch. lag and few losses at the bottleneck