What Lies Beneath: Understanding Internet Congestion
Leiwen Deng
Aleksandar Kuzmanovic
Northwestern University
Bruce Davie, Cisco Systems
http://networks.cs.northwestern.edu
2Aleksandar Kuzmanovic What Lies Beneath: Understanding Internet Congestion
Common Wisdom and Our Key Results
No congestion in the Internet core
– Links are over-provisioned, hence no congestion
No correlation among congestion events in the Internet
– Diversity of traffic and links makes large and long-lasting link congestion dependence unlikely
Our key results
– There is a subset of links (both inter-AS and intra-AS) that exhibit strong congestion intensity
– Congestion events in the core can be highly correlated (across links up to 3 ASes apart)
Why Do We Care?
Congestion in the core
– Can depend on internal network policies or complex inter-AS relationships
– Variable queuing delay can lead to jitter, affecting VoIP or streaming applications
Correlation
– Guidelines for re-routing systems
– Most tomography models assume link congestion independence
Challenges
Scalability
– How to concurrently monitor a large number of Internet links?
• Need a light monitoring tool
• Need a triggered monitoring system
Our approach
– Pong: a light monitoring tool
• Per-path overhead 18 kbps
– TPong: a triggered monitoring system
• Capable of monitoring up to 8,000 links concurrently
Congestion Events
Congestion Intensity
– How frequently do queue build-ups happen over 30-second time scales?
We focus on persistent congestion events:
– Intensity > 5%; duration > 2 minutes
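The two thresholds above can be turned into a small filter. The following is a minimal sketch, not the authors' implementation: it assumes each probe has already been labeled as "queued" or not, takes intensity to be the fraction of queued probes in each 30-second window, and treats a run of consecutive windows above 5% as one event. The function names and the input format are hypothetical.

```python
WINDOW_S = 30          # time scale from the slide
MIN_INTENSITY = 0.05   # "Intensity > 5%"
MIN_DURATION_S = 120   # "duration > 2 minutes"


def window_intensities(samples, window_s=WINDOW_S):
    """samples: list of (timestamp_s, queued) sorted by time.

    Returns [(window_index, intensity)], where intensity is the fraction
    of probes in that window that saw a queue build-up (an assumption).
    """
    if not samples:
        return []
    t0 = samples[0][0]
    counts = {}
    for t, queued in samples:
        idx = int((t - t0) // window_s)
        total, hits = counts.get(idx, (0, 0))
        counts[idx] = (total + 1, hits + int(bool(queued)))
    return [(idx, hits / total) for idx, (total, hits) in sorted(counts.items())]


def persistent_events(intensities, window_s=WINDOW_S,
                      min_intensity=MIN_INTENSITY,
                      min_duration_s=MIN_DURATION_S):
    """Return (start_s, end_s) offsets of runs of consecutive windows whose
    intensity exceeds min_intensity and that last longer than min_duration_s."""
    events = []
    run = None  # [first_window_index, last_window_index]

    def close(current):
        if current is not None:
            start_s = current[0] * window_s
            end_s = (current[1] + 1) * window_s
            if end_s - start_s > min_duration_s:
                events.append((start_s, end_s))

    for idx, intensity in intensities:
        if intensity > min_intensity:
            if run is not None and idx == run[1] + 1:
                run[1] = idx          # extend the current run
            else:
                close(run)            # a gap in windows ends the old run
                run = [idx, idx]
        else:
            close(run)
            run = None
    close(run)
    return events
```

For example, one probe per second for five minutes, with every tenth probe queued between t = 60 s and t = 240 s, gives a 10% intensity in six consecutive windows and therefore a single persistent event.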
Coordinated Probing
[Figure: 4-p probing between endpoints S and D, with probes labeled f, s, d, and b]
4-p probing: a symmetric path scenario
Combines e2e and router-targeted probing
Probes: f probe, b probe, s probe, d probe
Pong: Coordinated Probing
[Figure: coordinated probing between endpoints S and D; probes f, s, d, b; delay labels Δf, Δs, Δd, Δb, Δfs, Δfd]
Half-path queuing delay
Locating congestion points
Tracing congestion status
Pong: Methodology Highlights
Coordinated probing
– Send 4, 3, or 2 packets from two endpoints
Quality of Measurability (QoM)
– Able to deterministically detect its own inaccuracy
Self-adaptivity
– Switch among different probing schemes based on QoM and path properties
Vantage Point Selection Problem
How to select vantage points to accurately measure congestion at a given link?
Link measurability score
– How well are we able to measure a specific link from a specific pair of endpoints; a function of:
• Quality of measurability (QoM) for a given node
• Queuing-delay threshold quality
• Observability score
– Avoid paths that “see” multiple congested links concurrently
Triggered Monitoring System
Paths used | Path selection algorithm | Probing method | Probing rate | Objective
All paths | No selection, full mesh | Low-rate probing | Once every 5 minutes | Track topology and path reachability
TMon paths – a subset of all paths | Greedy TMon path selection | Fast-rate probing | 5 probes/sec | Monitor end-to-end congestion
Pong paths – a subset of TMon paths upon triggering | Priority-based Pong path allocation | Coordinated probing | 10 probes/sec for e2e probing, 2 probes/sec for router-targeted probing | Locate and monitor link-level congestion
Greedy algorithm to determine a subset of paths
• Covered 65% (7,800) of links with 4.9% (1,750) of paths
• Limit the per-node measurement overhead
Priority-based Pong path allocation
• Maximize quality of measurability
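The greedy step can be illustrated with a standard greedy set-cover heuristic. This is a sketch under stated assumptions, not the selection algorithm used in the talk: each path is assumed to expose a known set of links, the per-node overhead limit is modeled as a cap on how many selected paths may start or end at a node, and the names `greedy_select_paths` and `max_paths_per_node` are hypothetical.

```python
def greedy_select_paths(paths, max_paths_per_node):
    """Greedy set cover with a per-node cap (illustrative only).

    paths: dict mapping path_id -> (src_node, dst_node, set_of_link_ids)
    Returns (selected_path_ids, covered_link_ids).
    """
    covered = set()
    load = {}          # node -> number of selected paths touching it
    selected = []
    remaining = dict(paths)

    while remaining:
        best_id, best_gain = None, 0
        for pid in sorted(remaining):        # sorted for deterministic ties
            src, dst, links = remaining[pid]
            # Skip paths whose endpoints have reached the overhead cap.
            if (load.get(src, 0) >= max_paths_per_node
                    or load.get(dst, 0) >= max_paths_per_node):
                continue
            gain = len(links - covered)      # links this path would newly cover
            if gain > best_gain:
                best_id, best_gain = pid, gain
        if best_id is None:                  # nothing left adds coverage
            break
        src, dst, links = remaining.pop(best_id)
        selected.append(best_id)
        covered |= links
        load[src] = load.get(src, 0) + 1
        load[dst] = load.get(dst, 0) + 1

    return selected, covered
```

Each round picks the path that adds the most uncovered links, which is why a small fraction of paths can cover a large fraction of links when paths overlap heavily.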
Coverage & Overhead Statistics
We observe ~ 36,000 paths
– N^2, N = 191 nodes
– Expose ~ 12,100 links at a time
• Due to routing changes, we are able to observe ~ 29,000 links in total
TMon paths:
– Up to 2,000 paths running fast-rate probing concurrently
– Cover up to 8,000 links concurrently
• 4.9% of paths cover 65% of total links
Pong paths
– Up to 30 Pong paths; cover up to 350 links concurrently
Overhead per node:
– Average: 30 kbps, Peak: 68 kbps
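As a quick sanity check on these figures, the arithmetic can be reproduced from the rounded numbers quoted in the slides. The sketch below assumes the percentages were computed against the rounded totals (~36,000 paths and ~12,100 links), taking 1,750 paths and 7,800 links as the counts behind the 4.9% and 65% figures; the variable names are illustrative.

```python
n_nodes = 191
all_pairs = n_nodes ** 2                 # N^2 = 36,481, quoted as ~36,000

observed_paths = 36_000                  # rounded figure from the slide
links_at_a_time = 12_100                 # rounded figure from the slide
tmon_paths = 1_750
covered_links = 7_800

path_share = tmon_paths / observed_paths       # about 0.049
link_share = covered_links / links_at_a_time   # about 0.645

print(f"N^2 = {all_pairs}")
print(f"paths used: {100 * path_share:.1f}%")
print(f"links covered: {100 * link_share:.1f}%")
```

The path share matches the quoted 4.9%, and the link share comes out slightly under the quoted 65%, which is consistent with the totals being approximate.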
Measurement Quality
How good is our vantage-point selection algorithm?
– Link Measurability Score: 0-6.
• 65% of measurement samples have a non-zero score
• 80% of measurements are better than fair
• 60% of measurements are better than good
– The key point is that we know how good or bad we are doing
Key Findings
Time-invariant hot spots
Strong spatial correlation among congested links
Root-cause analysis
Time-invariant Hot Spots
Time-of-day effects for the number of congestion events
A small number of links show strong time-invariant congestion intensity
Time-invariant Hot Spots
Most of the links are not inter-continental links as we initially hypothesized
Inter-AS links between large backbone networks as well as intra-AS links within these networks
AS # | Description
174 | Cogent Communications, a large Tier-2 ISP.
1299 | TeliaNet Global Network, a large Tier-2 ISP.
20965 | GEANT, a main European multi-gigabit computer network for research and education purposes, Tier-2.
4323 | Time Warner Telecom, a Tier-2 ISP in US.
3356 | Level 3 Communications, a Tier-1 ISP.
237 | Merit, a Tier-2 network in US.
6461 | Abovenet Communications, a large Tier-2 ISP.
27750 | RedCLARA, a backbone that connects the Latin-American National Research and Education Networks to Europe.
6453 | Teleglobe, a Tier-2 ISP.
2914 | NTT America, a Tier-1 ISP.
3549 | Global Crossing, a Tier-1 ISP.
11537 | Abilene, an Internet2 backbone network in US.
4538 | China Education and Research Network.
Congestion Correlation
Pair-wise correlation
– Percent of time 2 links are concurrently congested
– Pair-wise correlation can be quite extensive
• E.g., 20% of pairs have correlation greater than 0.7
– Correlation: weekend > weekdays
• Overall congestion level smaller during weekends
– Distance between correlated link pairs
• up to 3 ASes
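A pair-wise measure of this kind can be sketched as follows. The slide gives only "percent of time 2 links are concurrently congested", so the denominator here is an assumption: the sketch divides the number of intervals in which both links are congested by the number in which at least one is. The per-interval congestion flags and both function names are hypothetical.

```python
from itertools import combinations


def pair_correlation(a, b):
    """Fraction of intervals with both links congested, out of the
    intervals in which at least one of them is congested (assumed
    normalization, not taken from the slides)."""
    if len(a) != len(b):
        raise ValueError("series must cover the same intervals")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def fraction_of_pairs_above(series_by_link, threshold=0.7):
    """Share of link pairs whose correlation exceeds the threshold."""
    pairs = list(combinations(sorted(series_by_link), 2))
    if not pairs:
        return 0.0
    strong = sum(
        1 for u, v in pairs
        if pair_correlation(series_by_link[u], series_by_link[v]) > threshold
    )
    return strong / len(pairs)
```

With three links where two are congested in largely the same intervals and the third is congested at other times, one of the three pairs exceeds 0.7, so `fraction_of_pairs_above` returns one third.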
Aggregation Effect Hypothesis
Hypothesis:
– When upstream traffic converges to a relatively thin aggregation point, traffic surges in an upstream link are likely to create congestion at the thin downstream aggregation link
Insights:
– Aggregation points correspond to time-invariant hot spots
– Interaction between an aggregation point and an upstream link causes link-level correlation
[Figure: aggregation link]
Root-cause Analysis: Example
[Figure: example with link capacities of 10 Gbps and 622 Mbps]
Final Statistics
Table 1: Matched locations in the top ten networks defined by the number of peers
Rank | Network | Peers
1 | UUNET | 2,346
2 | AT&T WorldNet | 2,092
3 | Level 3 Comm. | 1,742
5 | Cogent Comm. | 1,642
7 | Global Crossing | 1,041
8 | Time Warner | 918
9 | Abovenet | 798
Table 2: Matched locations in the top ten ISPs that most aggressively promote customer access
North America
Rank | ISP
1 | Level 3 Comm.
2 | UUNET
3 | AT&T WorldNet
6 | Cogent Comm.
9 | Global Crossing
Europe
Rank | ISP
1 | Level 3 Comm.
2 | TeliaNet Global Network
4 | Global Crossing
8 | Teleglobe
Asia
Rank | ISP
2 | NTT America
6 | UUNET
8 | AT&T WorldNet
9 | Level 3 Comm.
10 | Teleglobe
Conclusions
Triggered monitoring system
– Measuring congestion in a scalable way
– Key feature:
• Select vantage points to measure congestion as a function of the measurement quality
Key findings
– A subset of links experience time-invariant high congestion intensity
– There is strong correlation among congestion events at different links (up to 3 ASes)
– Root cause: aggregation effect
• some links thinner than others