What Lies Beneath: Understanding Internet Congestion

Leiwen Deng, Aleksandar Kuzmanovic (Northwestern University)
Bruce Davie (Cisco Systems)

http://networks.cs.northwestern.edu




Common Wisdom and Our Key Results

No congestion in the Internet core
– Links are over-provisioned, hence no congestion

No correlation among congestion events in the Internet
– Diversity of traffic and links makes large and long-lasting link congestion dependence unlikely

Our key results
– There is a subset of links (both inter-AS and intra-AS) that exhibit strong congestion intensity
– Congestion events in the core can be highly correlated (up to 3 ASes)


Why Do We Care?

Congestion in the core
– Can depend upon internal network policies or complex inter-AS relationships
– Variable queuing delay can lead to jitter, affecting VoIP or streaming applications

Correlation
– Guidelines for re-routing systems
– Most tomography models assume link congestion independence


Challenges

Scalability
– How to concurrently monitor a large number of Internet links?
  • Need a light monitoring tool
  • Need a triggered monitoring system

Our approach
– Pong: a light monitoring tool
  • Per-path overhead 18 kbps
– TPong: a triggered monitoring system
  • Capable of monitoring up to 8,000 links concurrently


Congestion Events

Congestion Intensity
– How frequently do queue build-ups happen over 30-second time scales?

We focus on persistent congestion events:
– Intensity > 5%; duration > 2 minutes
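The thresholds above can be illustrated with a minimal sketch. It assumes that each 30-second window yields one boolean per probe (queue build-up seen or not) and that intensity is the fraction of probes that saw a build-up. The slides do not give the exact definition, and the function names and data layout here are hypothetical, not Pong's implementation.

```python
from typing import List, Tuple

WINDOW_SECONDS = 30          # time scale quoted on the slide
INTENSITY_THRESHOLD = 0.05   # "Intensity > 5%"
MIN_DURATION_SECONDS = 120   # "duration > 2 minutes"


def congestion_intensity(queued: List[bool]) -> float:
    """Fraction of probes in one window that saw a queue build-up (assumed definition)."""
    return sum(queued) / len(queued) if queued else 0.0


def persistent_events(intensities: List[float]) -> List[Tuple[int, int]]:
    """Return (start, end) window indices, end exclusive, of runs of windows
    whose intensity stays above the threshold for longer than the minimum duration."""
    events = []
    start = None
    # A trailing 0.0 sentinel closes a run that reaches the end of the trace.
    for i, value in enumerate(intensities + [0.0]):
        if value > INTENSITY_THRESHOLD:
            if start is None:
                start = i
        elif start is not None:
            if (i - start) * WINDOW_SECONDS > MIN_DURATION_SECONDS:
                events.append((start, i))
            start = None
    return events
```

Under these assumptions, five consecutive windows above 5% (150 seconds) count as one persistent event, while four windows (exactly 120 seconds) do not, because the slide asks for a duration strictly greater than 2 minutes.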


Coordinated Probing

[Figure: 4-p probing in a symmetric path scenario between endpoints S and D, using four probes: f probe, b probe, s probe, d probe]

Combines e2e and router-targeted probing


Pong: Coordinated Probing

[Figure: probes between endpoints S and D, with delay labels Δf, Δb, Δs, Δd, Δfs, Δfd. Panel labels: "Half-path queuing delay", "Locating Congestion Points", "Tracing Congestion Status"]


Pong: Methodology Highlights

Coordinated probing
– Send 4, 3, or 2 packets from two endpoints

Quality of Measurability (QoM)
– Able to deterministically detect its own inaccuracy

Self-adaptivity
– Switch among different probing schemes based on QoM and path properties


Vantage Point Selection Problem

How to select vantage points to accurately measure congestion at a given link?

Link measurability score
– How well are we able to measure a specific link from a specific pair of endpoints; a function of:
  • Quality of measurability (QoM) for a given node
  • Queuing-delay threshold quality
  • Observability score
    – Avoid paths that “see” multiple congested links concurrently


Triggered Monitoring System

Paths used | Path selection algorithm | Probing method | Probing rate | Objective
All paths | No selection, full mesh | Low-rate probing | Once every 5 minutes | Track topology and path reachability
TMon paths (a subset of all paths) | Greedy TMon path selection | Fast-rate probing | 5 probes/sec | Monitor end-to-end congestion
Pong paths (a subset of TMon paths upon triggering) | Priority-based Pong path allocation | Coordinated probing | 10 probes/sec for e2e probing, 2 probes/sec for router-targeted probing | Locate and monitor link-level congestion

Greedy algorithm to determine a subset of links
  • Covered 65% (7,800) of links with 4.9% (1,750) of paths
  • Limit the per-node measurement overhead

Priority-based Pong path allocation
  • Maximize quality of measurability
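The slides do not spell out the greedy TMon path selection, so the following is only a sketch of one plausible reading: the classic greedy set-cover heuristic, with a cap on how many selected paths may touch any one node as a stand-in for "limit the per-node measurement overhead". The function name, the `max_paths_per_node` parameter, and the input layout are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Path = Tuple[str, str]  # (source node, destination node)


def greedy_select(path_links: Dict[Path, Set[str]],
                  max_paths_per_node: int) -> List[Path]:
    """Repeatedly pick the path that covers the most not-yet-covered links,
    skipping paths whose endpoints already carry the maximum number of paths."""
    covered: Set[str] = set()
    load: Dict[str, int] = {}
    chosen: List[Path] = []
    candidates = dict(path_links)
    while candidates:
        best, best_gain = None, 0
        for path in sorted(candidates):  # sorted for deterministic tie-breaking
            src, dst = path
            if (load.get(src, 0) >= max_paths_per_node
                    or load.get(dst, 0) >= max_paths_per_node):
                continue
            gain = len(candidates[path] - covered)
            if gain > best_gain:
                best, best_gain = path, gain
        if best is None:  # no eligible path adds new coverage
            break
        chosen.append(best)
        covered |= candidates.pop(best)
        for node in best:
            load[node] = load.get(node, 0) + 1
    return chosen


paths = {
    ("A", "B"): {"l1", "l2", "l3"},
    ("A", "C"): {"l3", "l4"},
    ("B", "C"): {"l4", "l5"},
    ("C", "D"): {"l5"},
}
print(greedy_select(paths, max_paths_per_node=2))
```

In this toy input, two of the four paths are enough to cover all five links, which mirrors the slide's point that a small fraction of paths can cover most links.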


Coverage & Overhead Statistics

We observe ~36,000 paths
– N^2, N = 191 nodes
– Expose ~12,100 links at a time
  • Due to routing changes, we are able to observe ~29,000 links in total

TMon paths
– Up to 2,000 paths running fast-rate probing concurrently
– Cover up to 8,000 links concurrently
  • 4.9% of paths cover 65% of total links

Pong paths
– Up to 30 Pong paths; cover up to 350 links concurrently

Overhead per node
– Average: 30 kbps, Peak: 68 kbps
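As a quick sanity check, the quoted percentages can be recomputed from the rounded counts on the slides. Pairing 1,750 paths with ~36,000 observed paths, and 7,800 links with ~12,100 links exposed at a time, is an assumption; the slides do not state which denominators were used.

```python
# Figures quoted on the slides; which numerator goes with which denominator is assumed.
NODES = 191
OBSERVED_PATHS = 36_000    # "~36,000 paths"
LINKS_AT_A_TIME = 12_100   # "~12,100 links at a time"
TMON_PATHS = 1_750         # "4.9% (1,750) paths"
TMON_LINKS = 7_800         # "65% (7,800) links"

ordered_pairs = NODES ** 2                    # N^2 as written on the slide
path_fraction = TMON_PATHS / OBSERVED_PATHS   # share of paths probed at the fast rate
link_fraction = TMON_LINKS / LINKS_AT_A_TIME  # share of exposed links those paths cover

print(f"N^2 = {ordered_pairs:,}")
print(f"paths: {path_fraction:.1%}")
print(f"links: {link_fraction:.1%}")
```

With these inputs N^2 is 36,481, the path share is about 4.9%, and the link share is about 64.5%, all in line with the rounded figures on the slides.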


Measurement Quality

How good is our vantage-point selection algorithm?
– Link Measurability Score: 0-6
  • 65% of measurement samples have a non-zero score
  • 80% of measurements are better than fair
  • 60% of measurements are better than good
– The key point is that we know how well or badly we are doing


Key Findings

Time-invariant hot spots

Strong spatial correlation among congested links

Root-cause analysis


Time-invariant Hot Spots

Time-of-day effects for the number of congestion events

Small number of links show strong time-invariant congestion intensity


Time-invariant Hot Spots

Most of the links are not inter-continental links as we initially hypothesized

Inter-AS links between large backbone networks as well as intra-AS links within these networks

AS # | Description
174 | Cogent Communications, a large Tier-2 ISP.
1299 | TeliaNet Global Network, a large Tier-2 ISP.
20965 | GEANT, a main European multi-gigabit computer network for research and education purposes, Tier-2.
4323 | Time Warner Telecom, a Tier-2 ISP in US.
3356 | Level 3 Communications, a Tier-1 ISP.
237 | Merit, a Tier-2 network in US.
6461 | Abovenet Communications, a large Tier-2 ISP.
27750 | RedCLARA, a backbone that connects the Latin-American National Research and Education Networks to Europe.
6453 | Teleglobe, a Tier-2 ISP.
2914 | NTT America, a Tier-1 ISP.
3549 | Global Crossing, a Tier-1 ISP.
11537 | Abilene, an Internet2 backbone network in US.
4538 | China Education and Research Network.


Congestion Correlation

Pair-wise correlation
– Percent of time 2 links are concurrently congested
– Pair-wise correlation can be quite extensive
  • E.g., 20% of pairs have correlation greater than 0.7
– Correlation: weekend > weekdays
  • Overall congestion level smaller during weekends
– Distance between correlated link pairs
  • up to 3 ASes
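The slides define pair-wise correlation only as the percent of time two links are concurrently congested, without giving the normalization. The sketch below assumes one choice: the number of time slots in which both links are congested, divided by the number of slots in which at least one is. That normalization, the function names, and the per-slot boolean layout are assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def pairwise_correlation(a: List[bool], b: List[bool]) -> float:
    """Slots where both links are congested, divided by slots where at least
    one is congested (assumed normalization)."""
    both = sum(x and y for x, y in zip(a, b))
    either = sum(x or y for x, y in zip(a, b))
    return both / either if either else 0.0


def correlated_pairs(status: Dict[str, List[bool]],
                     threshold: float = 0.7) -> List[Tuple[str, str]]:
    """Link pairs whose correlation exceeds the threshold (0.7 is the value quoted on the slide)."""
    return [
        (p, q)
        for p, q in combinations(sorted(status), 2)
        if pairwise_correlation(status[p], status[q]) > threshold
    ]


status = {
    "link1": [True, True, True, False, True],
    "link2": [True, True, True, False, False],
    "link3": [False, False, False, True, False],
}
print(correlated_pairs(status))
```

Here link1 and link2 are congested together in three of the four slots where either is congested, so they form the only pair above the 0.7 threshold.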


Aggregation Effect Hypothesis

Hypothesis:
– When upstream traffic converges to a relatively thin aggregation point, then traffic surges in an upstream link are likely to create congestion at a thin downstream aggregation link

Insights:
– Aggregation points correspond to time-invariant hot spots
– Interaction between an aggregation point and an upstream link causes link-level correlation

[Figure label: Aggregation link]


Root-cause Analysis: Example

[Figure: example topology with link capacity labels 10 Gbps and 622 Mbps]


Final Statistics

Table 1: Matched locations in the top ten networks defined by the number of peers

Rank | Network | Peers
1 | UUNET | 2,346
2 | AT&T WorldNet | 2,092
3 | Level 3 Comm. | 1,742
5 | Cogent Comm. | 1,642
7 | Global Crossing | 1,041
8 | Time Warner | 918
9 | Abovenet | 798

Table 2: Matched locations in the top ten ISPs that most aggressively promote customer access

North America
Rank | ISP
1 | Level 3 Comm.
2 | UUNET
3 | AT&T WorldNet
6 | Cogent Comm.
9 | Global Crossing

Europe
Rank | ISP
1 | Level 3 Comm.
2 | TeliaNet Global Network
4 | Global Crossing
8 | Teleglobe

Asia
Rank | ISP
2 | NTT America
6 | UUNET
8 | AT&T WorldNet
9 | Level 3 Comm.
10 | Teleglobe


Conclusions

Triggered monitoring system
– Measuring congestion in a scalable way
– Key feature:
  • Select vantage points to measure congestion as a function of the measurement quality

Key findings
– A subset of links experience time-invariant high congestion intensity
– There is strong correlation among congestion events at different links (up to 3 ASes)
– Root cause: aggregation effect
  • some links thinner than others