Boston University Computer Science

On The Marginal Utility of Network Topology Measurements

John Byers

with

Paul Barford (now at Wisconsin),

Azer Bestavros, and Mark Crovella

Measurement Philosophy

Current dogma: when conducting a wide-area measurement study, “more is better.”
  More = more measurements
  More = more measurement sites

True, but taking more measurements and deploying more infrastructure is expensive!

Our focus: How much better is more? Even harder: When can we stop measuring?

Not much work on this topic in our community.

Problem Instance: Discovering Internet Topology

Typical goal: discover the router-level Internet graph

Typical approach: merge lists of known nodes and edges

Traceroute reports the IP path from A to B, i.e., how IP paths are overlaid on the router graph

Traceroute studies

Yield overlays of projections from S’s to D’s
  Sources: active, expensive
  Destinations: passive, cheap

[Figure: sources (S) tracing paths to destinations (D)]

Motivating Questions

How should we use traceroute and what can it discover?
  Physical topology (nodes, links)?
  IP routing topology?

What’s a good way to organize a collection-of-traceroutes study?
  Many sources? Many destinations?
  How much is enough?

Theoretical Inroads

Take a graph G = (V, E) and a routing algorithm R.
Choose j sources and k destinations at random.
Consider the subgraph G’ = (V’, E’) induced by routes from R between all (S, D) pairs.
How do the expected values of |V’| and |E’| scale as a function of j and k?
The Chuang-Sirbu scaling law is the special case j = 1.

The marginal utility of adding the (k+1)st source or destination is its expected contribution to |V’| or |E’|.
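The setup above can be sketched in a few lines. This is a minimal illustration, not the method used in the study: it assumes R is hop-count shortest-path routing (one BFS tree per source), that the graph is an undirected adjacency dict, and the function names are hypothetical.

```python
import random
from collections import deque


def bfs_parents(adj, source):
    """Return a BFS parent map: one shortest-path tree rooted at source."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return parents


def induced_subgraph(adj, sources, destinations):
    """Collect the nodes V' and edges E' on routes between all (S, D) pairs."""
    nodes, edges = set(), set()
    for s in sources:
        parents = bfs_parents(adj, s)
        for d in destinations:
            if d not in parents:
                continue  # destination unreachable from this source
            node = d
            nodes.add(node)
            # Walk the tree from the destination back to the source.
            while parents[node] is not None:
                edges.add(frozenset((node, parents[node])))
                node = parents[node]
                nodes.add(node)
    return nodes, edges


def marginal_utility(adj, sources, destinations, new_destination):
    """Count the nodes and edges contributed by one extra destination."""
    v0, e0 = induced_subgraph(adj, sources, destinations)
    v1, e1 = induced_subgraph(adj, sources, destinations + [new_destination])
    return len(v1) - len(v0), len(e1) - len(e0)


def expected_sizes(adj, j, k, trials=200, seed=0):
    """Estimate E|V'| and E|E'| over random choices of j sources, k destinations."""
    rng = random.Random(seed)
    all_nodes = sorted(adj)
    total_v = total_e = 0
    for _ in range(trials):
        picked = rng.sample(all_nodes, j + k)  # requires j + k <= |V|
        v, e = induced_subgraph(adj, picked[:j], picked[j:])
        total_v += len(v)
        total_e += len(e)
    return total_v / trials, total_e / trials
```

On the toy graph `{0: [1], 1: [0, 2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}` with source 0 and destination 4, adding destination 5 contributes one node and one edge, since the route to 5 shares the prefix 0-1-2 with the route to 4.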

What might we expect?

Two extremal cases:
  Clique: each new (S, D) discovers a new path
  Star: each new S or D discovers only a small neighborhood

[Figure: clique and star topologies with sources (S) and destinations (D)]
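The two extremes can be made concrete by counting discovered edges. This is a sketch under stated assumptions: sources and destinations are distinct nodes, routing follows shortest paths, and in the star all sources and destinations sit on leaves. The function names are hypothetical.

```python
def clique_edges_discovered(j, k):
    """Clique: every (S, D) pair is adjacent, so each pair reveals its own edge."""
    return j * k


def star_edges_discovered(j, k):
    """Star: every route is leaf-hub-leaf, so each endpoint reveals only its spoke."""
    return j + k


def marginal_edges(discovered, j, k):
    """Edges gained by adding the (k+1)st destination with j sources fixed."""
    return discovered(j, k + 1) - discovered(j, k)
```

Under these assumptions the (k+1)st destination contributes j new edges in the clique and a single new edge in the star, regardless of k.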

Skitter to the Rescue

Two datasets from CAIDA

Small dataset: May 2000
  8 sources, 1277 destinations, 20K paths
  Sources in: New Zealand, Japan, Singapore, San Jose (2), Ottawa, London, Washington
  All sources traced to all destinations

Large dataset: October 2000, 30 times bigger
  12 sources, 313709 destinations, 600K paths
  No destination common to all sources, or vice versa

Interface Disambiguation

Traceroutes report only on interfaces used
Routers often have multiple interfaces
But merging traceroutes requires matching routers

Solution: probe each interface from some site X
  Routers are supposed to respond on the interface used for routing to X

Results in a set of (probe interface, response interface) pairs
  Each connected component is taken to be a router
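The grouping step can be sketched with a union-find structure. This is one common way to compute connected components, not necessarily the implementation used in the study; the function name is hypothetical and the addresses in the usage note come from the reserved documentation ranges.

```python
def group_interfaces(pairs):
    """Group interfaces into routers from (probed, responded) address pairs."""
    parent = {}

    def find(x):
        # Register unseen interfaces as their own component.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Each pair asserts that both interfaces belong to the same router.
    for probed, responded in pairs:
        root_a, root_b = find(probed), find(responded)
        if root_a != root_b:
            parent[root_a] = root_b

    # Collect the members of each connected component.
    routers = {}
    for iface in list(parent):
        routers.setdefault(find(iface), set()).add(iface)
    return list(routers.values())
```

For example, the pairs `("192.0.2.1", "192.0.2.1")`, `("192.0.2.2", "192.0.2.1")`, and `("198.51.100.7", "198.51.100.7")` yield two routers: one owning both 192.0.2.x interfaces and one owning 198.51.100.7.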

Classifying Nodes

Core, border, stub, leaf
  Solely from traceroute information

Classification depends on measurements

[Figure: example topology with leaf, border, core, and stub nodes labeled]

Limitations and Caveats

Interface disambiguation
  13% of interfaces never responded

Node classification
  Identifying a border node requires two paths to it

Representativeness
  Datasets are small, may not be representative
  Skitter sources not selected at random

Unknown coverage of true network
  Diminishing returns may not signify good coverage

Diminishing Returns (Small Dataset)

Diminishing Returns (Large Dataset)

Diminishing Returns by Classification (Small Dataset)

[Figure panels: Core, Stub, Border]

What Does This Suggest?

[Figure: topology sketch with sources (S) and destinations (D)]

Adding Destinations: Nodes

Slope is about 3

Adding Destinations: Links

Slope is about 4

Add Sources or Destinations?

Isolines represent constant node discovery, varying S’s or D’s

Node Degree Distribution

[Figure: curves for 8 sources and 1 source]

Node Degree Distribution: Tail

[Figure: curves for 1 source and 8 sources]

Degree distribution convergence: RMSE
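The slide does not define the RMSE it uses. As one plausible reading, the sketch below compares two empirical degree distributions (fraction of nodes at each degree) and takes the root mean square of their differences over every degree seen in either one. The function names are hypothetical, and the edge lists are assumed to be undirected with no duplicates.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Return {degree: fraction of nodes with that degree} for an edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())
    total = sum(counts.values())
    return {d: c / total for d, c in counts.items()}


def rmse(dist_a, dist_b):
    """Root mean squared difference over all degrees present in either input."""
    degrees = set(dist_a) | set(dist_b)
    if not degrees:
        return 0.0  # two empty distributions are treated as identical
    squared = [(dist_a.get(d, 0.0) - dist_b.get(d, 0.0)) ** 2 for d in degrees]
    return math.sqrt(sum(squared) / len(degrees))
```

A convergence check would compute the distribution from the graph seen by the first source, then by the first two, and so on, and track the RMSE against the distribution from all sources.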

Information Theory Plug

Can compare marginal utility of different processes.

[Figure panels: Link Discovery, Node Discovery]

Related Work

Pansiot & Grad ’98
  First multi-traceroute study
  Similar methodology, incl. interface disambiguation

Chuang & Sirbu ’98; Phillips, Shenker & Tangmunarunkit ’99
  Single-source case, found sublinear growth of multicast tree with added destinations

Govindan & Tangmunarunkit ’00
  Extensive node discovery, overcoming limitations of traceroute

Broido & Claffy ’01
  Larger datasets; more detailed look at graph structure

Conclusions

Rigorous quantification of marginal utility of additional measurements.

To discover all physical nodes, traceroute is inefficient
  Diminishing returns: many S’s and D’s needed

Trading off S’s and D’s
  Adding destinations seems more cost-effective

To discover how “typical” routes pass through network, traceroute is informative
  Routing core and feeders
  Much of routing core is visible from few S’s (given enough D’s)
