AT&T Labs - Research

Internet Measurement Conference 2003

27-29 October 2003

Miami, Florida, USA

http://www.icir.org/vern/imc-2003/

Deadline for student travel grant applications: Sept 5th


An Information-Theoretic Approach to Traffic Matrix Estimation

Yin Zhang, Matthew Roughan, Carsten Lund – AT&T Research, Shannon Lab
David Donoho – Stanford


Problem

Have link traffic measurements.
Want to know demands from source to destination: the traffic matrix (TM), with elements $x_{A,B}, x_{A,C}, \ldots$

[Figure: network with nodes A, B, C, ...]


Example App: reliability analysis

Under a link failure, routes change; we want to find a traffic invariant.

[Figure: network with nodes A, B, C]


Approach

Principle (*): “Don’t try to estimate something if you don’t have any information about it.”

Maximum entropy:
Entropy is a measure of uncertainty; more information = less entropy.
To include measurements, maximize entropy subject to the constraints imposed by the data.
This imposes the fewest assumptions on the results.

Instantiation: maximize “relative entropy”, i.e. Minimum Mutual Information.
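To make “more information = less entropy” concrete, here is a minimal sketch of Shannon entropy, H(p) = -Σ p_i log2 p_i, for a few made-up distributions over four outcomes. The function name `entropy_bits` is hypothetical; it only illustrates the principle, not the estimator itself.

```python
import math

def entropy_bits(p):
    """Shannon entropy, in bits, of a discrete distribution p (probabilities summing to 1)."""
    # Outcomes with probability 0 contribute nothing (0 * log 0 is taken as 0).
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)

print(entropy_bits([0.25, 0.25, 0.25, 0.25]))  # 2.0  (no information: maximal uncertainty)
print(entropy_bits([0.5, 0.5, 0.0, 0.0]))      # 1.0  (some information)
print(entropy_bits([1.0, 0.0, 0.0, 0.0]))      # 0.0  (full information: no uncertainty)
```

Maximizing entropy subject to the data therefore picks the least committal distribution that is still consistent with what was measured.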


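Mathematical Formalism

Only measure traffic at links.

[Figure: three nodes (1, 2, 3) connected through a router by links 1, 2 and 3, with routes 1, 2 and 3, the link traffic y_1 and the traffic matrix element x_1 marked]

Problem: estimate the traffic matrix (the x’s) from the link measurements (the y’s). For example, the traffic y_1 on link 1 is the sum of the traffic matrix elements whose routes cross that link:

$$y_1 = x_1 + x_3$$

Collecting one such equation per link gives

$$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$$

In general y = Ax, where A is the routing matrix. For a non-trivial network this system is UNDERCONSTRAINED.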
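A minimal sketch of the forward model y = Ax for the three-link example, in plain Python. The routing matrix is the one from the example (row i lists which routes cross link i); the traffic values in `x` and the helper name `link_loads` are made up for illustration.

```python
# Routing matrix for the three-link example: A[i][j] = 1 if route j crosses link i.
A = [
    [1, 0, 1],  # link 1 carries routes 1 and 3, so y1 = x1 + x3
    [0, 1, 1],  # link 2 carries routes 2 and 3
    [1, 1, 0],  # link 3 carries routes 1 and 2
]

def link_loads(A, x):
    """Forward model y = A x: each link load is the sum of the demands routed over it."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

x = [10, 20, 5]          # hypothetical traffic matrix elements x1, x2, x3
print(link_loads(A, x))  # [15, 25, 30]
```

Estimation runs this map backwards: given only the link loads, recover x.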


Regularization

Want a solution that satisfies the constraints y = Ax.
Many more unknowns than measurements: O(N^2) vs. O(N).
The system is underconstrained: many solutions satisfy the equations, so we must somehow choose the “best” solution.
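A tiny made-up case shows why an underconstrained system cannot be inverted: with two links and three demands, two different traffic matrices produce exactly the same link measurements. The routing matrix and values here are hypothetical.

```python
# Two links, three demands: more unknowns than measurements.
A = [
    [1, 1, 0],  # link 1 carries demands 1 and 2
    [0, 1, 1],  # link 2 carries demands 2 and 3
]

def link_loads(A, x):
    """Forward model y = A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

x_a = [1, 2, 3]
x_b = [2, 1, 4]  # x_a plus the null-space vector (1, -1, 1), so A x_b = A x_a
print(link_loads(A, x_a), link_loads(A, x_b))  # [3, 5] [3, 5]
```

Any multiple of (1, -1, 1) can be added without changing y, so the measurements alone never single out one answer; some extra criterion has to choose.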

Such ill-posed linear inverse problems also occur in medical imaging (e.g. CAT scans), seismology, and astronomy.

Statistical intuition ⇒ regularization: choose a penalty function J(x) and take the solution

$$\hat{x} = \arg\min_x \; \|y - Ax\|_2^2 + \lambda^2 J(x)$$
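A minimal sketch of the regularized estimate, assuming the simplest quadratic penalty J(x) = ||x - x0||^2 around a prior guess x0. This is plain Tikhonov regularization, chosen for illustration; it is neither the weighted tomogravity penalty nor MMI. Setting the gradient of ||y - Ax||^2 + λ^2 ||x - x0||^2 to zero gives the normal equations (A^T A + λ^2 I) x = A^T y + λ^2 x0, solved here with a small hand-written Gaussian elimination. The names `solve` and `regularized_estimate` are hypothetical.

```python
def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(M)
    aug = [list(map(float, row)) + [float(bi)] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - s) / aug[r][r]
    return z

def regularized_estimate(A, y, x0, lam):
    """argmin_x ||y - A x||^2 + lam^2 ||x - x0||^2 via the normal equations."""
    m, n = len(A), len(A[0])
    # Left-hand side: A^T A + lam^2 I
    M = [[sum(A[k][i] * A[k][j] for k in range(m)) + (lam ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # Right-hand side: A^T y + lam^2 x0
    b = [sum(A[k][i] * y[k] for k in range(m)) + lam ** 2 * x0[i] for i in range(n)]
    return solve(M, b)

A = [[1, 1, 0], [0, 1, 1]]   # two links, three demands (hypothetical)
y = [3, 5]
x_hat = regularized_estimate(A, y, x0=[1, 1, 1], lam=1e-3)
print([round(v, 4) for v in x_hat])  # [0.6667, 2.3333, 2.6667]
```

With a small λ the estimate fits the measurements almost exactly and uses the penalty only to pick one point in the solution subspace (here the one nearest x0 in Euclidean distance); larger λ trades measurement fit for closeness to x0.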


How does this relate to other methods?

Previous methods are just particular cases of J(x):

Tomogravity (Zhang, Roughan, Greenberg and Duffield): J(x) is a weighted quadratic distance from a gravity model.

A very natural alternative: start from a penalty function that satisfies the “maximum entropy” principle, i.e. Minimum Mutual Information.


Minimum Mutual Information (MMI)

Mutual information I(S,D): the information gained about the Source from the Destination.
I(S,D) = −(relative entropy with respect to independent S and D).
I(S,D) = 0 ⇔ S and D are independent ⇔ p(D|S) = p(D) ⇔ gravity model.

Natural application of principle (*): assume independence in the absence of other information; aggregates behave similarly to the network overall.

When we get additional information (e.g. y = Ax): maximizing entropy is equivalent to minimizing I(S,D) (subject to the constraints), so take J(x) = I(S,D).
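Independence, p(S = s, D = d) = p(S = s) p(D = d), turns into the gravity model when applied to traffic volumes: the demand from s to d is proportional to the total traffic entering at s times the total traffic leaving at d. A minimal sketch with made-up totals; `gravity_model` is a hypothetical name.

```python
def gravity_model(out_totals, in_totals):
    """Independent (gravity) traffic matrix: x[s][d] = out[s] * in[d] / total."""
    total = sum(out_totals)
    assert abs(total - sum(in_totals)) < 1e-9, "ingress and egress totals must match"
    return [[o * i / total for i in in_totals] for o in out_totals]

# Hypothetical traffic entering the network at A, B, C and leaving it at A, B, C.
tm = gravity_model([60, 30, 10], [50, 25, 25])
for row in tm:
    print(row)
# [30.0, 15.0, 15.0]
# [15.0, 7.5, 7.5]
# [5.0, 2.5, 2.5]
```

Each row is a scaled copy of the same destination profile, which is exactly what "destination carries no information about the source" means.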


MMI in practice

In general there aren’t enough constraints: the constraints y = Ax give a subspace of possible solutions.
Independence gives us a starting point: the independent solution.
Find a solution which satisfies the constraints and is closest to the independent solution.
The distance measure is the Kullback-Leibler divergence.

[Figure: the subspace of solutions to y = Ax, the independent solution, and the chosen solution]
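A minimal sketch of “closest to the independent solution in Kullback-Leibler divergence, subject to y = Ax”, assuming a routing matrix A with only 0/1 entries and that a positive solution exists. Distance is measured with the generalized KL divergence Σ_j x_j log(x_j/q_j) - x_j + q_j, which suits unnormalized traffic volumes. The sketch swaps in iterative proportional fitting (cyclically rescaling the demands on each link until that link's constraint holds), a classical way to compute this kind of KL projection; the talk only says standard optimization toolkits were used, so this is not the authors' solver, and the λ trade-off and conditional-independence refinements are omitted. All names and numbers are hypothetical.

```python
import math

def kl_divergence(x, q):
    """Generalized KL divergence sum_j x_j log(x_j / q_j) - x_j + q_j (all entries > 0)."""
    return sum(xj * math.log(xj / qj) - xj + qj for xj, qj in zip(x, q))

def link_residual(A, x, y):
    """Largest violation of the link constraints A x = y."""
    return max(abs(sum(a * xj for a, xj in zip(row, x)) - yi) for row, yi in zip(A, y))

def mmi_ipf(A, y, prior, tol=1e-9, max_sweeps=10_000):
    """Sketch: argmin_x KL(x || prior) subject to A x = y, for a 0/1 routing matrix A.

    Iterative proportional fitting: visit each link in turn and rescale the demands
    routed over it so that they add up to the measured load on that link.
    """
    x = [float(p) for p in prior]
    for _ in range(max_sweeps):
        for row, yi in zip(A, y):
            load = sum(xj for a, xj in zip(row, x) if a)
            if load > 0:
                x = [xj * yi / load if a else xj for a, xj in zip(row, x)]
        if link_residual(A, x, y) < tol:
            break
    return x

A = [[1, 1, 0], [0, 1, 1]]
y = [3, 5]
prior = [1.0, 1.0, 1.0]   # stand-in for the independent (gravity) solution
x_mmi = mmi_ipf(A, y, prior)
print([round(v, 4) for v in x_mmi])  # [0.7913, 2.2087, 2.7913]
# Another feasible solution is further from the prior in KL divergence:
print(kl_divergence(x_mmi, prior) < kl_divergence([1.0, 2.0, 3.0], prior))  # True
```

Multiplicative updates keep every demand positive and preserve the prior's structure wherever the measurements say nothing, which matches the "don't estimate what you have no information about" principle.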


Is that it?

Not quite that simple: we need to do some networking-specific things, e.g. conditional independence to model hot-potato routing.

Can be solved using standard optimization toolkits, taking advantage of the sparseness of the routing matrix A.

Back to tomogravity: conditional independence = generalized gravity model; the quadratic distance function is a first-order approximation to the Kullback-Leibler divergence, so tomogravity is a first-order approximation to MMI.
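A quick numerical check of the approximation claim, assuming the generalized KL term x log(x/q) - x + q for a single demand x with prior q. Its Taylor expansion around x = q starts with the weighted quadratic (x - q)^2 / (2q), so the two agree closely for small deviations and drift apart for large ones. The values of q and x are arbitrary.

```python
import math

def kl_term(x, q):
    """Generalized KL contribution of one demand x relative to its prior q."""
    return x * math.log(x / q) - x + q

def quad_term(x, q):
    """Weighted quadratic distance: leading term of kl_term's expansion around x = q."""
    return (x - q) ** 2 / (2 * q)

q = 10.0
for x in (10.1, 11.0, 15.0, 30.0):
    print(f"x={x:5.1f}  KL={kl_term(x, q):.5f}  quadratic={quad_term(x, q):.5f}")
```

Note the weight 1/q: deviations from a small prior demand are penalized more heavily than the same absolute deviation from a large one.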


Results – Single example

±20% bounds for larger flows
Average error ~11%
Fast (< 5 seconds)
Scales: O(100) nodes


More results

[Figure: results for the tomogravity method and a simple approximation]

>80% of demands have <20% error.
Large errors are in small flows.
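The slides do not spell out the error metric, so the following is only a sketch under assumed definitions: a per-demand relative error |x̂ - x| / x, the fraction of demands under 20% error, and an overall relative L2 error ||x̂ - x|| / ||x||. The demands and estimates are invented, chosen so that the large relative errors sit in the small flows.

```python
import math

def per_demand_errors(est, true):
    """Relative error |est - true| / true for each demand (true > 0 assumed)."""
    return [abs(e - t) / t for e, t in zip(est, true)]

def fraction_below(errors, threshold=0.2):
    """Fraction of demands whose relative error is below threshold."""
    return sum(err < threshold for err in errors) / len(errors)

def relative_l2_error(est, true):
    """Overall error ||est - true||_2 / ||true||_2."""
    num = math.sqrt(sum((e - t) ** 2 for e, t in zip(est, true)))
    return num / math.sqrt(sum(t ** 2 for t in true))

true = [1000.0, 800.0, 500.0, 20.0, 5.0]   # hypothetical demands
est = [1050.0, 760.0, 540.0, 35.0, 1.0]    # hypothetical estimates
errs = per_demand_errors(est, true)
print(fraction_below(errs))                      # 0.6
print(round(relative_l2_error(est, true), 3))    # 0.056
```

The two small flows have 75% and 80% error yet barely move the overall L2 error, since it is dominated by the large flows.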


Other experiments

Sensitivity: very insensitive to lambda; simple approximations work well.

Robustness: missing data, erroneous link data, erroneous routing data.

Dependence on network topology: via Rocketfuel network topologies.

Additional information: Netflow, local traffic matrices.


Dependence on Topology

[Figure: relative errors (%) vs. unknowns per measurement (0 to 11) for random and geographic topologies, with a linear fit to the geographic results, plus a clique and a 20-node star]


Additional information – Netflow


Local traffic matrix (George Varghese)

[Figure: curves labelled 0%, 1%, 5% and 10%, with the previous case shown for reference]


Conclusion

We have a good estimation method:
robust, fast, and scales to the required size;
accuracy depends on the ratio of unknowns to measurements;
derived from a principle.

The approach gives some insight into other methods:
why they work (regularization);
it should provide a better idea of the way forward.

Additional insights about the network and traffic: traffic and network are connected.

Implemented: used in AT&T’s NA backbone; accurate enough in practice.


Results

Methodology: use a Netflow-based partial (~80%) traffic matrix; simulate SNMP measurements using a routing simulation and y = Ax; compare the estimates with the true traffic matrix.

Advantages: realistic network, routing, and traffic; the comparison is direct, so we know errors are due to the algorithm, not to errors in the data; we can do controlled experiments (e.g. introduce known errors).

Data: one-hour traffic matrices (we don’t need fine-grained data); 506 data sets, comprising the majority of June 2002; includes all times of day and days of the week.


Robustness (input errors)


Robustness (missing data)


Point-to-multipoint

We don’t see the whole Internet. What if an edge link fails? The point-to-point traffic matrix isn’t invariant.


Point-to-multipoint

Included in this approach: implicit in the results above; explicit results are worse.

Ambiguity in demands is increased: more demands use exactly the same sets of routes.

Use in applications is better.

[Figure: link failure analysis, point-to-point vs. point-to-multipoint]


Independent model


Conditional independence

Internet routing is asymmetric:
A provider can control exit points for traffic going to peer networks.
It has much less control of where traffic enters.

[Figure: provider network with peer links and access links]




Minimum Mutual Information (MMI)

Mutual information I(S,D): information gained about S from D; I(S,D) = 0 for independent S and D.

I(S,D) = relative entropy with respect to independence; it can also be given by the Kullback-Leibler information divergence:

$$I(S,D) = \sum_{s,d} p(S=s, D=d) \log \frac{p(S=s, D=d)}{p(S=s)\,p(D=d)}$$

Why this model? In the absence of information, let’s assume no information: a minimal assumption about the traffic. Large aggregates tend to behave like the overall network?
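The definition I(S,D) = Σ_{s,d} p(s,d) log[p(s,d) / (p(s) p(d))] translates directly into code. A minimal sketch that computes it in bits from a source/destination traffic table (normalized to probabilities first); it is zero for an independent, gravity-like table and positive when destinations depend on the source. The function name and tables are made up.

```python
import math

def mutual_information_bits(joint):
    """I(S,D) = sum_{s,d} p(s,d) log2[ p(s,d) / (p(s) p(d)) ] for a traffic table joint[s][d]."""
    total = sum(sum(row) for row in joint)
    p = [[v / total for v in row] for row in joint]   # normalize volumes to probabilities
    p_s = [sum(row) for row in p]                      # marginal over destinations
    p_d = [sum(col) for col in zip(*p)]                # marginal over sources
    return sum(p[s][d] * math.log2(p[s][d] / (p_s[s] * p_d[d]))
               for s in range(len(p)) for d in range(len(p[0])) if p[s][d] > 0)

independent = [[30, 15, 15], [15, 7.5, 7.5], [5, 2.5, 2.5]]  # gravity-model table
dependent = [[50, 0], [0, 50]]   # each source sends only to "its" destination
print(abs(mutual_information_bits(independent)) < 1e-12)  # True
print(mutual_information_bits(dependent))                 # 1.0
```

In the dependent table, learning the destination reveals the source completely, which is one full bit for two equally likely sources.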


Dependence on Topology

Network      PoPs   Links     Unknowns per     Relative errors (%)
                              measurement      Geographic   Random
Exodus*      17     58        4.69             12.6         20.0
Sprint*      19     100       3.42             8.0          18.9
Abovenet*    11     48        2.29             3.8          11.7
Star         N      2(N-1)    N/2 = 10         24.0         24.0
Clique       N      N(N-1)    1                0.2          0.2
AT&T         -      -         3.54-3.97        10.6

* These are not the actual networks, but only estimates made by Rocketfuel.
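The “unknowns per measurement” column can be reproduced by assuming one unknown per ordered PoP pair, N(N-1), and one measurement per link; this assumption matches every numeric entry in the table. A small sketch (the function name is hypothetical):

```python
def unknowns_per_measurement(pops, links):
    """Ordered PoP pairs (traffic matrix elements) divided by the number of link measurements."""
    return pops * (pops - 1) / links

for name, pops, links in [("Exodus*", 17, 58), ("Sprint*", 19, 100), ("Abovenet*", 11, 48)]:
    print(name, round(unknowns_per_measurement(pops, links), 2))
# Exodus* 4.69
# Sprint* 3.42
# Abovenet* 2.29

N = 20
print(unknowns_per_measurement(N, 2 * (N - 1)))  # star: N/2 = 10.0
print(unknowns_per_measurement(N, N * (N - 1)))  # clique: 1.0
```

The clique has one link per demand, so each demand is measured directly, which is consistent with its near-zero error; the star has the highest ratio and the largest error.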


Bayesian (e.g. Tebaldi and West): J(x) = −log π(x), where π(x) is the prior model.

MLE (e.g. Vardi, Cao et al, …): in their thinking the prior model generates extra constraints. Equally, this can be modeled as a (complicated) penalty function:
• uses deviations from the higher-order moments predicted by the model.
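To illustrate the Bayesian reading of J(x), assume (purely for illustration) an independent Gaussian prior π(x) with mean μ_j and standard deviation σ_j for each demand. Then J(x) = −log π(x) equals the weighted quadratic distance Σ (x_j − μ_j)^2 / (2σ_j^2) plus a constant, which is one way to see a quadratic penalty as a prior. Means, spreads and function names are hypothetical.

```python
import math

def neg_log_gaussian_prior(x, mu, sigma):
    """J(x) = -log pi(x) for independent Gaussian priors N(mu_j, sigma_j^2) on each demand."""
    return sum(0.5 * ((xj - m) / s) ** 2 + math.log(s * math.sqrt(2 * math.pi))
               for xj, m, s in zip(x, mu, sigma))

def weighted_quadratic(x, mu, sigma):
    """Weighted quadratic distance sum (x_j - mu_j)^2 / (2 sigma_j^2)."""
    return sum((xj - m) ** 2 / (2 * s ** 2) for xj, m, s in zip(x, mu, sigma))

mu, sigma = [30.0, 15.0], [3.0, 1.5]   # hypothetical prior means and spreads
for x in ([30.0, 15.0], [33.0, 14.0], [20.0, 20.0]):
    # The difference is the same constant for every x, so both penalties rank x identically.
    print(neg_log_gaussian_prior(x, mu, sigma) - weighted_quadratic(x, mu, sigma))
```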


Acknowledgements

Local traffic matrix measurements: George Varghese

PDSCO optimization toolkit for Matlab: Michael Saunders

Data collection: Fred True, Joel Gottlieb

Tomogravity: Albert Greenberg and Nick Duffield

