Matrix Factorizations for Computer Network Tomographybindel/present/2011-07-householder.pdfMatrix...

Preview:

Citation preview

Matrix Factorizations forComputer Network Tomography

David Bindel

Department of Computer ScienceCornell University

17 Jun 2011

D. Bindel (Cornell University) Householder 2011 1 / 32

How this started

“I typed in the SVD from Numerical Recipes; it ran for a few days, thentold me it wouldn’t converge. What can I do?”

D. Bindel (Cornell University) Householder 2011 2 / 32

A fuzzy picture

WAN

D. Bindel (Cornell University) Householder 2011 3 / 32

A problem path

WAN

D. Bindel (Cornell University) Householder 2011 4 / 32

Overlays to the rescue?

WAN

D. Bindel (Cornell University) Householder 2011 5 / 32

Overlays and measurement

WAN

Measure a few paths to infer:Path properties (Chen, B., Song, Chavez, Katz: 2003, 2004, 2007)

Link properties (Zhao, Chen, B.: 2006, 2009)

Routing topology? (underway)

D. Bindel (Cornell University) Householder 2011 6 / 32

Discrete Radon transform

Radon transform:

(Ru)(L) =

∫L

u(x) |dx|

WAN

Discrete version:

(Gu)i =∑

j∈links(i)uj

D. Bindel (Cornell University) Householder 2011 7 / 32

Additive metrics and path matrices

For additive metrics (log(P(transmission)), latency, jitter, ...):

Gu = b

wherebi = property of i th end-to-end pathuj = property of link j

Gij =

{1 if path i uses link j0 otherwise

Today’s goal: A structured rank-revealing decomposition of G

D. Bindel (Cornell University) Householder 2011 8 / 32

Properties of G

Network Matrix GRouting path Row of GShort paths Sparse GRouting table updates Low rank updates to G

k = rank(G) < # links� # paths (for n sufficiently large).Ex: 500 nodes (of 100K), G is 249500× 33237, k = 9643.

D. Bindel (Cornell University) Householder 2011 9 / 32

Identifiability

If both paths are flaky, what link is to blame?

D. Bindel (Cornell University) Householder 2011 10 / 32

Network virtualization and matrix factorization

Factor out a zero-one “virtualization matrix”:

G(:, fan) =[c1 c2 c1 + c2 c1 + c2

]=[c1 c2

] [1 0 1 10 1 1 1

]Even virtual links may be “unidentifiable.”

D. Bindel (Cornell University) Householder 2011 11 / 32

Network virtualization and matrix factorization

Write network virtualization as

G = Gv S

where

Gij =

{1 if path i uses link j0 otherwise

Gvik =

{1 if path i uses virtual link k0 otherwise

Skj =

{1 if virtual link k includes link i0 otherwise

This handles (some) column dependencies. What about rows?

D. Bindel (Cornell University) Householder 2011 12 / 32

The junction pattern

B2

A1

A2

B1

A1B1 − A2B1 + A2B2 = A3B3

D. Bindel (Cornell University) Householder 2011 13 / 32

The junction pattern

B2

A1

A2

B1

A1B1−A2B1 + A2B2 = A3B3

D. Bindel (Cornell University) Householder 2011 14 / 32

The junction pattern

B2

A1

A2

B1

A1B1 − A2B1+A2B2 = A3B3

D. Bindel (Cornell University) Householder 2011 15 / 32

The junction pattern

B2

A1

A2

B1

A1B1 − A2B1+A2B2 = A3B3

D. Bindel (Cornell University) Householder 2011 16 / 32

The junction pattern

B2

A1

A2

B1

A1B1 − A2B1 + A2B2= A3B3

D. Bindel (Cornell University) Householder 2011 17 / 32

The junction pattern

B2

A1

A2

B1

A1B1 − A2B1 + A2B2 = A1B2

D. Bindel (Cornell University) Householder 2011 18 / 32

More complicated junctions?

D. Bindel (Cornell University) Householder 2011 19 / 32

Junctions and bipartite graphs

Define a bipartite router graph at r :Path segments from sources to r are nodes on the leftPath segments from r to destinations are nodes in the rightEdges indicate complete paths traversing r

D. Bindel (Cornell University) Householder 2011 20 / 32

Junctions and bipartite graphs

Spanning trees in the router graph=⇒

spanning sets among path vectors

D. Bindel (Cornell University) Householder 2011 21 / 32

Junctions and bipartite graphs example

D. Bindel (Cornell University) Householder 2011 22 / 32

Junction elimination and factorization

Gr = matrix of path vectors for paths crossing r :

Gr =[ES ED

] [PSPD

]=

[I

Tr

] [ES ED

] [PSPD

]=

[I

Tr

] [Gr]

whereRows of P are path segments to/from the routerRows of E indicate how path segments sum to form pathsE corresponds to spanning tree edgesG is the corresponding subset of pathsRows of Tr consist of ±1 entries (and zeros) corresponding topaths through the spanning tree

D. Bindel (Cornell University) Householder 2011 23 / 32

Combining router results

B

A A

B

Process each path in turn, build router forests incrementally.Processing a path from A to B through r , have either

1 A− r and r − B in same component =⇒could infer path at r from existing paths

2 A− r and r − B in different components =⇒might make new inferences via this path

D. Bindel (Cornell University) Householder 2011 24 / 32

Elimination algorithm

To process path from A to B:

for each router r on pathupdate hash h of route up to rif (A,h)-(r,B) in router graphmark that path can be inferred at r

elseadd (A,h)-(r,B) to router graphfor each edge e from source in [(A,h)]if e goes to component for (A,h) and

edge is not already marked thenput edge on top of list to be processed

if path was not inferred, mark as measured

D. Bindel (Cornell University) Householder 2011 25 / 32

Matrix factorization perspective

Junction elimination yields

PG =

[IT

]G,

where the first factor is a product of matrices of the form[I

Tk

]with rows of Tk representing paths through router graphs.

D. Bindel (Cornell University) Householder 2011 26 / 32

Matrix factorization perspective

Can combine with topology virtualization:

PG =

[IT

]Gv S

where S is a zero-one virtualization matrix.

D. Bindel (Cornell University) Householder 2011 27 / 32

Storage

Sufficient to store:Each router graph (≈ nnz(G) edges)Union-find structures for tracking componentsMarkers for which paths are measuredRouter used for inference for each inferred path

D. Bindel (Cornell University) Householder 2011 28 / 32

Choice of representative paths

Spanning trees are not unique! Want representative paths such thatThere are few redundant measurementsNo host (or router) is overloaded with measurement trafficMost inferences involve few paths

Don’t know how to do this yet...

D. Bindel (Cornell University) Householder 2011 29 / 32

Summary

Path matrix G useful for network inference:Measure a few paths, infer restMeasure a few paths, estimate link behaviors

Cost of factoring G limited scalability.

New factorization for the path matrix G isCompact (proportional to G in storage)Easy to computeNearly rank-revealingFaithful to the underlying problem structure

D. Bindel (Cornell University) Householder 2011 30 / 32

Questions?

I have lots more questions:How should we process paths for load balancing, etc?Can these methods be distributed?Are there other places where this factorization applies?

Questions from you?

D. Bindel (Cornell University) Householder 2011 31 / 32

References

Chen, B., Song, Chavez, Katz. Algebra-based scalable overlaynetwork monitoring: Algorithms, evaluation, and applications.ACM ToN, 17(6), 2009.Zhao, Chen, B. Towards unbiased end-to-end network diagnosis.ACM ToN, 15(5), 2007

D. Bindel (Cornell University) Householder 2011 32 / 32

Recommended