Distance-Constraint Reachability Computation in Uncertain Graphs Ruoming Jin, Lin Liu Kent State University Bolin Ding UIUC Haixun Wang MSRA



Why Uncertain Graphs?

• Protein-Protein Interaction Networks: false positive rate > 45%

• Social Networks: probabilistic trust/influence models

Uncertainty is ubiquitous!

• Graph/network data is increasingly important: social networks, biological networks, traffic/transportation networks, peer-to-peer networks.

• The probabilistic perspective has been getting more and more attention recently.


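Uncertain Graph Model

[Figure: an uncertain graph on vertices s, a, b, c and t with nine edges, whose existence probabilities are 0.5, 0.7, 0.2, 0.6, 0.5, 0.9, 0.4, 0.1 and 0.3, together with two of its possible worlds, G1 and G2.]

• Existence probability on each edge

• Edge independence

• Possible worlds ($2^{|E|}$)

Weight of G2:

$$\Pr(G_2) = 0.5 \cdot (1-0.5) \cdot (1-0.3) \cdot 0.2 \cdot 0.6 \cdot 0.7 \cdot (1-0.1) \cdot (1-0.4) \cdot (1-0.9) = 0.0007938$$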
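Here is a minimal Python sketch of a possible world's weight under edge independence. The weight is the product of p(e) over the edges the world keeps and 1 − p(e) over the edges it drops. The function name `world_probability` is hypothetical. The slide does not say which edge carries which probability, so the two lists below simply reproduce the nine factors of Pr(G2) in the order they appear above.

```python
from math import prod


def world_probability(edge_probs, present):
    """Pr(G) of one possible world under edge independence.

    edge_probs[j] is p(e_j); present[j] says whether e_j exists in the world.
    """
    return prod(p if keep else 1.0 - p for p, keep in zip(edge_probs, present))


# The nine factors of Pr(G2), in slide order: four kept edges, five dropped.
probs = [0.5, 0.5, 0.3, 0.2, 0.6, 0.7, 0.1, 0.4, 0.9]
present = [True, False, False, True, True, True, False, False, False]
print(round(world_probability(probs, present), 10))  # 0.0007938
```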


Distance-Constraint Reachability (DCR) Problem

[Figure: the example uncertain graph, with s marked as the source and t as the target.]

Given a distance constraint d and two vertices s and t:

• What is the probability that s can reach t within distance d?

• This generalizes the two-terminal network reliability problem, which has no distance constraint.
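The world weight and the DCR probability can be written out as formulas. The notation is ours, not the slides': $\mathcal{G}$ is the uncertain graph with edge set $E$, $\Omega$ is the set of its $2^{|E|}$ possible worlds, $E_G$ is the edge set of a world $G$, and $I_{s,t}(G, d)$ is the distance-constraint indicator.

```latex
\Pr(G) = \prod_{e \in E_G} p(e) \prod_{e \in E \setminus E_G} \bigl(1 - p(e)\bigr)

R_{s,t}(d) = \sum_{G \in \Omega} I_{s,t}(G, d)\,\Pr(G),
\qquad
I_{s,t}(G, d) =
\begin{cases}
  1 & \text{if } dist_G(s, t) \le d, \\
  0 & \text{otherwise.}
\end{cases}
```

When d is at least |V| − 1, the indicator reduces to plain s-t connectivity, and $R_{s,t}(d)$ becomes the two-terminal reliability.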


Important Applications

• Peer-to-Peer (P2P) Networks
– Communication happens only when the node distance is limited.

• Social Networks
– Trust/influence can be propagated only through a small number of hops.

• Traffic Networks
– Travel distance (travel time) queries
– What is the probability that we can reach the airport within one hour?


Example: Exact Computation

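• d = 2: what is the probability $R_{s,t}(2)$ that s reaches t within distance 2?

[Figure: the example uncertain graph and four of its possible worlds, G1 to G4.]

First step: enumerate all $2^9$ possible worlds and their weights Pr(G1), Pr(G2), Pr(G3), Pr(G4), …

Second step: check each world for distance-constraint connectivity, and sum the weights of the worlds that pass:

$$R_{s,t}(2) = \cdots + \Pr(G_1)\cdot 0 + \Pr(G_2)\cdot 1 + \Pr(G_3)\cdot 0 + \Pr(G_4)\cdot 1 + \cdots$$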
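The two steps above can be sketched in Python as brute-force enumeration. This minimal sketch rests on several assumptions: edges are directed, the graph is given as hypothetical `(u, v, p)` triples, and the helper names `within_distance` and `exact_dcr` are illustrative. The three-edge example graph is also hypothetical, since the figure's edge layout did not survive. Its answer can be checked by hand as 1 − (1 − 0.2)·(1 − 0.5·0.5) = 0.4.

```python
from collections import deque
from itertools import product


def within_distance(edges, s, t, d):
    """BFS over directed edges: can s reach t using at most d edges?"""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] < d:  # expand only while another hop stays within d
            for v in adj.get(u, []):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    return False


def exact_dcr(uncertain_edges, s, t, d):
    """Sum Pr(G) * I(G) over all 2^|E| possible worlds."""
    total = 0.0
    for flags in product([True, False], repeat=len(uncertain_edges)):
        weight, world = 1.0, []
        for (u, v, p), keep in zip(uncertain_edges, flags):
            weight *= p if keep else 1.0 - p
            if keep:
                world.append((u, v))
        if within_distance(world, s, t, d):  # step two: the indicator
            total += weight
    return total


example = [("s", "a", 0.5), ("a", "t", 0.5), ("s", "t", 0.2)]  # hypothetical graph
print(exact_dcr(example, "s", "t", 2))  # approximately 0.4
```

The function visits every one of the $2^{|E|}$ worlds, so it only works for tiny graphs.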


Approximating Distance-Constraint Reachability Computation

• Hardness
– Two-terminal network reliability is #P-complete.
– DCR is a generalization of it, so DCR is at least as hard.

• Our goal is to approximate DCR through sampling, with
– an unbiased estimator,
– minimal variance,
– low computational cost.


Let's start with the most intuitive estimators, right?


Direct Sampling Approach

• Sampling process
– Sample n graphs.
– Sample each graph according to the edge probabilities.

[Figure: the example uncertain graph and one sampled graph.]


Direct Sampling Approach (Cont’)

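• Estimator, over n sampled graphs $G_1, \ldots, G_n$:

$$\hat{R}_B = \frac{1}{n}\sum_{i=1}^{n} I_{s,t}(G_i, d)$$

where the indicator function $I_{s,t}(G_i, d) = 1$ if s reaches t within distance d in $G_i$, and 0 otherwise.

• Unbiased: $E(\hat{R}_B) = R_{s,t}(d)$, the true s-t distance-constraint reachability.

• Variance: $Var(\hat{R}_B) = \frac{1}{n} R_{s,t}(d)\,\bigl(1 - R_{s,t}(d)\bigr)$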
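Here is a minimal sketch of the direct sampling estimator $\hat{R}_B$. It assumes directed edges given as hypothetical `(u, v, p)` triples. Each of the n worlds keeps every edge independently with its probability, and the estimate is the fraction of worlds in which BFS finds t within d hops. The names are illustrative. The usage line runs it on a hypothetical three-edge graph whose true value at d = 2 is 0.4.

```python
import random
from collections import deque


def within_distance(edges, s, t, d):
    """BFS over directed edges: can s reach t using at most d edges?"""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] < d:  # expand only while another hop stays within d
            for v in adj.get(u, []):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    return False


def direct_sampling(uncertain_edges, s, t, d, n, rng):
    """R_B: fraction of n sampled worlds in which s reaches t within d."""
    hits = 0
    for _ in range(n):
        # keep each edge independently with its existence probability
        world = [(u, v) for u, v, p in uncertain_edges if rng.random() < p]
        hits += within_distance(world, s, t, d)
    return hits / n


example = [("s", "a", 0.5), ("a", "t", 0.5), ("s", "t", 0.2)]  # hypothetical graph
print(direct_sampling(example, "s", "t", 2, 10000, random.Random(1)))  # close to 0.4
```

Every sample costs one coin per edge plus one BFS, and the per-sample variance R(1 − R) does not shrink unless n grows.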


Path-Based Approach

• Generate path set
– Enumerate all paths from s to t with length ≤ d.
– Enumeration methods, e.g., DFS.

[Figure: part of the example uncertain graph on vertices s, a, b, c, t, with edge probabilities 0.7, 0.2, 0.6, 0.9, 0.4, 0.1 and 0.3.]
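A DFS enumeration of the path set might look like the sketch below. It is a hypothetical helper that assumes directed edges given as `(u, v)` pairs. It records only simple paths, which is enough: any walk from s to t of length ≤ d contains a simple s-t path that is no longer. The search stops extending a branch once it has used d edges.

```python
def enumerate_paths(edges, s, t, d):
    """All simple directed s-t paths with at most d edges, found by DFS."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    paths = []

    def dfs(u, visited, path):
        if u == t:
            paths.append(tuple(path))  # record the path as a tuple of edges
            return
        if len(path) == d:
            return  # one more edge would exceed the distance constraint
        for v in adj.get(u, []):
            if v not in visited:
                visited.add(v)
                path.append((u, v))
                dfs(v, visited, path)
                path.pop()
                visited.remove(v)

    dfs(s, {s}, [])
    return paths


edges = [("s", "a"), ("a", "t"), ("s", "t"), ("s", "b"), ("b", "c"), ("c", "t")]
print(enumerate_paths(edges, "s", "t", 2))
# [(('s', 'a'), ('a', 't')), (('s', 't'),)]
```

With d = 3, the path s-b-c-t would also be returned. The number of paths can grow exponentially with d, and that growth is the main cost of the path-based approach.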


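Path-Based Approach (Cont’)

• Path set: $P = \{P_1, \ldots, P_k\}$, all s-t paths of length ≤ d. s reaches t within distance d exactly when every edge of at least one $P_i$ exists, so

$$R_{s,t}(d) = \Pr(E_1 \cup E_2 \cup \cdots \cup E_k)$$

where $E_i$ is the event that every edge of $P_i$ exists.

• This union probability can be
– exactly computed by the inclusion-exclusion principle;
– approximated by the Monte-Carlo algorithm of R. M. Karp and M. G. Luby.

• Unbiased

• Variance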
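Both ways of evaluating $\Pr(E_1 \cup \cdots \cup E_k)$ can be sketched over a path set stored as frozensets of edges. The inclusion-exclusion sum is exact but has $2^k - 1$ terms. The Monte-Carlo function follows the standard Karp-Luby scheme for a union of events:

1. Pick path $P_i$ with probability proportional to $\Pr(E_i)$.
2. Sample the remaining edges conditioned on $P_i$ being present.
3. Count the draw only if $P_i$ is the first fully present path.

The hit rate times $\sum_i \Pr(E_i)$ is an unbiased estimate. The function names and the data layout are assumptions. The sketch also assumes at least one path with positive probability.

```python
import random
from itertools import combinations
from math import prod


def inclusion_exclusion(paths, prob):
    """Exact Pr(E_1 or ... or E_k); E_i = every edge of paths[i] exists."""
    total = 0.0
    for size in range(1, len(paths) + 1):
        for subset in combinations(paths, size):
            union = frozenset().union(*subset)  # shared edges count once
            total += (-1) ** (size + 1) * prod(prob[e] for e in union)
    return total


def karp_luby(paths, prob, n, rng):
    """Karp-Luby Monte-Carlo estimate of the same union probability."""
    weights = [prod(prob[e] for e in path) for path in paths]  # Pr(E_i)
    all_edges = frozenset().union(*paths)
    hits = 0
    for _ in range(n):
        i = rng.choices(range(len(paths)), weights=weights)[0]
        # force the edges of path i, sample the other edges independently
        present = {e for e in all_edges if e in paths[i] or rng.random() < prob[e]}
        first = next(j for j, path in enumerate(paths) if path <= present)
        hits += first == i  # count each world only via its first present path
    return sum(weights) * hits / n


paths = [frozenset({("s", "a"), ("a", "t")}), frozenset({("s", "t")})]
prob = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.2}
print(inclusion_exclusion(paths, prob))  # approximately 0.4
print(karp_luby(paths, prob, 10000, random.Random(1)))  # close to 0.4
```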


Can we do better?


Divide-and-Conquer Methodology

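• Example

[Figure: an enumeration tree for the example graph. The root holds all possible worlds and branches on edge (s,a): +(s,a) keeps the edge, -(s,a) removes it. The next level branches on +(a,t)/-(a,t) and +(s,b)/-(s,b), and so on.]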


Divide and Conquer (Cont’)

[Figure: the enumeration tree. The root holds all possible worlds; its children hold the graphs having e1 and the graphs not having e1. Blue leaves: s can reach t. Red leaves: s cannot reach t.]

Summary:

1. The number of leaf nodes is smaller than $2^{|E|}$.

2. Each possible world belongs to exactly one leaf node.

3. The reachability is the sum of the weights of the blue leaf nodes.

4. The leaf nodes form a natural sample space.
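Here is a minimal sketch of exact divide-and-conquer computation. It rests on assumptions the slides do not state:

- Edges are directed.
- The tree branches on edges in the given list order; the slides do not say how the next edge is chosen.
- A node becomes a leaf as soon as its fixed edges decide the query.

A node is blue when the kept edges alone already connect s to t within d. It is red when even keeping every undecided edge cannot. A leaf's weight is the product of p(e) or 1 − p(e) over its decided edges, because the worlds of the undecided edges sum to 1. All names are hypothetical.

```python
from collections import deque


def within_distance(edges, s, t, d):
    """BFS over directed edges: can s reach t using at most d edges?"""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] < d:  # expand only while another hop stays within d
            for v in adj.get(u, []):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    return False


def divide_and_conquer(uncertain_edges, s, t, d):
    """Exact R_{s,t}(d): the sum of the weights of the blue leaves."""

    def expand(k, included, weight):
        # uncertain_edges[k:] are still undecided at this tree node
        if within_distance(included, s, t, d):
            return weight  # blue leaf
        optimistic = included + [(u, v) for u, v, _ in uncertain_edges[k:]]
        if not within_distance(optimistic, s, t, d):
            return 0.0  # red leaf
        u, v, p = uncertain_edges[k]
        return (expand(k + 1, included + [(u, v)], weight * p)  # +e
                + expand(k + 1, included, weight * (1.0 - p)))  # -e

    return expand(0, [], 1.0)


example = [("s", "a", 0.5), ("a", "t", 0.5), ("s", "t", 0.2)]  # hypothetical graph
print(divide_and_conquer(example, "s", "t", 2))  # approximately 0.4
```

On this example with d = 2, the tree has 7 leaves instead of $2^3 = 8$ worlds, because keeping both (s,a) and (a,t) settles the query before (s,t) is considered.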


How do we sample?

• Unequal probability sampling
– Hansen-Hurwitz (HH) estimator
– Horvitz-Thompson (HT) estimator

[Figure: the enumeration tree; sampling starts from the root, and each leaf node is a sample unit.]

• $\Pr_i$: the sample unit weight, i.e., the sum of the probabilities of the possible worlds in the leaf node.

• $q_i$: the sampling probability, determined by the coins tossed along the way from the root.


Hansen-Hurwitz (HH) Estimator

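• Estimator, with sample size n:

$$\hat{R}_{HH} = \frac{1}{n}\sum_{i=1}^{n} \frac{\Pr_i}{q_i}\, I_i$$

where the sum runs over the n sampled leaf nodes, $\Pr_i$ is the leaf node weight, $q_i$ is the sampling probability, and $I_i = 1$ for a blue node and 0 for a red node.

• Unbiased: $E(\hat{R}_{HH}) = R_{s,t}(d)$, the true s-t distance-constraint reachability.

• Variance:

$$Var(\hat{R}_{HH}) = \frac{1}{n}\left(\sum_{i} \frac{\Pr_i^2\, I_i}{q_i} - R_{s,t}(d)^2\right)$$

where the sum runs over all leaf nodes.

To minimize the variance above, we set $\Pr_i = q_i$.

[Figure: the enumeration tree, where each internal node tosses a coin for one edge: keep $e_j$ with probability $p(e_j)$, drop it with probability $1 - p(e_j)$. A leaf reached by keeping e1 and e2 and dropping e3 has weight $\Pr_i = p(e_1)\,p(e_2)\,(1 - p(e_3)) \cdots$]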
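Here is a sketch of HH sampling in the enumeration tree. It assumes directed edges and list-order branching, and all names are hypothetical. Each sample walks from the root, tossing a coin per edge until it hits a leaf, and returns the leaf's weight $\Pr_i$, its sampling probability $q_i$ and $I_i$. The coin bias is a parameter. The default `bias=lambda p: p` gives $q_i = \Pr_i$, so every ratio $\Pr_i / q_i$ equals 1. Any other bias strictly between 0 and 1 still gives an unbiased estimate, but with a different variance.

```python
import random
from collections import deque


def within_distance(edges, s, t, d):
    """BFS over directed edges: can s reach t using at most d edges?"""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] < d:  # expand only while another hop stays within d
            for v in adj.get(u, []):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    return False


def sample_leaf(uncertain_edges, s, t, d, rng, bias):
    """Walk from the root to one leaf; return (Pr_i, q_i, I_i)."""
    included, weight, q = [], 1.0, 1.0
    for k, (u, v, p) in enumerate(uncertain_edges):
        if within_distance(included, s, t, d):
            return weight, q, 1  # blue leaf: kept edges already suffice
        optimistic = included + [(x, y) for x, y, _ in uncertain_edges[k:]]
        if not within_distance(optimistic, s, t, d):
            return weight, q, 0  # red leaf: no completion can succeed
        b = bias(p)  # probability that this node's coin keeps the edge
        if rng.random() < b:
            included.append((u, v))
            weight, q = weight * p, q * b
        else:
            weight, q = weight * (1.0 - p), q * (1.0 - b)
    return weight, q, int(within_distance(included, s, t, d))


def hansen_hurwitz(uncertain_edges, s, t, d, n, rng, bias=lambda p: p):
    """HH estimate: average of Pr_i / q_i * I_i over n sampled leaves."""
    total = 0.0
    for _ in range(n):
        pr, q, hit = sample_leaf(uncertain_edges, s, t, d, rng, bias)
        total += pr / q * hit
    return total / n


example = [("s", "a", 0.5), ("a", "t", 0.5), ("s", "t", 0.2)]  # hypothetical graph
print(hansen_hurwitz(example, "s", "t", 2, 10000, random.Random(1)))  # close to 0.4
```

Compared with direct sampling, a walk stops as soon as it reaches a leaf, so it usually tosses fewer coins than there are edges.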


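Horvitz-Thompson (HT) Estimator

• Estimator, over the ν unique sample units among the n draws:

$$\hat{R}_{HT} = \sum_{i=1}^{\nu} \frac{\Pr_i \, I_i}{\pi_i}, \qquad \pi_i = 1 - (1 - q_i)^n$$

where $\pi_i$ is the probability that unit i is drawn at least once, and $I_i = 1$ for a blue node and 0 for a red node.

• Unbiased: $E(\hat{R}_{HT}) = R_{s,t}(d)$, the true s-t distance-constraint reachability.

• Variance:

$$Var(\hat{R}_{HT}) = \sum_{i} \frac{1 - \pi_i}{\pi_i} (\Pr_i I_i)^2 + \sum_{i} \sum_{j \neq i} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \Pr_i I_i \Pr_j I_j$$

with $\pi_{ij} = 1 - (1 - q_i)^n - (1 - q_j)^n + (1 - q_i - q_j)^n$, where the sums run over all leaf nodes.
– To minimize the variance, we find $\Pr_i = q_i$.
– The HT estimator has smaller variance than the HH estimator.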
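Here is a sketch of the HT estimator with $q_i = \Pr_i$, under the same assumptions as before: directed edges, list-order branching, and hypothetical names. The n draws are made with replacement. Each leaf is identified by its sequence of coin outcomes, and only unique leaves are kept. Each blue leaf then contributes $\Pr_i / \pi_i$ with $\pi_i = 1 - (1 - q_i)^n$.

```python
import random
from collections import deque


def within_distance(edges, s, t, d):
    """BFS over directed edges: can s reach t using at most d edges?"""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] < d:  # expand only while another hop stays within d
            for v in adj.get(u, []):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    return False


def sample_leaf(uncertain_edges, s, t, d, rng):
    """One draw with coin bias p(e), so q_i = Pr_i; return (leaf id, Pr_i, I_i)."""
    included, coins, weight = [], [], 1.0
    for k, (u, v, p) in enumerate(uncertain_edges):
        if within_distance(included, s, t, d):
            return tuple(coins), weight, 1  # blue leaf
        optimistic = included + [(x, y) for x, y, _ in uncertain_edges[k:]]
        if not within_distance(optimistic, s, t, d):
            return tuple(coins), weight, 0  # red leaf
        keep = rng.random() < p
        coins.append(keep)  # the coin outcomes identify the leaf
        if keep:
            included.append((u, v))
        weight *= p if keep else 1.0 - p
    return tuple(coins), weight, int(within_distance(included, s, t, d))


def horvitz_thompson(uncertain_edges, s, t, d, n, rng):
    """HT estimate: sum of Pr_i * I_i / pi_i over the unique sampled leaves."""
    units = {}
    for _ in range(n):
        leaf, pr, hit = sample_leaf(uncertain_edges, s, t, d, rng)
        units[leaf] = (pr, hit)  # a repeated leaf is counted once
    # pi_i = 1 - (1 - q_i)^n is the chance that leaf i is drawn at least once
    return sum(pr * hit / (1.0 - (1.0 - pr) ** n) for pr, hit in units.values())


example = [("s", "a", 0.5), ("a", "t", 0.5), ("s", "t", 0.2)]  # hypothetical graph
print(horvitz_thompson(example, "s", "t", 2, 2000, random.Random(5)))  # approximately 0.4
```

On a tree this small, 2000 draws almost surely hit every leaf, so each $\pi_i$ is essentially 1 and the estimate matches the exact value.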


Can we further reduce the variance and computational cost?


Recursive Estimator

• Split the sample space on an edge e. Instead of sampling the entire space n times, sample the sub-space of graphs having e $n_1$ times and the sub-space of graphs not having e $n_2$ times, with $n_1 + n_2 = n$.

1. Unbiased

2. Variance: we cannot minimize the variance without knowing τ1 and τ2. Then what can we do?
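Here is a short derivation of why splitting on an edge e keeps the estimator unbiased. The notation is ours: $R_1$ and $R_2$ are the reachabilities conditioned on e being present and absent, and $\hat{R}_1$, $\hat{R}_2$ are assumed to be independent unbiased estimators of them, built from $n_1$ and $n_2$ samples.

```latex
\begin{aligned}
R_{s,t}(d) &= p(e)\,R_1 + (1 - p(e))\,R_2 \\
\hat{R} &= p(e)\,\hat{R}_1 + (1 - p(e))\,\hat{R}_2 \\
E(\hat{R}) &= p(e)\,E(\hat{R}_1) + (1 - p(e))\,E(\hat{R}_2) = p(e)\,R_1 + (1 - p(e))\,R_2 = R_{s,t}(d) \\
Var(\hat{R}) &= p(e)^2\,Var(\hat{R}_1) + (1 - p(e))^2\,Var(\hat{R}_2)
\end{aligned}
```

The variance depends on how n is split into $n_1$ and $n_2$. That dependence is why the allocation question arises.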


Sample Allocation

• We guess: what if
– $n_1 = n \cdot p(e)$,
– $n_2 = n \cdot (1 - p(e))$?

• We find: the variance is reduced for
– the HH estimator,
– the HT estimator.
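For the HH case with $q_i = \Pr_i$, every sample in a subspace is a Bernoulli draw with mean $R_j$, and the reduction can be checked directly. This sketch uses our own notation: $p = p(e)$, $R = R_{s,t}(d)$, and $R_1$, $R_2$ for the conditional reachabilities of the two subspaces. It treats $np$ and $n(1-p)$ as integers. This is the familiar proportional-allocation result from stratified sampling, and the middle line is the law of total variance.

```latex
\begin{aligned}
Var(\hat{R}) &= \frac{p^2 R_1 (1 - R_1)}{n p} + \frac{(1-p)^2 R_2 (1 - R_2)}{n (1-p)}
             = \frac{p R_1 (1 - R_1) + (1-p) R_2 (1 - R_2)}{n} \\
R (1 - R) &= p R_1 (1 - R_1) + (1-p) R_2 (1 - R_2) + p (1-p) (R_1 - R_2)^2 \\
Var(\hat{R}) &= \frac{R (1 - R) - p (1-p) (R_1 - R_2)^2}{n} \le \frac{R (1 - R)}{n}
\end{aligned}
```

The right-hand bound $R(1-R)/n$ is the variance of sampling the entire space n times with such Bernoulli draws. Splitting therefore never hurts, and it helps most when $R_1$ and $R_2$ differ.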


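Sample Allocation (Cont’)

• Sampling time is reduced!
– Directly allocate samples down the tree. With sample size n at the root, $n_1 = n \cdot p(e_1)$ and $n_2 = n \cdot (1 - p(e_1))$; one level down, $n_3 = n_1 \cdot p(e_2)$ and $n_4 = n_1 \cdot (1 - p(e_2))$.
– Toss coins only when the sample size is small.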
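Here is a sketch of recursive sampling with proportional allocation. It assumes directed edges, list-order branching, and a hypothetical cutoff `small` below which the remaining samples fall back to individual coin tosses. Because n·p(e) is rarely an integer, the sketch rounds it randomly: up with probability equal to its fractional part. The expected allocation is then exactly n·p(e), and the estimate stays unbiased. This rounding rule is our choice, not the slides'. Leaves are counted in bulk, so a blue leaf reached by m samples costs one check instead of m walks.

```python
import random
from collections import deque


def within_distance(edges, s, t, d):
    """BFS over directed edges: can s reach t using at most d edges?"""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] < d:  # expand only while another hop stays within d
            for v in adj.get(u, []):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    return False


def recursive_estimate(uncertain_edges, s, t, d, n, rng, small=4):
    """Estimate R_{s,t}(d) by pushing n samples down the enumeration tree."""

    def status(k, included):
        # 1 = blue leaf, 0 = red leaf, None = internal node (branch on edge k)
        if within_distance(included, s, t, d):
            return 1
        optimistic = included + [(u, v) for u, v, _ in uncertain_edges[k:]]
        return None if within_distance(optimistic, s, t, d) else 0

    def walk(k, included):
        # one sample by plain coin tossing with bias p(e) until a leaf
        while (leaf := status(k, included)) is None:
            u, v, p = uncertain_edges[k]
            if rng.random() < p:
                included = included + [(u, v)]
            k += 1
        return leaf

    def hits(k, included, m):
        # how many of the m samples at this node end in blue leaves
        if m == 0:
            return 0
        leaf = status(k, included)
        if leaf is not None:
            return leaf * m
        if m <= small:  # few samples left: toss coins sample by sample
            return sum(walk(k, included) for _ in range(m))
        u, v, p = uncertain_edges[k]
        share = m * p  # proportional allocation n1 = m * p(e)
        m1 = int(share) + (rng.random() < share - int(share))  # random rounding
        return hits(k + 1, included + [(u, v)], m1) + hits(k + 1, included, m - m1)

    return hits(0, [], n) / n


example = [("s", "a", 0.5), ("a", "t", 0.5), ("s", "t", 0.2)]  # hypothetical graph
print(recursive_estimate(example, "s", "t", 2, 10000, random.Random(1)))  # close to 0.4
```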


Experimental Setup

• Goals: measure
– relative error,
– variance,
– computational time.

• System specification: 2.0GHz dual-core AMD Opteron CPU, 4.0GB RAM, Linux.


Experimental Results

• Synthetic datasets
– Erdős-Rényi random graphs
– 5000 vertices, edge density 10, sample size 1000
– Queries categorized by extracted-subgraph size (# of edges), with 1000 queries per category


Experimental Results

• Real datasets
– DBLP: 226,000 vertices, 1,400,000 edges
– Yeast PPIN: 5,499 vertices, 63,796 edges
– Fly PPIN: 7,518 vertices, 51,660 edges
– Extracted subgraph sizes: 20 to 50 edges


Conclusions

• We propose the novel s-t distance-constraint reachability problem in uncertain graphs.

• We develop an efficient exact computation algorithm based on a divide-and-conquer scheme.

• In contrast to two classic reachability estimators (direct sampling and path-based), we introduce two unequal probability sampling estimators: the Hansen-Hurwitz (HH) estimator and the Horvitz-Thompson (HT) estimator.

• Based on the enumeration tree framework, we construct two recursive estimators, Recursive HH and Recursive HT, to reduce estimation variance and time.

• Experiments demonstrate the accuracy and efficiency of our estimators.


Thank you! Questions?