Distance-Constraint Reachability Computation in Uncertain Graphs

Ruoming Jin, Lin Liu (Kent State University)

Bolin Ding (UIUC)

Haixun Wang (MSRA)

Why Uncertain Graphs?

• Protein-Protein Interaction Networks: false positive rate > 45%

• Social Networks: probabilistic trust/influence model

Uncertainty is ubiquitous!

• Increasing importance of graph/network data: social networks, biological networks, traffic/transportation networks, peer-to-peer networks

• The probabilistic perspective has been receiving more and more attention recently.

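Uncertain Graph Model

[Figure: example uncertain graph on vertices s, a, b, c, t; each of its nine edges is labeled with an existence probability: 0.5, 0.7, 0.2, 0.6, 0.5, 0.9, 0.4, 0.1, 0.3.]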

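• Existence probability on each edge

• Edge independence

• Possible worlds ($2^{\#\mathrm{Edge}}$)

[Figure: two possible worlds, G1 and G2, of the example uncertain graph.]

Weight of G2:

$\Pr(G_2) = 0.5 \times 0.7 \times 0.2 \times 0.6 \times (1-0.5) \times (1-0.3) \times (1-0.1) \times (1-0.4) \times (1-0.9) = 0.0007938$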
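The weight of a possible world is a plain product over edges, which a few lines of Python can confirm. This is only a sketch: the edge names `e1` to `e9` are hypothetical placeholders, because the slide text does not preserve which probability belongs to which edge. Only the split into present and absent edges for G2 is taken from the product above.

```python
from math import prod

def world_probability(edge_prob, present):
    """Weight of one possible world under edge independence.

    edge_prob: dict mapping each edge to its existence probability.
    present:   set of edges that exist in this possible world.
    """
    return prod(p if e in present else 1.0 - p for e, p in edge_prob.items())

# Hypothetical edge names; the values are the nine probabilities of the example.
edge_prob = {"e1": 0.5, "e2": 0.7, "e3": 0.2, "e4": 0.6,
             "e5": 0.5, "e6": 0.3, "e7": 0.1, "e8": 0.4, "e9": 0.9}
g2_present = {"e1", "e2", "e3", "e4"}  # the four edges kept in G2
pr_g2 = world_probability(edge_prob, g2_present)
print(pr_g2)
```

Summing `world_probability` over all $2^9$ subsets of edges would give 1, since the possible worlds partition the probability space.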

Distance-Constraint Reachability (DCR) Problem

[Figure: the example uncertain graph, with s marked as the source and t marked as the target.]

Given distance constraint d and two vertices s and t:

• What is the probability that s can reach t within distance d?

• A generalization of the two-terminal network reliability problem, which has no distance constraint.
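One way to write the query formally is sketched below. The symbols $R^{d}_{s,t}$, $\mathrm{dis}_G$ and $I^{d}_{s,t}$ are notation assumed here for illustration, not taken from the slides: the sum runs over all possible worlds $G$ of the uncertain graph $\mathcal{G}$, and $\mathrm{dis}_G(s,t)$ is the shortest-path distance from s to t in $G$.

```latex
R^{d}_{s,t}(\mathcal{G}) \;=\; \sum_{G \sqsubseteq \mathcal{G}} \Pr(G)\, I^{d}_{s,t}(G),
\qquad
I^{d}_{s,t}(G) =
\begin{cases}
1, & \text{if } \mathrm{dis}_G(s,t) \le d,\\
0, & \text{otherwise.}
\end{cases}
```

Dropping the bound (letting $d \to \infty$) turns the indicator into plain s-t connectivity, which is the two-terminal reliability case.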

Important Applications

• Peer-to-Peer (P2P) Networks
– Communication happens only when node distance is limited.

• Social Networks
– Trust/influence can be propagated only through a small number of hops.

• Traffic Networks
– Travel distance (travel time) query
– What is the probability that we can reach the airport within one hour?

Example: Exact Computation

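• d = 2: what is the probability that s can reach t within distance 2?

[Figure: the example uncertain graph and four of its possible worlds, G1 to G4, with weights Pr(G1), Pr(G2), Pr(G3), Pr(G4).]

First Step: Enumerate all possible worlds ($2^9$).

Second Step: Check each possible world for distance-constraint connectivity, and sum the weights of the worlds that pass:

Probability = … + Pr(G1) * 0 + Pr(G2) * 1 + Pr(G3) * 0 + Pr(G4) * 1 + …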
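The two steps can be sketched in Python as a brute-force loop. Several things here are assumptions rather than facts from the slides: edges are treated as directed, distance is counted in hops, and the three-edge graph `toy` is a hypothetical stand-in, since the edge layout of the slide's example is not recoverable from the text.

```python
from collections import deque
from itertools import product

def within_distance(edges, s, t, d):
    """BFS: True if t is at most d hops from s using the given directed edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] == d:
            continue  # do not expand beyond the distance bound
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return False

def exact_dcr(edge_prob, s, t, d):
    """Sum Pr(G) over all 2^|E| possible worlds G in which s reaches t within d."""
    edges = list(edge_prob)
    total = 0.0
    for mask in product([False, True], repeat=len(edges)):
        weight = 1.0
        present = []
        for e, keep in zip(edges, mask):
            weight *= edge_prob[e] if keep else 1.0 - edge_prob[e]
            if keep:
                present.append(e)
        if within_distance(present, s, t, d):  # second step: connectivity check
            total += weight
    return total

# Hypothetical toy graph: a two-hop route s -> a -> t and a direct edge s -> t.
toy = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.2}
r1 = exact_dcr(toy, "s", "t", 1)  # only the direct edge counts
r2 = exact_dcr(toy, "s", "t", 2)  # the two-hop route counts as well
print(r1, r2)
```

The loop visits every one of the $2^{|E|}$ worlds, so it is only usable on very small graphs.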

Approximating Distance-Constraint Reachability Computation

• Hardness
– Two-terminal network reliability is #P-Complete.
– DCR is a generalization.

• Our goal is to approximate through sampling
– Unbiased estimator
– Minimal variance
– Low computational cost

Start from the most intuitive estimators, right?

Direct Sampling Approach

• Sampling Process
– Sample n graphs
– Sample each graph according to edge probability

[Figure: the example uncertain graph and one sampled possible world.]

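Direct Sampling Approach (Cont’)

• Estimator: $\hat{R}_B = \frac{1}{n}\sum_{i=1}^{n} I(G_i)$, where the indicator function $I(G_i) = 1$ if s can reach t within d in the sampled graph $G_i$, and $I(G_i) = 0$ otherwise.

• Unbiased: $E(\hat{R}_B) = R$, where $R$ denotes the true distance-constraint reachability.

• Variance: $Var(\hat{R}_B) = \frac{R(1-R)}{n}$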
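A minimal sketch of direct sampling follows. It assumes directed edges and hop-count distance, and it reuses a hypothetical three-edge graph `toy` whose exact answer for d = 2 is 0.4. The function names are illustrative only.

```python
import random
from collections import deque

def within_distance(edges, s, t, d):
    """BFS: True if t is at most d hops from s using the given directed edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] == d:
            continue
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return False

def direct_sampling(edge_prob, s, t, d, n, seed=0):
    """Average of the indicator over n independently sampled possible worlds."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        # Keep each edge independently with its existence probability.
        world = [e for e, p in edge_prob.items() if rng.random() < p]
        hits += within_distance(world, s, t, d)
    return hits / n

# Hypothetical toy graph; its exact reachability for d = 2 is 0.4.
toy = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.2}
estimate = direct_sampling(toy, "s", "t", 2, n=20000)
print(estimate)
```

Every sample materializes a whole possible world, even though most edges may be irrelevant to the s-t query.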

Path-Based Approach

• Generate Path Set
– Enumerate all paths from s to t with length ≤ d
– Enumeration methods, e.g., DFS

[Figure: the example uncertain graph restricted to the edges that lie on s-t paths of length ≤ d.]
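A depth-first enumeration of the path set might look like the following sketch. It assumes directed edges, simple paths, and length measured in hops. The edge list is hypothetical, because the slide's figure does not survive in the text.

```python
def enumerate_paths(edges, s, t, d):
    """All simple s-t paths with at most d edges, found by depth-first search."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    paths = []

    def dfs(u, path, visited):
        if u == t:
            paths.append(list(path))  # record a copy of the current edge sequence
            return
        if len(path) == d:
            return  # length budget exhausted
        for v in adj.get(u, []):
            if v not in visited:
                visited.add(v)
                path.append((u, v))
                dfs(v, path, visited)
                path.pop()
                visited.remove(v)

    dfs(s, [], {s})
    return paths

# Hypothetical edge list on the vertex names used in the slides.
edges = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t"), ("a", "b"), ("s", "t")]
paths_d2 = enumerate_paths(edges, "s", "t", 2)
paths_d3 = enumerate_paths(edges, "s", "t", 3)
print(len(paths_d2), len(paths_d3))
```

The number of such paths can grow exponentially with d, which is the main cost of the path-based approach.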

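Path-Based Approach (Cont’)

• Path set: all paths from s to t with length ≤ d

• Reachability is the probability that at least one path in the path set exists
– Exactly computed by the Inclusion-Exclusion principle
– Approximated by the Monte-Carlo algorithm of R. M. Karp and M. G. Luby

• Unbiased

• Variance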
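The exact inclusion-exclusion computation over a path set can be sketched as below. The toy path set and probabilities are hypothetical. A set of paths exists jointly exactly when every edge in the union of those paths exists, which is why each term is a product over that union.

```python
from itertools import combinations
from math import prod

def dcr_inclusion_exclusion(paths, edge_prob):
    """Pr(at least one path fully exists), by inclusion-exclusion over path subsets.

    paths:     list of paths, each a list of edges.
    edge_prob: dict mapping each edge to its existence probability.
    The cost is exponential in the number of paths.
    """
    total = 0.0
    for k in range(1, len(paths) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for subset in combinations(paths, k):
            union = set().union(*map(set, subset))
            # All paths in the subset exist iff every edge in their union exists.
            total += sign * prod(edge_prob[e] for e in union)
    return total

# Hypothetical toy instance: one two-hop path and one direct edge.
edge_prob = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.2}
paths = [[("s", "a"), ("a", "t")], [("s", "t")]]
r = dcr_inclusion_exclusion(paths, edge_prob)  # 0.25 + 0.2 - 0.25 * 0.2
print(r)
```

With L paths there are $2^L - 1$ terms, so the exact route is practical only for small path sets; the Karp-Luby algorithm named above replaces this sum with sampling.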

Can we do better?

Divide-and-Conquer Methodology

• Example

[Figure: enumeration tree over the example uncertain graph. The root branches on edge (s,a) into +(s,a) (edge included) and -(s,a) (edge excluded); the second level carries the branch labels +(a,t), -(a,t), +(s,b), -(s,b); deeper levels continue in the same way.]

1. # of leaf nodes is smaller than $2^{|E|}$.

2. Each possible world exists only in one leaf node.

3. Reachability is the sum of the weights of blue nodes.

4. Leaf nodes form a nice sample space.

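Divide and Conquer (Cont’)

Summarize:

[Figure: enumeration tree. The root holds all possible worlds and splits into graphs having e1 and graphs not having e1; splitting continues until every leaf is either a node where s can reach t (blue) or a node where s cannot reach t (red).]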
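The divide-and-conquer scheme can be sketched as a short recursion. It stops at a leaf as soon as the included edges already connect s to t within d, or as soon as the non-excluded edges can no longer do so. The choice of branching edge (first undecided edge in dictionary order), the directed hop-count distance, and the toy graph are all assumptions made for this illustration.

```python
from collections import deque

def within_distance(edges, s, t, d):
    """BFS: True if t is at most d hops from s using the given directed edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] == d:
            continue
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return False

def dcr_divide_and_conquer(edge_prob, s, t, d, included=(), excluded=()):
    """Exact DCR by recursively fixing one undecided edge as present or absent."""
    # Leaf where s can reach t: the included edges alone suffice.
    if within_distance(included, s, t, d):
        return 1.0
    # Leaf where s cannot reach t: even all non-excluded edges are not enough.
    candidates = [e for e in edge_prob if e not in excluded]
    if not within_distance(candidates, s, t, d):
        return 0.0
    # Otherwise at least one edge is still undecided; branch on it.
    e = next(x for x in candidates if x not in included)
    p = edge_prob[e]
    with_e = dcr_divide_and_conquer(edge_prob, s, t, d, included + (e,), excluded)
    without_e = dcr_divide_and_conquer(edge_prob, s, t, d, included, excluded + (e,))
    return p * with_e + (1.0 - p) * without_e

# Hypothetical toy graph; brute-force enumeration gives 0.4 for d = 2.
toy = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.2}
r2 = dcr_divide_and_conquer(toy, "s", "t", 2)
print(r2)
```

On this toy graph the recursion ends in 7 leaves rather than the 8 possible worlds, a small instance of the first observation above.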

How do we sample?

• Unequal probability sampling
– Hansen-Hurwitz (HH) estimator
– Horvitz-Thompson (HT) estimator

[Figure: enumeration tree; sampling starts from the root, and each leaf node is a sample unit.]

Pri: sample unit weight; the sum of the possible worlds’ probabilities in the node.

qi: sampling probability, determined by the properties of the coins tossed along the way.

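Hansen-Hurwitz (HH) Estimator

• Estimator: $\hat{R}_{HH} = \frac{1}{n}\sum_{i=1}^{n} \frac{Pr_i \, I_i}{q_i}$
– n: the sample size
– Pri: the leaf node weight
– qi: the sampling probability
– $I_i$ = 1 for a blue node; $I_i$ = 0 for a red node

• Unbiased

• Variance
– To minimize the variance, we have Pri = qi

[Figure: enumeration tree with branch probabilities P(e1) and 1-P(e1), P(e2) and 1-P(e2), P(e3) and 1-P(e3), P(e4) and 1-P(e4). The weight of a leaf is the product of the branch probabilities on its path, e.g. Pri = p(e1)*p(e2)*(1-p(e3))*…; choosing qi = Pri corresponds to tossing a coin with odds p(e1) : 1 – p(e1), then p(e2) : 1 – p(e2), then p(e3) : 1 – p(e3), and so on down the tree.]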
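A sketch of the HH estimator over the enumeration tree is given below. Each sample walks from the root to a leaf by tossing one coin per undecided edge, tracking both the leaf weight Pri and the sampling probability qi. The optional `coin` argument (hypothetical, for illustration) lets the coin bias differ from the edge probability; by default qi = Pri. Directed hop-count distance, the branching order, and the toy graph are also assumptions.

```python
import random
from collections import deque

def within_distance(edges, s, t, d):
    """BFS: True if t is at most d hops from s using the given directed edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] == d:
            continue
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return False

def sample_leaf(edge_prob, s, t, d, rng, coin=None):
    """Walk from the root to one leaf; return (Pr_i, q_i, I_i) for that leaf."""
    coin = coin or edge_prob  # coin[e]: probability of taking the "+e" branch
    included, excluded = [], []
    weight, q = 1.0, 1.0
    while True:
        if within_distance(included, s, t, d):
            return weight, q, 1  # blue leaf
        candidates = [e for e in edge_prob if e not in excluded]
        if not within_distance(candidates, s, t, d):
            return weight, q, 0  # red leaf
        e = next(x for x in candidates if x not in included)
        if rng.random() < coin[e]:
            included.append(e)
            weight *= edge_prob[e]
            q *= coin[e]
        else:
            excluded.append(e)
            weight *= 1.0 - edge_prob[e]
            q *= 1.0 - coin[e]

def hh_estimate(edge_prob, s, t, d, n, coin=None, seed=0):
    """Hansen-Hurwitz estimate: mean of Pr_i * I_i / q_i over n sampled leaves."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        weight, q, hit = sample_leaf(edge_prob, s, t, d, rng, coin)
        total += weight * hit / q
    return total / n

# Hypothetical toy graph; its exact reachability for d = 2 is 0.4.
toy = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.2}
hh_default = hh_estimate(toy, "s", "t", 2, n=20000)
hh_fair = hh_estimate(toy, "s", "t", 2, n=20000, coin={e: 0.5 for e in toy})
print(hh_default, hh_fair)
```

With the default coins every ratio Pri / qi equals 1, so the estimate is simply the fraction of sampled leaves that are blue. Unlike direct sampling, a walk stops as soon as the outcome is decided and never tosses coins for the remaining edges.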

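Horvitz-Thompson (HT) Estimator

• Estimator: $\hat{R}_{HT} = \sum_{i=1}^{v} \frac{Pr_i \, I_i}{\pi_i}$, where $\pi_i = 1 - (1 - q_i)^n$ is the probability that sample unit i appears among the n draws
– v: # of unique sample units

• Unbiased

• Variance
– To minimize variance, we find Pri = qi
– Smaller variance than HH estimator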
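The HT estimator differs from HH in that repeated draws of the same leaf count once, and each unique leaf is weighted by its inclusion probability. The sketch below assumes qi = Pri (coins tossed with the edge probabilities), the standard with-replacement inclusion probability 1 - (1 - qi)^n, directed hop-count distance, and a hypothetical toy graph.

```python
import random
from collections import deque

def within_distance(edges, s, t, d):
    """BFS: True if t is at most d hops from s using the given directed edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] == d:
            continue
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return False

def sample_leaf(edge_prob, s, t, d, rng):
    """Toss one coin per undecided edge until a leaf is reached.

    Returns (leaf_id, Pr_i, I_i); each coin uses p(e), so q_i equals Pr_i.
    """
    included, excluded, weight = [], [], 1.0
    while True:
        if within_distance(included, s, t, d):
            return (tuple(included), tuple(excluded)), weight, 1
        candidates = [e for e in edge_prob if e not in excluded]
        if not within_distance(candidates, s, t, d):
            return (tuple(included), tuple(excluded)), weight, 0
        e = next(x for x in candidates if x not in included)
        if rng.random() < edge_prob[e]:
            included.append(e)
            weight *= edge_prob[e]
        else:
            excluded.append(e)
            weight *= 1.0 - edge_prob[e]

def ht_estimate(edge_prob, s, t, d, n, seed=0):
    """Horvitz-Thompson estimate over the unique leaves seen in n draws."""
    rng = random.Random(seed)
    unique = {}  # leaf id -> (Pr_i, I_i); duplicate draws collapse to one unit
    for _ in range(n):
        leaf_id, weight, hit = sample_leaf(edge_prob, s, t, d, rng)
        unique[leaf_id] = (weight, hit)
    total = 0.0
    for weight, hit in unique.values():
        inclusion = 1.0 - (1.0 - weight) ** n  # Pr(leaf drawn at least once)
        total += weight * hit / inclusion
    return total

# Hypothetical toy graph; its exact reachability for d = 2 is 0.4.
toy = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.2}
ht = ht_estimate(toy, "s", "t", 2, n=2000)
print(ht)
```

When n is large enough that every leaf with noticeable weight has been seen, the inclusion probabilities approach 1 and the estimate approaches the exact sum of blue-leaf weights.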

Can we further reduce the variance and computational cost?

Recursive Estimator

[Figure: enumeration tree. Instead of sampling the entire space n times, sample one sub-space n1 times and the other sub-space n2 times, with n1 + n2 = n.]

1. Unbiased

2. Variance: we cannot minimize the variance without knowing τ1 and τ2. Then what can we do?

Sample Allocation

• We guess: what if
– n1 = n*p(e)
– n2 = n*(1-p(e))?

• We find: variance reduced!
– for the HH Estimator
– for the HT Estimator

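Sample Allocation (Cont’)

• Sampling time reduced!
– Directly allocate samples
– Toss coin when sample size is small

[Figure: enumeration tree annotated with sample sizes. The root has sample size n; its two children receive n1 = n*p(e1) and n2 = n*(1-p(e1)); the children of the first child receive n3 = n1*p(e2) and n4 = n1*(1-p(e2)).]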
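The allocation idea can be sketched as a recursion that splits the sample budget in proportion to p(e) and falls back to coin tossing once the budget is small. Several details are assumptions made for this illustration and are not stated on the slides: the combination rule p(e) * estimate1 + (1 - p(e)) * estimate2, the rounding rule that gives each sub-space at least one sample, the `threshold` value, the branching order, and the toy graph.

```python
import random
from collections import deque

def within_distance(edges, s, t, d):
    """BFS: True if t is at most d hops from s using the given directed edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] == d:
            continue
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return False

def toss_to_leaf(edge_prob, s, t, d, rng, included, excluded):
    """From a partial assignment, toss coins until a leaf; return 1 (blue) or 0 (red)."""
    included, excluded = list(included), list(excluded)
    while True:
        if within_distance(included, s, t, d):
            return 1
        candidates = [e for e in edge_prob if e not in excluded]
        if not within_distance(candidates, s, t, d):
            return 0
        e = next(x for x in candidates if x not in included)
        if rng.random() < edge_prob[e]:
            included.append(e)
        else:
            excluded.append(e)

def recursive_estimate(edge_prob, s, t, d, n, rng,
                       included=(), excluded=(), threshold=4):
    """Estimate the reachability of one sub-space using a budget of n samples."""
    if within_distance(included, s, t, d):
        return 1.0  # blue leaf: no sampling needed
    candidates = [e for e in edge_prob if e not in excluded]
    if not within_distance(candidates, s, t, d):
        return 0.0  # red leaf: no sampling needed
    if n <= threshold:
        # Small budget: toss coins inside this sub-space instead of allocating.
        m = max(n, 1)
        hits = sum(toss_to_leaf(edge_prob, s, t, d, rng, included, excluded)
                   for _ in range(m))
        return hits / m
    e = next(x for x in candidates if x not in included)
    p = edge_prob[e]
    n1 = min(max(1, round(n * p)), n - 1)  # budget for the "+e" sub-space
    n2 = n - n1                            # budget for the "-e" sub-space
    r1 = recursive_estimate(edge_prob, s, t, d, n1, rng,
                            included + (e,), excluded, threshold)
    r2 = recursive_estimate(edge_prob, s, t, d, n2, rng,
                            included, excluded + (e,), threshold)
    return p * r1 + (1.0 - p) * r2

# Hypothetical toy graph; its exact reachability for d = 2 is 0.4.
toy = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.2}
rec = recursive_estimate(toy, "s", "t", 2, 1000, random.Random(0))
small = recursive_estimate(toy, "s", "t", 2, 3, random.Random(0))
print(rec, small)
```

Direct allocation removes the randomness of the coin at every branching step where the budget is large, which is where both the variance reduction and the time saving come from in this sketch.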

Experimental Setup

• Experiment setting
– Goal:
  • Relative Error
  • Variance
  • Computational Time
– System Specification
  • 2.0GHz Dual Core AMD Opteron CPU
  • 4.0GB RAM
  • Linux

Experimental Results

• Synthetic datasets
– Erdős-Rényi random graphs
– Vertex#: 5000, edge density: 10, sample size: 1000
– Categorized by extracted-subgraph size (#edge)
– For each category, 1000 queries

Experimental Results

• Real datasets
– DBLP: 226,000 vertices, 1,400,000 edges
– Yeast PPIN: 5499 vertices, 63796 edges
– Fly PPIN: 7518 vertices, 51660 edges
– Extracted subgraphs size: 20 ~ 50 edges

Conclusions

• We first propose a novel s-t distance-constraint reachability problem in uncertain graphs.

• An efficient exact computation algorithm is developed based on a divide-and-conquer scheme.

• In addition to two classic reachability estimators, we study two unequal probability sampling estimators: the Hansen-Hurwitz (HH) estimator and the Horvitz-Thompson (HT) estimator.

• Based on the enumeration tree framework, two recursive estimators, Recursive HH and Recursive HT, are constructed to reduce estimation variance and time.

• Experiments demonstrate the accuracy and efficiency of our estimators.

Thank you! Questions?
