Center-Piece Subgraphs: Problem Definition and Fast Solutions
Hanghang Tong
Christos Faloutsos
Carnegie Mellon University
Center-Piece Subgraph (Ceps)
• Given Q query nodes
• Find the center-piece subgraph
• Input of Ceps
– Q query nodes
– Budget b
– k, the SoftAnd coefficient
• Applications
– Social networks
– Law enforcement
– Gene networks
– …
[Figure: three example graphs with nodes A, B and C; budget b]
Challenges in Ceps
• Q1: How to measure the importance?
• Q2: How to extract connection subgraph?
• Q3: How to do it efficiently?
Roadmap
• Ceps Overview
• Q1: Goodness Score Calculation
• Q2: Extract Alg.
• Q3: Efficiency Issue
• Experimental Results
• Conclusion
Ceps Overview
• Individual Score Calculation
– Measure importance wrt individual query
– $[r(i,j)]_{n \times Q}$
• Combine Individual Scores
– Measure importance wrt query set
– $[r(Q,j)]_{n \times 1}$
• “Extract” Alg.
– … the connection subgraphs
– $\arg\max_{H} g(H)$
RWR: Individual Score Calculation
• Goal
– Individual importance score $r(i,j) = r_{i,j}$
– For each node j wrt each query i
• How to
– Random walk with restart
– Steady-state prob.

$$\mathbf{r} = (1 - c)\,P\,\mathbf{r} + c\,\mathbf{e}$$
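A minimal sketch of how such steady-state scores might be computed by power iteration. It assumes the recurrence r = (1 - c) P r + c e, with c the restart probability, P the column-normalized adjacency matrix and e the indicator vector of the query node. The function name `rwr_scores`, the default `c=0.5` and the dictionary-based graph format are illustrative assumptions, not taken from the slides.

```python
def rwr_scores(adj, query, c=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart from `query` (illustrative sketch).

    adj: dict mapping each node to a list of its neighbours
         (assumed undirected, every node has at least one neighbour).
    c:   restart probability (assumed value, not given in the slides).
    Returns a dict node -> steady-state probability.
    """
    nodes = list(adj)
    r = {v: (1.0 if v == query else 0.0) for v in nodes}
    for _ in range(max_iter):
        nxt = {v: 0.0 for v in nodes}
        # (1 - c) * P * r: each node passes its mass equally to its neighbours
        for u in nodes:
            share = (1.0 - c) * r[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        # c * e: the restart mass goes back to the query node
        nxt[query] += c
        diff = sum(abs(nxt[v] - r[v]) for v in nodes)
        r = nxt
        if diff < tol:
            break
    return r
```

On a three-node path a - b - c with query a and c = 0.5, this converges to scores of 7/12, 4/12 and 1/12 for a, b and c.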
An Illustrative Example
[Figure: example graph with nodes 1–13]
• Start from node 1
• Move randomly to a neighbor
• With some probability p, return to node 1
Prob(RW will finally stay at j)
Individual Score Calculation
Individual score matrix

          Q1      Q2      Q3
Node 1    0.5767  0.0088  0.0088
Node 2    0.1235  0.0076  0.0076
Node 3    0.0283  0.0283  0.0283
Node 4    0.0076  0.1235  0.0076
Node 5    0.0088  0.5767  0.0088
Node 6    0.0076  0.0076  0.1235
Node 7    0.0088  0.0088  0.5767
Node 8    0.0333  0.0024  0.1260
Node 9    0.1260  0.0024  0.0333
Node 10   0.1260  0.0333  0.0024
Node 11   0.0333  0.1260  0.0024
Node 12   0.0024  0.1260  0.0333
Node 13   0.0024  0.0333  0.1260

[Figure: the 13-node example graph, with a score shown next to each node]
AND: Combine Scores
• Q: How to combine scores?
• A: Multiply
• … = prob. that 3 random particles coincide on node j
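As a small illustration of the multiplication rule, the sketch below combines the individual scores of one node into an AND score. It assumes the particles move independently, which is what makes the product a meeting probability; the function name `and_score` is hypothetical.

```python
from math import prod


def and_score(individual_scores):
    """Probability that all particles (one per query node) are at node j,
    assuming they move independently."""
    return prod(individual_scores)


# Scores r(i, j) of a single node with respect to three queries Q1, Q2, Q3
print(and_score([0.0283, 0.0283, 0.0283]))  # roughly 2.27e-05
```

For a node whose three individual scores are all 0.0283, the product is about 0.2267 × 1e-4.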
K_SoftAnd: Combine Scores
Generalization – SoftAnd:
We want nodes close to k of the Q query nodes (k < Q).
Q: How to do that?
A: Prob(at least k out of the Q particles meet each other at j)
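One way to evaluate "at least k out of Q" is a small dynamic program over the number of particles that are at node j. This is a sketch under the assumption that the Q particles are independent; the function name `k_softand` is hypothetical and the slides do not say how the probability is actually computed.

```python
def k_softand(individual_scores, k):
    """Probability that at least k of the particles are at node j,
    assuming independent particles with the given individual scores."""
    # dist[m] = probability that exactly m of the particles seen so far are at j
    dist = [1.0]
    for p in individual_scores:
        nxt = [0.0] * (len(dist) + 1)
        for m, prob in enumerate(dist):
            nxt[m] += prob * (1.0 - p)   # this particle is elsewhere
            nxt[m + 1] += prob * p       # this particle is at j
        dist = nxt
    return sum(dist[k:])
```

With k = Q this reduces to the plain product (the AND score), and with k = 1 it becomes 1 minus the product of (1 - r(i,j)), i.e. an OR score. For a node with three individual scores of 0.0283, `k_softand(..., 2)` is about 0.0024.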
K_SoftAnd: Relaxation of AND
• Asking an AND query? No answer!
– Disconnected communities
– Noise
AND query vs. K_SoftAnd query
[Figure: two panels on the 13-node example graph, with a score next to each node. Left: And query (scores × 1e-4). Right: 2_SoftAnd query.]
[Figure: the 13-node example graph, with a score next to each node]
1_SoftAnd query = OR query
Measuring Importance
• Individual Scores: random walk with restart (steady-state prob.)
– Score matrix for Nodes 1–13 wrt Q1, Q2, Q3, as on the “Individual Score Calculation” slide
• Combining Scores: And, K_SoftAnd, OR (meeting prob.)
– And (× 1e-4): 0.4505, 0.0710, 0.2267, 0.0710, 0.4505, 0.0710, 0.4505, 0.1010, 0.1010, 0.1010, 0.1010, 0.1010, 0.1010
– 2_SoftAnd: 0.0103, 0.0019, 0.0103, 0.0019, 0.0103, 0.0019, 0.0024, 0.0046, 0.0046, 0.0046, 0.0046, 0.0046, 0.0046
“Extract” Alg.
• Goal
– Maximize total scores and
– ‘Appropriate’ connections
• How to… “Extract” Alg.
– Dynamic programming
– Greedy alg.
• Pick up a promising node
• Find the ‘best’ path
[Figure: two example graphs, one with nodes 1–16 and one with nodes 1–13]
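The two steps named on this slide (pick up a promising node, find a ‘best’ path) can be sketched as a greedy loop. Everything specific here is an assumption made for illustration: ‘best’ is taken to mean the path that maximizes the product of node scores, it is found with Dijkstra's algorithm on -log(score) rather than the dynamic programming the slide mentions, and the budget is taken to count query nodes too. The names `best_path` and `extract` are hypothetical.

```python
import heapq
from math import log


def best_path(adj, score, src, dst):
    """Path from src to dst maximizing the product of node scores
    (Dijkstra on -log(score); scores are assumed to lie in (0, 1])."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v in adj[u]:
            nd = d - log(score[v])
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def extract(adj, score, queries, budget):
    """Greedy sketch: grow a subgraph of at most `budget` nodes
    (query nodes included) around the query nodes."""
    chosen = set(queries)
    # Most promising nodes first: highest combined score
    candidates = sorted((v for v in adj if v not in chosen),
                        key=lambda v: -score[v])
    for dest in candidates:
        if dest in chosen:
            continue
        paths = [best_path(adj, score, q, dest) for q in queries]
        if any(p is None for p in paths):
            continue  # dest is not reachable from every query node
        new_nodes = {v for p in paths for v in p} - chosen
        if len(chosen) + len(new_nodes) <= budget:
            chosen |= new_nodes
    return chosen
```

Skipping a candidate that would exceed the budget, instead of stopping at the first one, is a design choice of this sketch; it lets cheaper candidates further down the ranking still be added.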
Graph Partition: Efficiency Issue
• Straightforward way
– Q linear systems
– Linear in the # of edges
• Observation
– Skewed dist.
• How to…
– Graph partition
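One plausible reading of how graph partitioning could cut the cost: partition the graph ahead of time, then at query time keep only the partitions that contain query nodes and compute scores on that smaller graph. The sketch below shows only the restriction step and assumes the partition labels are already available; the function name `restrict_to_query_partitions` and the `part` mapping are hypothetical, and the slides do not specify the partitioning method.

```python
def restrict_to_query_partitions(adj, part, queries):
    """Induced subgraph on the partitions that contain at least one query node.

    adj:  dict node -> list of neighbours
    part: dict node -> partition id (assumed to be precomputed offline)
    """
    keep_parts = {part[q] for q in queries}
    keep = {v for v in adj if part[v] in keep_parts}
    # Drop every edge that leaves the kept partitions
    return {v: [u for u in adj[v] if u in keep] for v in keep}
```

Scores computed on the restricted graph are approximations, since edges crossing into the dropped partitions are ignored. A node may also lose all of its neighbours in the restricted graph, which a score computation would need to handle.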
Experimental Setup
• Dataset
– DBLP/authorship
– Author-Paper
– 315k nodes
– 1,800k edges
• Evaluation Criteria
– I Node Ratio
– I Edge Ratio
Experimental Setup
• We want to check
– Do the goodness criteria make sense?
– Does the “extract” alg. capture most of the important nodes/edges?
– Efficiency
Case Study: AND query
[Figure: subgraph with numeric edge labels. Nodes: R. Agrawal, Jiawei Han, V. Vapnik, M. Jordan, H.V. Jagadish, Laks V.S. Lakshmanan, Heikki Mannila, Christos Faloutsos, Padhraic Smyth, Corinna Cortes, Daryl Pregibon.]
2_SoftAnd query
[Figure: subgraph with numeric edge labels and the labels “Statistic” and “database”. Nodes: R. Agrawal, Jiawei Han, V. Vapnik, M. Jordan, H.V. Jagadish, Laks V.S. Lakshmanan, Umeshwar Dayal, Bernhard Scholkopf, Peter L. Bartlett, Alex J. Smola.]
Evaluation of “Extract” Alg.
• 20 nodes
• 90%+ preserved
[Figure: Node Ratio vs. Budget (b), shown for 2 query nodes and for 3 query nodes]
Running Time vs. Quality for Fast Ceps
[Figure: plot with axes Running Time and Quality]
• ~90% quality
• 6:1 speedup
Conclusion
• Q1: How to measure the importance?
• A1: RWR + K_SoftAnd
• Q2: How to find the connection subgraph?
• A2: “Extract” Alg.
• Q3: How to do it efficiently?
• A3: Graph Partition (Fast Ceps)
– ~90% quality
– 6:1 speedup