Center-Piece Subgraphs: Problem definition and Fast Solutions Hanghang Tong Christos Faloutsos Carnegie Mellon University

Center-Piece Subgraphs: Problem definition and Fast Solutions

Hanghang Tong

Christos Faloutsos

Carnegie Mellon University

Center-Piece Subgraph (Ceps)

• Given Q query nodes
• Find Center-piece
• Input of Ceps
  – Q Query nodes
  – Budget b
  – k, the SoftAnd coefficient
• App.
  – Social Network
  – Law Enforcement
  – Gene Network
  – …

[Figure: query nodes A, B and C, shown in three panels; label b]

Challenges in Ceps

• Q1: How to measure the importance?

• Q2: How to extract connection subgraph?

• Q3: How to do it efficiently?

Roadmap

• Ceps Overview

• Q1: Goodness Score Calculation

• Q2: Extract Alg.

• Q3: Efficiency Issue

• Experimental Results

• Conclusion

Ceps Overview

• Individual Score Calculation
  – Measure importance wrt individual query: $[r(i,j)]_{n \times Q}$
• Combine Individual Scores
  – Measure importance wrt query set: $[r(\mathcal{Q},j)]_{n \times 1}$
• “Extract” Alg.
  – … the connection subgraphs: $H = \arg\max_{H} g(H)$

[Figure: query nodes A, B and C]

RWR: Individual Score Calculation

• Goal
  – Individual importance score $r(i,j) = r_{i,j}$
  – For each node j wrt each query i
• How to
  – Random walk with restart
  – Steady State Prob.

$$\vec{r} = (1 - c)\, P\, \vec{r} + c\, \vec{e}$$
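A minimal sketch of how such steady-state scores could be computed, assuming the iteration r ← (1 − c) P r + c e, with P the column-normalized adjacency matrix, e the indicator vector of the query node and c the restart probability. The function name `rwr_scores`, the 4-node path graph and the value c = 0.5 are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def rwr_scores(adj, query, c, tol=1e-12, max_iter=1000):
    """Steady-state random-walk-with-restart scores for one query node.

    Iterates r <- (1 - c) * P @ r + c * e until the L1 change drops below tol.
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # isolated nodes keep an all-zero column
    P = adj / col_sums                 # column j is divided by its column sum
    e = np.zeros(adj.shape[0])
    e[query] = 1.0                     # the walk restarts at the query node
    r = e.copy()
    for _ in range(max_iter):
        r_next = (1 - c) * (P @ r) + c * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

# Hypothetical path graph 0 - 1 - 2 - 3 with query node 0.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
r = rwr_scores(A, query=0, c=0.5)
print(r)
```

Running it once per query node and stacking the resulting vectors as columns gives an n × Q score matrix.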

An Illustrating Example

[Figure: example graph with 13 nodes, numbered 1 to 13]

• Start from node 1
• Move randomly to a neighbor
• With some probability p, return to node 1

Prob (RW will finally stay at j)

Individual Score Calculation

Individual Score matrix

           Q1      Q2      Q3
Node 1     0.5767  0.0088  0.0088
Node 2     0.1235  0.0076  0.0076
Node 3     0.0283  0.0283  0.0283
Node 4     0.0076  0.1235  0.0076
Node 5     0.0088  0.5767  0.0088
Node 6     0.0076  0.0076  0.1235
Node 7     0.0088  0.0088  0.5767
Node 8     0.0333  0.0024  0.1260
Node 9     0.1260  0.0024  0.0333
Node 10    0.1260  0.0333  0.0024
Node 11    0.0333  0.1260  0.0024
Node 12    0.0024  0.1260  0.0333
Node 13    0.0024  0.0333  0.1260

[Figure: the 13-node example graph, each node labeled with its score for query Q1 (the first column of the matrix)]

AND: Combine Scores

• Q: How to combine scores?

• A: Multiply
• … = prob. 3 random particles coincide on node j
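As a small worked example of the multiplication rule, the sketch below multiplies the three individual scores of a few nodes from the 13-node example. The dictionary layout and variable names are illustrative assumptions; only the score values come from the slides.

```python
import math

# Individual scores (Q1, Q2, Q3) for three nodes of the 13-node example.
scores = {
    "Node 2": [0.1235, 0.0076, 0.0076],
    "Node 3": [0.0283, 0.0283, 0.0283],
    "Node 8": [0.0333, 0.0024, 0.1260],
}

# AND score: product over the query nodes, i.e. the probability that three
# independent particles, one per query node, are all at node j.
and_scores = {node: math.prod(r) for node, r in scores.items()}

for node, s in and_scores.items():
    print(f"{node}: {s / 1e-4:.4f} x 1e-4")
```

Because the inputs are rounded to four digits, the products only approximately reproduce the AND scores reported for these nodes (0.0710, 0.2267 and 0.1010, in units of 1e-4).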

K_SoftAnd: Combine Scores

Generalization – softAND:

We want nodes close to k of Q (k<Q) query nodes.

Q: How to do that?

A: Prob(at least k-out-of-Q will meet each other at j)
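The "at least k-out-of-Q" probability can be sketched as follows, assuming the Q particles move independently so that each one is at node j with its own individual score. The function name `k_softand` and the dynamic-programming formulation are illustrative choices, not necessarily the authors' own computation.

```python
import math

def k_softand(probs, k):
    """P(at least k of the independent events occur).

    probs[i] is the probability that the particle of query i is at node j.
    """
    # dist[m] = probability that exactly m of the events seen so far occurred
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for m, mass in enumerate(dist):
            nxt[m] += mass * (1 - p)   # this particle is not at node j
            nxt[m + 1] += mass * p     # this particle is at node j
        dist = nxt
    return sum(dist[k:])

node3 = [0.0283, 0.0283, 0.0283]       # individual scores of Node 3
node8 = [0.0333, 0.0024, 0.1260]       # individual scores of Node 8

print(k_softand(node3, 3))             # k = Q: same as the AND product
print(k_softand(node3, 2))             # 2_SoftAnd
print(k_softand(node8, 2))
```

With k = Q the result reduces to the plain product used for AND queries, and with k = 1 it is the probability that at least one particle is at node j.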

K_SoftAnd: Relaxation of AND

Asking AND query? No Answer!

Disconnected Communities

Noise

AND query vs. K_SoftAnd query

[Figure: the 13-node example graph drawn twice, each node labeled with its combined score.
Left, And Query (scores x 1e-4): 0.4505 at nodes 1, 5, 7; 0.2267 at node 3; 0.0710 at nodes 2, 4, 6; 0.1010 at nodes 8 to 13.
Right, 2_SoftAnd Query: 0.0103 at nodes 1, 5, 7; 0.0024 at node 3; 0.0019 at nodes 2, 4, 6; 0.0046 at nodes 8 to 13.]

1_SoftAnd query = OR query

[Figure: the 13-node example graph, each node labeled with a score. Values shown: 0.1617 (six nodes), 0.1387 (three nodes), 0.0849 (one node), 0.0103 (three nodes).]

Measuring Importance

Individual Scores: random walk with restart (Steady State Prob)
Combining Scores: And, OR, K_SoftAnd (Meeting Prob)

           Q1      Q2      Q3      And (x 1e-4)  2_SoftAnd
Node 1     0.5767  0.0088  0.0088  0.4505        0.0103
Node 2     0.1235  0.0076  0.0076  0.0710        0.0019
Node 3     0.0283  0.0283  0.0283  0.2267        0.0024
Node 4     0.0076  0.1235  0.0076  0.0710        0.0019
Node 5     0.0088  0.5767  0.0088  0.4505        0.0103
Node 6     0.0076  0.0076  0.1235  0.0710        0.0019
Node 7     0.0088  0.0088  0.5767  0.4505        0.0103
Node 8     0.0333  0.0024  0.1260  0.1010        0.0046
Node 9     0.1260  0.0024  0.0333  0.1010        0.0046
Node 10    0.1260  0.0333  0.0024  0.1010        0.0046
Node 11    0.0333  0.1260  0.0024  0.1010        0.0046
Node 12    0.0024  0.1260  0.0333  0.1010        0.0046
Node 13    0.0024  0.0333  0.1260  0.1010        0.0046

“Extract” Alg.

• Goal
  – Maximize total scores and
  – ‘Appropriate’ Connections
• How to… “Extract” Alg.
  – Dynamic Programming
  – Greedy Alg.
    • Pick up promising node
    • Find ‘best’ path

[Figure: two example graphs, one with nodes numbered 1 to 16 and one with nodes numbered 1 to 13]
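The greedy idea (pick a promising node, then connect it to the query nodes along a good path) can be illustrated with the simplified sketch below. It is not the authors' “Extract” algorithm: the path search is a stand-in that uses Dijkstra with cost −log(score) per entered node, the budget is assumed to count only nodes added beyond the query nodes, and the names `best_path`, `extract` and the toy graph are hypothetical. Scores are assumed to lie in (0, 1].

```python
import heapq
import math

def best_path(graph, score, src, dst):
    """Path from src to dst that prefers high-score nodes (Dijkstra stand-in)."""
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue                       # stale heap entry
        for v in graph[u]:
            nd = d - math.log(score[v])    # entering v costs -log(score[v]) >= 0
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return []                          # dst is unreachable from src
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def extract(graph, score, queries, budget):
    """Greedily grow a subgraph around the query nodes (AND-style connection)."""
    H, edges = set(queries), set()
    ranked = sorted((n for n in graph if n not in H),
                    key=lambda n: score[n], reverse=True)
    for node in ranked:                    # most promising node first
        if node in H:
            continue
        paths = [best_path(graph, score, q, node) for q in queries]
        new_nodes = {v for p in paths for v in p} - H
        if not all(paths) or len(new_nodes) > budget:
            continue                       # unreachable, or would exceed the budget
        budget -= len(new_nodes)
        H |= new_nodes
        edges |= {frozenset(e) for p in paths for e in zip(p, p[1:])}
    return H, edges

# Toy graph: x touches all three query nodes, y only two of them.
graph = {"A": ["x", "y"], "B": ["x"], "C": ["x", "y"],
         "x": ["A", "B", "C"], "y": ["A", "C"]}
score = {"A": 0.5, "B": 0.5, "C": 0.5, "x": 0.3, "y": 0.1}
H, edges = extract(graph, score, ["A", "B", "C"], budget=1)
print(sorted(H), len(edges))
```

With a budget of one extra node, the sketch adds x and its three edges and then skips y because no budget is left.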

Graph Partition: Efficiency Issue

• Straightforward way
  – Q linear systems
  – linear to # of edges
• Observation
  – Skewed dist.
• How to…
  – Graph partition
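One possible reading of the graph-partition idea, sketched under assumptions: the partition labels `part` are taken as already computed (the slide does not say how), only the partitions that contain a query node are kept, and the scores of each query come from the smaller linear system (I − (1 − c) P) r = c e. The function name `fast_rwr`, the two-triangle graph and c = 0.5 are illustrative.

```python
import numpy as np

def fast_rwr(adj, part, queries, c):
    """RWR scores computed only on the partitions that contain a query node.

    Returns an n x Q matrix; nodes outside the kept partitions get score 0.
    """
    adj = np.asarray(adj, dtype=float)
    part = np.asarray(part)
    keep = np.flatnonzero(np.isin(part, part[queries]))
    sub = adj[np.ix_(keep, keep)]              # induced subgraph on kept nodes
    col_sums = sub.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    P = sub / col_sums                         # column-normalized transitions
    local = {int(node): i for i, node in enumerate(keep)}
    E = np.zeros((len(keep), len(queries)))
    for col, q in enumerate(queries):
        E[local[q], col] = 1.0                 # one restart vector per query
    # Solve all Q systems (I - (1 - c) P) r = c e at once.
    R_sub = np.linalg.solve(np.eye(len(keep)) - (1 - c) * P, c * E)
    R = np.zeros((adj.shape[0], len(queries)))
    R[keep] = R_sub
    return R

# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2 - 3.
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
part = [0, 0, 0, 1, 1, 1]
R = fast_rwr(A, part, queries=[0, 1], c=0.5)
print(R.round(4))
```

The trade-off is that nodes outside the kept partitions are ignored entirely, which is where the loss in quality relative to the full computation would come from.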

Experimental Setup

• Dataset
  – DBLP/authorship
  – Author-Paper
  – 315k nodes
  – 1,800k edges
• Evaluation Criteria
  – Important Node Ratio
  – Important Edge Ratio
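A hedged sketch of how a node-ratio criterion could be evaluated, assuming it is defined as the share of the total combined score that falls on the nodes of the extracted subgraph; the slide names the criterion but does not spell out the formula. The function name `node_ratio` and the chosen subgraph are hypothetical; the scores are the AND scores of the 13-node example.

```python
def node_ratio(score, subgraph_nodes):
    """Share of the total combined score captured by the extracted nodes."""
    total = sum(score.values())
    captured = sum(score[n] for n in subgraph_nodes)
    return captured / total

# AND scores (x 1e-4) of the 13-node example, keyed by node number.
and_score = {1: 0.4505, 2: 0.0710, 3: 0.2267, 4: 0.0710, 5: 0.4505,
             6: 0.0710, 7: 0.4505, 8: 0.1010, 9: 0.1010, 10: 0.1010,
             11: 0.1010, 12: 0.1010, 13: 0.1010}

ratio = node_ratio(and_score, {1, 3, 5, 7})   # hypothetical extracted subgraph
print(f"{ratio:.3f}")
```

An edge ratio could be computed the same way once a score per edge is defined.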

Experimental Setup

• We want to check
  – Does the goodness criterion make sense?
  – Does the “extract” alg. capture most of the important nodes/edges?
  – Efficiency

Case Study: AND query

[Figure: resulting subgraph for the AND query, with numeric edge labels. Authors shown: R. Agrawal, Jiawei Han, V. Vapnik, M. Jordan, H.V. Jagadish, Laks V.S. Lakshmanan, Heikki Mannila, Christos Faloutsos, Padhraic Smyth, Corinna Cortes, Daryl Pregibon.]

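2_SoftAnd query

[Figure: resulting subgraph for the 2_SoftAnd query, with numeric edge labels. Authors shown: R. Agrawal, Jiawei Han, V. Vapnik, M. Jordan, H.V. Jagadish, Laks V.S. Lakshmanan, Umeshwar Dayal, Bernhard Scholkopf, Peter L. Bartlett, Alex J. Smola. Community labels: Statistic, database.]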

Evaluation of “Extract” Alg.

• 20 nodes

• 90%+ preserved

[Plot: Node Ratio vs. Budget (b), with one curve for 2 query nodes and one for 3 query nodes]

Running Time vs. Quality for Fast Ceps

[Plot: axes Running Time and Quality; annotations: ~90% quality, 6:1 speedup]

Conclusion

• Q1: How to measure the importance?
• A1: RWR + K_SoftAnd
• Q2: How to find connection subgraph?
• A2: “Extract” Alg.
• Q3: How to do it efficiently?
• A3: Graph Partition (Fast Ceps)
  – ~90% quality
  – 6:1 speedup