Finding Top-K Similar Graphs in Graph Database @ ReadingCircle

M1 Ishikawa Yasutaka


Slides written for a paper introduction session in my research lab


Page 1: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Finding Top-K Similar Graphs in Graph Database@ReadingCircle

M1 Ishikawa Yasutaka

Page 2: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

About this paper

A paper in the field of graph theory

Topic: graph similarity queries

Proposes a new technique that returns more accurate answers while reducing computational cost

Published in: Proceedings of the 15th International Conference on Extending Database Technology (EDBT '12)

Authors: Yuanyuan Zhu, Lu Qin, Jeffrey Xu Yu, Hong Cheng

Page 3: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Outline

1. Background of graph theory

2. Introduction

3. Problem statement

4. The framework

5. Pruning without indexing

6. Pruning with indexing

7. Performance studies

8. Conclusion

Page 4: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Outline (section 1: Background of graph theory)

Page 5: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

What is a "graph"?

A graph is denoted by $g = (V, E, l)$

$V$ is the set of vertices

$E \subseteq V \times V$ is the set of edges

$l$ is a labeling function $l: V \to \Sigma$, where $\Sigma$ is a set of labels

In this paper, the edges of a graph carry no weights

Page 6: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Subgraph and supergraph

Given two graphs $g$ and $g'$, if $g \subset g'$:

$g$ is a subgraph of $g'$

$g'$ is a supergraph of $g$

[Figure: an example supergraph and one of its subgraphs]

Page 7: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Maximum Common Subgraph

A graph $g$ is a maximum common subgraph (MCS) of $g_1$ and $g_2$ if $g$ is a common subgraph of $g_1$ and $g_2$ and there is no other common subgraph $g'$ of $g_1$ and $g_2$ such that $|E(g')| > |E(g)|$

Computing the MCS is NP-hard

[Figure: graph $g_1$, graph $g_2$, and their maximum common subgraph]

Page 8: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

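Bipartite graph

A graph whose vertices can be divided into two disjoint sets $U$ and $V$

$U$ and $V$ are each independent sets, so every edge connects a vertex in $U$ to a vertex in $V$

[Figure: a bipartite graph with vertex sets $U$ and $V$]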

Page 9: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Matching of a bipartite graph

An edge set $M$ is called a matching if no two edges in $M$ share a vertex

[Figure: a matching between vertex sets $U$ and $V$]
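The matching condition can be checked mechanically. The following is a minimal sketch, not taken from the paper: the function name `is_matching` and the representation of edges as `(u, v)` pairs are assumptions made for illustration.

```python
from typing import Hashable, Iterable, Tuple


def is_matching(edges: Iterable[Tuple[Hashable, Hashable]]) -> bool:
    """Return True if no two edges in the given edge set share a vertex."""
    used = set()
    for u, v in edges:
        # An endpoint that was already used means two edges touch the same vertex.
        if u in used or v in used:
            return False
        used.update((u, v))
    return True
```

For example, `is_matching([("u1", "v1"), ("u2", "v2")])` holds, while adding `("u1", "v2")` would break the condition because `u1` is reused.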

Page 10: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Outline (section 2: Introduction)

Page 11: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Graph query processing (1)

Using a graph as a query against a graph database

This has attracted much attention in recent years

Image retrieval

Chemical compound structure search

[Figure: a query graph is issued against a graph DB, which returns the result graphs]

Page 12: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Graph query processing (2)

Queries mainly fall into two categories

Subgraph containment search

Identify the set of graphs that contain the query graph

Supergraph containment search

Identify the set of graphs that are contained in the query graph

Besides exact subgraph/supergraph containment queries, some studies allow a small number of edges or nodes to be missing in the query result

→ Graph similarity search is important

Page 13: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Graph similarity search

Main theme of this paper

Evaluate the similarity between a query graph and each graph in the database

The "top-k similar graphs" are the $k$ graphs that are most similar to the query graph

[Figure: a query graph and its top-3 similar graphs, ranked 1 to 3]

Page 14: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Existing graph similarity search (1)

Related work considers two kinds of graph similarity search

Subgraph similarity search

H. Shang, X. Lin, Y. Zhang, J. X. Yu, and W. Wang. Connected substructure similarity search. In SIGMOD, pages 903–914, 2010.

X. Yan, P. Yu, and J. Han. Substructure similarity search in graph databases. In SIGMOD, pages 766–777, 2005.

Supergraph similarity search

H. Shang, K. Zhu, X. Lin, Y. Zhang, and R. Ichise. Similarity search on supergraph containment. In ICDE, pages 637–648, 2010.

To calculate similarity, a distance between graphs, $dist(q, g)$, has to be defined

Page 15: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Existing graph similarity search (2)

Subgraph similarity search

$dist(q, g) = |E(q)| - |E(mcs(q, g))|$

Supergraph similarity search

$dist(q, g) = |E(g)| - |E(mcs(q, g))|$

Note: these distances do not satisfy the metric axioms, because they are not symmetric

In general, $dist(q, g) \neq dist(g, q)$

Page 16: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Example: existing similarity search (1)

Query $q$ and sample graph database $D = \{g_1, g_2, g_3\}$

Bold edges mark the MCS of $q$ and each $g$

[Figure: query $q$ and database graphs $g_1$, $g_2$, $g_3$ with vertex labels A to D]

Page 17: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Example: existing similarity search (2)

With the subgraph query, $dist(q, g) = |E(q)| - |E(mcs(q, g))|$, graph $g_3$ is returned as the answer

$dist(q, g_3) = 7 - 6 = 1$

[Figure: query $q$ and database graphs $g_1$, $g_2$, $g_3$ with vertex labels A to D]

Page 18: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Example: existing similarity search (3)

With the supergraph query, $dist(q, g) = |E(g)| - |E(mcs(q, g))|$, graph $g_1$ is returned as the answer

$dist(q, g_1) = 3 - 2 = 1$

[Figure: query $q$ and database graphs $g_1$, $g_2$, $g_3$ with vertex labels A to D]

Page 19: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Example: existing similarity search (4)

However, from the user's perspective, the best answer should be $g_2$

Neither of these ways of calculating $dist$ is adequate

[Figure: query $q$ and database graphs $g_1$, $g_2$, $g_3$ with vertex labels A to D]

Page 20: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Main contributions of this paper

1. Studying top-k graph similarity query processing based on a new MCS-based similarity measure

2. Deriving several distance lower bounds (without and with an index) to reduce the number of MCS computations

3. Conducting extensive performance studies on a real dataset to test the performance of the proposed algorithms

Page 21: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Outline (section 3: Problem statement)

Page 22: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Definitions (1)

The paper defines $dist(q, g)$ as follows

$dist(q, g) = |E(q)| + |E(g)| - 2 \times |E(mcs(q, g))|$

Note: this $dist(q, g)$ (probably) satisfies the metric axioms

$dist(x, y) = 0 \Leftrightarrow x = y$

$dist(y, x) = dist(x, y)$

$dist(x, y) \geq 0$

$dist(x, y) + dist(y, z) \geq dist(x, z)$

This becomes important later
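The difference between this distance and the subgraph/supergraph distances can be seen from the edge counts alone. The sketch below is only an illustration: the function names (`full_dist`, `sub_dist`, `super_dist`) are made up here, and the MCS size is passed in as a number rather than computed, since computing it is NP-hard.

```python
def sub_dist(edges_q: int, edges_mcs: int) -> int:
    """Subgraph similarity distance: |E(q)| - |E(mcs(q, g))|."""
    return edges_q - edges_mcs


def super_dist(edges_g: int, edges_mcs: int) -> int:
    """Supergraph similarity distance: |E(g)| - |E(mcs(q, g))|."""
    return edges_g - edges_mcs


def full_dist(edges_q: int, edges_g: int, edges_mcs: int) -> int:
    """Distance of this paper: |E(q)| + |E(g)| - 2 * |E(mcs(q, g))|."""
    return edges_q + edges_g - 2 * edges_mcs
```

Swapping the roles of the two graphs leaves `full_dist` unchanged (for instance `full_dist(7, 3, 2)` and `full_dist(3, 7, 2)` are both 6), whereas `sub_dist(7, 2)` and `sub_dist(3, 2)` differ, which is the asymmetry of the older measures. Note also that `full_dist` is simply the sum of `sub_dist` and `super_dist`.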

Page 23: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Definitions (2)

In this paper, the MCS of two graphs is allowed to be disconnected

It can potentially capture more common substructures of the two graphs

It can also evaluate the structural similarity of the two graphs more globally

Page 24: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Example: $dist(q, g)$ of this paper (1)

Query $q$ and sample graph database $D = \{g_1, g_2\}$

Bold edges mark the edges shared by $q$ and each $g$

[Figure: query $q$ and graphs $g_1$, $g_2$ with vertex labels A, B, C]

Page 25: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Example: $dist(q, g)$ of this paper (2)

If the MCS is required to be connected, $g_1$ is returned as the answer

$dist(q, g_1) = 12 + 6 - 2 \times 6 = 6$

$dist(q, g_2) = 12 + 12 - 2 \times 5 = 14$

[Figure: query $q$ and graphs $g_1$, $g_2$ with vertex labels A, B, C]

Page 26: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Example: $dist(q, g)$ of this paper (3)

If the MCS is allowed to be disconnected, $g_2$ is returned as the answer

$dist(q, g_1) = 12 + 6 - 2 \times 6 = 6$

$dist(q, g_2) = 12 + 12 - 2 \times 10 = 4$

$g_2$ is the result users would expect

[Figure: query $q$ and graphs $g_1$, $g_2$ with vertex labels A, B, C]

Page 27: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Outline (section 4: The framework)

Page 28: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Pruning strategy

As mentioned previously, computing the MCS is an NP-hard problem

The paper derives lower bounds on $dist(q, g)$ to reduce the number of MCS computations

It does not make the MCS computation itself faster

If a lower bound on $dist(q, g)$ is no less than the largest distance among the current top-k answers, $g$ cannot be a top-k answer and can be pruned safely

Page 29: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Basic algorithm (1)

Uses a max-heap $A$ and a min-heap $\mathcal{H}$

Page 30: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Basic algorithm (2)

If the lower bound of $dist(q, g)$ is smaller than the top value (the largest distance) of the current top-k answers, the exact $dist(q, g)$ is computed and compared with the current top value again
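The two-heap idea can be sketched as follows. This is a reading of the slides, not the paper's listing: it assumes the min-heap holds candidates keyed by their lower bound and the max-heap holds the current top-k answers, and the names `top_k_similar`, `lower_bound`, and `exact_dist` are hypothetical. Python's `heapq` is a min-heap, so the max-heap is simulated with negated distances.

```python
import heapq
from typing import Callable, List, Sequence, Tuple, TypeVar

G = TypeVar("G")


def top_k_similar(
    query: G,
    database: Sequence[G],
    k: int,
    lower_bound: Callable[[G, G], float],  # cheap, must never exceed exact_dist
    exact_dist: Callable[[G, G], float],   # expensive (needs an MCS computation)
) -> List[Tuple[float, int]]:
    """Return up to k (distance, database index) pairs with the smallest distances."""
    # Min-heap of candidates ordered by their lower-bound distance.
    candidates = [(lower_bound(query, g), i) for i, g in enumerate(database)]
    heapq.heapify(candidates)
    # Max-heap of current answers, stored as (-distance, index).
    answers: List[Tuple[float, int]] = []
    while candidates:
        lb, i = heapq.heappop(candidates)
        # Every remaining candidate has a lower bound >= lb, so once lb reaches
        # the largest distance among k answers, all of them can be pruned.
        if len(answers) == k and lb >= -answers[0][0]:
            break
        d = exact_dist(query, database[i])
        if len(answers) < k:
            heapq.heappush(answers, (-d, i))
        elif d < -answers[0][0]:
            heapq.heapreplace(answers, (-d, i))
    return sorted((-neg_d, i) for neg_d, i in answers)
```

Because candidates are visited in increasing order of lower bound, the loop can stop at the first candidate that fails the test instead of checking the rest one by one.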

Page 31: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Outline (section 5: Pruning without indexing)

Page 32: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Edge frequency based lower bound

Finding a lower bound of $dist(q, g)$ is equivalent to finding an upper bound of $|E(mcs(q, g))|$

Denote the set of distinct edges in $g$ as $E_d(g)$

Denote the frequency of edge $e$ in $g$ as $f(e, g)$

$emcs_1(q, g) = \sum_{e \in E_d(q) \cup E_d(g)} \min\{f(e, q), f(e, g)\}$

$dist_1(q, g) = |E(q)| + |E(g)| - 2 \times emcs_1(q, g)$
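A small sketch of the edge frequency bound follows. It assumes that a "distinct edge" is identified by the unordered pair of its endpoint labels, and it represents a graph as a `(labels, edges)` pair; both the representation and the function names are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Graph = Tuple[Dict[int, str], List[Tuple[int, int]]]  # (vertex -> label, edge list)


def edge_frequencies(graph: Graph) -> Counter:
    """Count edges by the unordered pair of their endpoint labels, i.e. f(e, g)."""
    labels, edges = graph
    return Counter(tuple(sorted((labels[u], labels[v]))) for u, v in edges)


def emcs1(q: Graph, g: Graph) -> int:
    """Upper bound on |E(mcs(q, g))|: sum of min frequencies over distinct edges."""
    fq, fg = edge_frequencies(q), edge_frequencies(g)
    # A Counter returns 0 for a missing key, so edges present in only one graph add 0.
    return sum(min(fq[e], fg[e]) for e in fq.keys() | fg.keys())


def dist1(q: Graph, g: Graph) -> int:
    """Lower bound on dist(q, g) derived from emcs1."""
    return len(q[1]) + len(g[1]) - 2 * emcs1(q, g)


# Toy graphs: q is the path A-C-C-B, g has one extra (A, C) edge.
q = ({0: "A", 1: "C", 2: "C", 3: "B"}, [(0, 1), (1, 2), (2, 3)])
g = ({0: "A", 1: "C", 2: "B", 3: "C"}, [(0, 1), (1, 2), (1, 3), (0, 3)])
```

On these toy graphs the three distinct edges each contribute 1, so `emcs1(q, g)` is 3 and `dist1(q, g)` is 3 + 4 - 6 = 1. The bound only needs one pass over each edge list, which is why it is the cheapest filter.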

Page 33: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Example: using $dist_1(q, g)$ (1)

The frequencies of the edges (A,C), (B,C), (C,C) in $q$ are 4, 3, 6

$emcs_1(q, g_1) = 4 + 3 + 5 = 12$

$dist_1(q, g_1) = 13 + 12 - 2 \times 12 = 1$

[Figure: query $q$ and graphs $g_1$, $g_2$ with vertex labels A, B, C]

Page 34: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Example: using $dist_1(q, g)$ (2)

$emcs_1(q, g_2) = 3 + 3 + 6 = 12$

$dist_1(q, g_2) = 13 + 13 - 2 \times 12 = 2$

In fact, these lower bounds are not tight compared to the actual $dist$

[Figure: query $q$ and graphs $g_1$, $g_2$ with vertex labels A, B, C]

Page 35: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Adjacency List Based Lower Bound (1)

Construct a bipartite graph $B(q, g)$

For each pair of nodes $u \in V(q)$ and $v \in V(g)$, there is an edge between $b(u)$ and $b(v)$ if $l(u) = l(v)$

$L(adj(u))$ is the multiset consisting of the labels of all nodes adjacent to $u$

[Figure: a node $u$ whose neighbors are labeled A, A, and B]

$L(adj(u)) = \{A, A, B\}$

Page 36: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Adjacency List Based Lower Bound (2)

The weight of an edge is defined as $w(b(u), b(v)) = |L(adj(u)) \cap L(adj(v))|$

$M(q, g)$ is the maximum weighted bipartite matching

$emcs_2(q, g) = \frac{1}{2} \sum_{(b(u), b(v)) \in M(q, g)} w(b(u), b(v))$

$dist_2(q, g) = |E(q)| + |E(g)| - 2 \times emcs_2(q, g)$
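The adjacency list bound can be sketched on tiny graphs as below. To stay dependency-free, the maximum weighted matching is found by brute force over permutations, which is an assumption of this sketch and only feasible for a handful of vertices; a real implementation would use a proper assignment algorithm. The `(labels, edges)` graph representation and the function names are likewise illustrative.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, List, Tuple

Graph = Tuple[Dict[int, str], List[Tuple[int, int]]]  # (vertex -> label, edge list)


def neighbor_label_multisets(graph: Graph) -> Dict[int, Counter]:
    """Map each vertex u to L(adj(u)), the multiset of its neighbors' labels."""
    labels, edges = graph
    adj = {v: Counter() for v in labels}
    for u, v in edges:
        adj[u][labels[v]] += 1
        adj[v][labels[u]] += 1
    return adj


def emcs2(q: Graph, g: Graph) -> float:
    """Half the weight of a maximum weighted matching in B(q, g) (brute force)."""
    q_labels, g_labels = q[0], g[0]
    q_adj, g_adj = neighbor_label_multisets(q), neighbor_label_multisets(g)

    def weight(u: int, v: int) -> int:
        # No edge in B(q, g) when the labels differ, modeled here as weight 0.
        if q_labels[u] != g_labels[v]:
            return 0
        return sum((q_adj[u] & g_adj[v]).values())  # multiset intersection size

    us, vs = list(q_labels), list(g_labels)
    if len(us) <= len(vs):
        best = max(sum(weight(u, v) for u, v in zip(us, p))
                   for p in permutations(vs, len(us)))
    else:
        best = max(sum(weight(u, v) for v, u in zip(vs, p))
                   for p in permutations(us, len(vs)))
    return best / 2


def dist2(q: Graph, g: Graph) -> float:
    """Lower bound on dist(q, g) derived from emcs2."""
    return len(q[1]) + len(g[1]) - 2 * emcs2(q, g)


# Toy graphs: q is the path A-C-A, g consists of two separate A-C edges.
q = ({0: "A", 1: "C", 2: "A"}, [(0, 1), (1, 2)])
g = ({0: "A", 1: "C", 2: "A", 3: "C"}, [(0, 1), (2, 3)])
```

On this toy pair the best matching has total weight 3, so `emcs2(q, g)` is 1.5 and `dist2(q, g)` is 1.0. The edge frequency bound would give 0 for the same pair, since both graphs have two (A, C) edges, so the adjacency list bound is the tighter of the two here.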

Page 37: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

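Bipartite graph (repeated)

A graph whose vertices can be divided into two disjoint sets $U$ and $V$

$U$ and $V$ are each independent sets, so every edge connects a vertex in $U$ to a vertex in $V$

[Figure: a bipartite graph with vertex sets $U$ and $V$]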

Page 38: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Matching of a bipartite graph (repeated)

An edge set $M$ is called a matching if no two edges in $M$ share a vertex

[Figure: a matching between vertex sets $U$ and $V$]

Page 39: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Example: using $dist_2(q, g)$ (1)

$emcs_2(q, g_1) = (2 + 2 + 2 + 1) \div 2 = 3.5$

$dist_2(q, g_1) = 4 + 5 - 2 \times 3.5 = 2$

[Figure: query $q$, graph $g_1$, and the bipartite graph $B(q, g_1)$; the matched edges have weights 2, 2, 2, and 1]

Page 40: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Example: using $dist_2(q, g)$ (2)

If we use $emcs_1$ instead, $emcs_1(q, g_1) = 1 + 1 + 1 + 1 = 4$

$dist_1(q, g_1) = 4 + 5 - 2 \times 4 = 1$

[Figure: query $q$, graph $g_1$, and the bipartite graph $B(q, g_1)$; the matched edges have weights 2, 2, 2, and 1]

Page 41: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

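Example: using $dist_2(q, g)$ (3)

Given any two graphs $q$ and $g$, we have $dist_2(q, g) \geq dist_1(q, g)$, so $dist_2$ is the tighter lower bound

[Figure: query $q$, graph $g_1$, and the bipartite graph $B(q, g_1)$; the matched edges have weights 2, 2, 2, and 1]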

Page 42: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Algorithm using $dist_1$, $dist_2$

Computational cost: $dist > dist_2 > dist_1$

Use $dist_1$ as much as possible, and fall back to the more expensive computations only when it cannot prune

Page 43: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Outline (section 6: Pruning with indexing)

Page 44: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Triangle property of distance

Given three graphs $g_1$, $g_2$, $g_3$: $dist(g_1, g_3) \leq dist(g_1, g_2) + dist(g_2, g_3)$

If $g_2$ and $g_3$ are very near, $dist(g_1, g_2) \approx dist(g_1, g_3)$

If we know $dist(g, g')$, we can compute these lower bounds

$dist_3(q, g \mid g') = dist(q, g') - dist(g, g')$

$dist_4(q, g \mid g') = dist(q, g') - dist(g, g')$
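The triangle-based bound can be tried out with any distance that satisfies the metric axioms. The sketch below uses the size of the symmetric difference of two sets as a stand-in metric, which is an assumption for illustration only (it has the same shape, $|A| + |B| - 2|A \cap B|$, but needs no MCS computation). The function names are hypothetical, and clamping the bound at 0 is an addition of this sketch.

```python
from typing import FrozenSet


def sym_diff_dist(a: FrozenSet[int], b: FrozenSet[int]) -> int:
    """Stand-in metric: |a| + |b| - 2 * |a & b|, the symmetric difference size."""
    return len(a ^ b)


def triangle_lower_bound(dist_q_ref: float, dist_g_ref: float) -> float:
    """Lower bound on dist(q, g) from dist(q, g') and a precomputed dist(g, g').

    Follows dist(q, g) >= dist(q, g') - dist(g, g'), which is the triangle
    inequality rearranged. A negative value carries no information, so it is
    clamped to 0.
    """
    return max(0, dist_q_ref - dist_g_ref)


# Toy "graphs" as sets of edge ids; ref plays the role of the reference graph g'.
q = frozenset({1, 2, 3, 4})
g = frozenset({1, 2, 3})
ref = frozenset({1, 2})
bound = triangle_lower_bound(sym_diff_dist(q, ref), sym_diff_dist(g, ref))
```

Here the bound is 2 - 1 = 1, which does not exceed the true `sym_diff_dist(q, g)` of 1. The point of the construction is that one exact distance from the query to the reference graph yields bounds for every graph whose distance to that reference was precomputed.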

Page 45: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Indexing

$dist(g, g')$ can be precomputed

However, precomputing it for all pairs requires $O(|D|^2)$ MCS computations

Define a set of groups $I = \{G_1, G_2, \ldots, G_{|I|}\}$, where $G_i \subseteq D$ and $G_1 \cup G_2 \cup \cdots \cup G_{|I|} = D$

Each group $G_i$ has a center graph $c_i \in G_i$

Precompute $dist(g, c_i)$ for every $g \in G_i$

[Figure: graphs $g_1$ to $g_7$ arranged into groups $G_1$ and $G_2$]

Page 46: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Algorithm using $dist_3$, $dist_4$, and the index

Once the real $dist(q, g)$ is obtained, use it to update the lower bounds of the other graphs

Page 47: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Example: algorithm with index

Page 48: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Three indexing strategies (1)

DPIndex

Given a number $m$, randomly pick $m$ graphs as the centers of $m$ groups. Assign each non-center graph $g \in D$ to its nearest center

Each graph belongs to exactly one group

Page 49: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Three indexing strategies (2)

OPIndex

After selecting $m$ graphs in $D$ as centers, assign each non-center graph $g \in D$ to its $l$ nearest centers

This allows each graph to belong to multiple groups

Page 50: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Three indexing strategies (3)

GSIndex

Treat each graph in $D$ as a center

For each center, find its $l$ nearest graphs in $D$, and put these $l + 1$ graphs together as a group
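The three grouping strategies can be sketched as follows. This is a rough illustration under several assumptions: the index is represented as a dictionary from center id to `{member id: distance to center}`, the centers are passed in rather than drawn randomly so that the result is reproducible, and every needed distance is computed directly, which ignores whatever the paper does to keep index construction cheap. All function names are hypothetical.

```python
from typing import Callable, Dict, Sequence, TypeVar

G = TypeVar("G")
Index = Dict[int, Dict[int, float]]  # center id -> {member id: dist(member, center)}


def build_dp_index(graphs: Sequence[G], centers: Sequence[int],
                   dist: Callable[[G, G], float]) -> Index:
    """DPIndex sketch: each non-center graph joins only its nearest center."""
    index: Index = {c: {c: 0} for c in centers}
    for i, g in enumerate(graphs):
        if i in index:
            continue
        d, c = min((dist(g, graphs[c]), c) for c in centers)
        index[c][i] = d
    return index


def build_op_index(graphs: Sequence[G], centers: Sequence[int], l: int,
                   dist: Callable[[G, G], float]) -> Index:
    """OPIndex sketch: each non-center graph joins its l nearest centers."""
    index: Index = {c: {c: 0} for c in centers}
    for i, g in enumerate(graphs):
        if i in index:
            continue
        for d, c in sorted((dist(g, graphs[c]), c) for c in centers)[:l]:
            index[c][i] = d
    return index


def build_gs_index(graphs: Sequence[G], l: int,
                   dist: Callable[[G, G], float]) -> Index:
    """GSIndex sketch: every graph is a center grouped with its l nearest graphs."""
    index: Index = {}
    for c, center in enumerate(graphs):
        nearest = sorted((dist(g, center), i)
                         for i, g in enumerate(graphs) if i != c)[:l]
        index[c] = {c: 0, **{i: d for d, i in nearest}}
    return index
```

Random centers as described on the slides could be drawn with `random.sample(range(len(graphs)), m)` before calling the first two builders. Storing the distance to the center alongside each member is what makes the triangle-based lower bound available at query time.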

Page 51: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Outline (section 7: Performance studies)

Page 52: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Overview of experiments

Similarity measures evaluation

Show why the query results of subgraph/supergraph similarity queries are not good

Query performance evaluation

Compare with noIndex and SeqScan, and compare the three indexing techniques with each other

Indexing cost evaluation

Compare the cost of building the three indexes

Page 53: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Environment

All the algorithms were implemented using Visual C++ 2005

Tested on a PC with a 2.66 GHz CPU and 3.43 GB of memory running Windows XP

Page 54: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Parameters

The approaches are evaluated by varying five parameters

$k$: the top-k value

$|V(q)|$: the size of the query graph

$|D|$: the number of graphs in the graph database

$m$: the number of groups used in DPIndex and OPIndex

$l$: the maximum number of groups (the $l$ used in OPIndex and GSIndex)

Page 55: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Similarity measures comparison

Experiments with three measures

Subsim: $|E(q)| - |E(mcs(q, g))|$

Supersim: $|E(g)| - |E(mcs(q, g))|$

Fullsim: $|E(q)| + |E(g)| - 2 \times |E(mcs(q, g))|$

The closer the answers are to the query graph in size, the better the answers are

Page 56: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Power of the pruning strategy

SeqScan needs around 7000 MCS computations for graphs with size larger than 10

noIndex needs no more than 500

Page 57: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Scalability testing

Comparing the three index techniques

Page 58: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Index testing

Comparing the cost of the three index techniques

Page 59: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Outline (section 8: Conclusion)

Page 60: Finding Top-k Similar Graphs in Graph Database @ ReadingCircle

Conclusion

The existing solutions, subgraph and supergraph similarity search, cannot solve the problem properly

The authors introduced a new graph distance based on the maximum common subgraph (MCS)

To reduce the number of MCS computations, they proposed two distance lower bounds

They further used the triangle property of the distance to derive index-based lower bounds

They conducted extensive performance studies