Mining Frequent Patterns in a Set of Graphs Using Subgraph Mining - gSpan with Closed Graphs
Ankita Sambhare ([email protected])
Advisor: Dr. Carlos Rivero
Rochester Institute of Technology
Approach
gSpan maps each graph to a DFS code, defines a lexicographic order on these codes, and builds a search tree based on that order. Each level of the search tree holds patterns with one more edge than the level above, and the tree is traversed depth-first.
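The mapping from a graph to a DFS code, and the choice of a minimum code, can be illustrated with a small sketch. It assumes an undirected, connected, simple graph with string labels. The names `dfs_code`, `edge_key` and `min_dfs_code` are hypothetical and not taken from the poster, and the minimum is found by brute force over all traversals, which is only practical for very small graphs. The comparison key follows the DFS lexicographic order as commonly stated for gSpan: backward edges before forward edges, and forward edges from deeper vertices first.

```python
from itertools import permutations


def dfs_code(vertex_labels, edge_labels, order):
    """Return the DFS code produced by one depth-first traversal.

    vertex_labels: dict vertex -> label
    edge_labels:   dict frozenset({u, v}) -> label (undirected, simple graph)
    order:         tuple of vertices; order[0] is the start vertex and
                   unvisited neighbours are tried in this order
    """
    rank = {v: r for r, v in enumerate(order)}
    adj = {v: [] for v in vertex_labels}
    for e in edge_labels:
        u, v = tuple(e)
        adj[u].append(v)
        adj[v].append(u)

    index = {}    # vertex -> DFS discovery index
    used = set()  # edges already written to the code
    code = []     # list of (i, j, label_i, label_edge, label_j)

    def emit(u, v):
        e = frozenset((u, v))
        used.add(e)
        code.append((index[u], index[v],
                     vertex_labels[u], edge_labels[e], vertex_labels[v]))

    def visit(v):
        # Backward edges to already discovered vertices come first,
        # smallest discovery index first.
        back = [u for u in adj[v]
                if u in index and frozenset((u, v)) not in used]
        for u in sorted(back, key=index.get):
            emit(v, u)
        # Then forward edges to undiscovered neighbours.
        for u in sorted(adj[v], key=rank.get):
            if u not in index:
                index[u] = len(index)
                emit(v, u)
                visit(u)

    index[order[0]] = 0
    visit(order[0])
    return code


def edge_key(edge):
    """Sort key for one code entry under the DFS lexicographic order."""
    i, j, li, le, lj = edge
    if j < i:                    # backward edge
        return (0, j, le)
    return (1, -i, li, le, lj)   # forward edge, deeper source first


def min_dfs_code(vertex_labels, edge_labels):
    """Brute force: try every traversal and keep the smallest code."""
    codes = (dfs_code(vertex_labels, edge_labels, p)
             for p in permutations(vertex_labels))
    return min(codes, key=lambda c: [edge_key(e) for e in c])


# A labelled triangle (assumed example data).
triangle_vertices = {0: "X", 1: "Y", 2: "Z"}
triangle_edges = {
    frozenset((0, 1)): "a",
    frozenset((1, 2)): "b",
    frozenset((0, 2)): "c",
}
```

Two isomorphic graphs yield the same minimum code, which is what lets the search tree avoid generating the same pattern twice.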
Example
Figure 2: A simple example of patterns mined from two graphs
CONCLUSIONS
The results show that gSpan runs faster than the other algorithm, which relies on branch-and-bound candidate graph generation, because of the DFS codes it introduces.
gSpan also mines more relevant subgraph patterns than Gaston because it supports closed mining.
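Closed mining keeps only those frequent patterns that have no larger pattern with the same support. The sketch below illustrates that filter under a strong simplifying assumption: each pattern is a set of uniquely labelled edges, so plain set containment stands in for the subgraph test a real miner would need. The function name and the sample data are hypothetical.

```python
def closed_patterns(supports):
    """Keep patterns that have no proper super-pattern with equal support.

    supports: dict mapping a pattern (frozenset of edge names) to the
              number of graphs that contain it.
    """
    closed = {}
    for pattern, support in supports.items():
        # `pattern < other` is the proper-subset test on frozensets.
        absorbed = any(pattern < other and support == other_support
                       for other, other_support in supports.items())
        if not absorbed:
            closed[pattern] = support
    return closed


# Assumed sample: three frequent patterns and their supports.
sample_supports = {
    frozenset({"A-B"}): 3,
    frozenset({"B-C"}): 2,
    frozenset({"A-B", "B-C"}): 2,
}
sample_closed = closed_patterns(sample_supports)
```

Here the single-edge pattern `B-C` is dropped because the two-edge pattern occurs in the same number of graphs, while `A-B` is kept because its support is higher than that of any pattern containing it.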
Background Research
Graph mining domains:
1. Frequent subgraph mining
2. Approximate graph pattern mining
3. Graph pattern summarization
4. Graph classification
5. Graph clustering
6. Graph indexing
7. Graph searching
8. Correlated graph pattern mining
9. Optimal graph pattern mining
10. Graph kernels
11. Link mining
12. Web structure mining
13. Workflow mining
14. Biological network mining
REFERENCES
1. X. Yan and J. Han. gSpan: Graph-based substructure pattern mining. UIUC-CS Tech. Report: R-2002-2296, (a 4-page short version in
Goal
Extract all frequently occurring patterns from a set of graphs in order to study the most common, behaviorally significant patterns among the different graphs. The mined patterns can then be used to analyze the set of graphs further, based on its similarities, and to identify its significance.
FUTURE WORK
Build approximate graph mining on top of frequent subgraph mining, adding approximation to the mined patterns, which is required because of the noise and diversity of the data.
Handle complex data, such as program data, where each node is a complex structure.
Algorithm
Steps:
1. DFS subscripting with rightmost extension
2. DFS codes
3. Lexicographical ordering of DFS codes
4. Minimum DFS code
5. Perform DFS on the DFS code tree
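Step 1 can be made concrete with a small sketch of rightmost extension. Given a DFS code as a list of `(i, j, label_i, label_edge, label_j)` tuples, it computes the rightmost path and lists the structural positions where one more edge may be attached. The function names are hypothetical, labels are ignored when proposing positions, and a simple graph (no parallel edges) is assumed.

```python
def rightmost_path(code):
    """Vertices from the root (index 0) to the rightmost vertex,
    following forward edges only."""
    parent = {j: i for i, j, *_ in code if i < j}
    v = max(parent, default=0)   # rightmost vertex = largest DFS index
    path = [v]
    while v in parent:
        v = parent[v]
        path.append(v)
    return path[::-1]


def rightmost_extensions(code):
    """Structural (i, j) positions where one new edge may be added."""
    path = rightmost_path(code)
    right = path[-1]
    new_vertex = right + 1
    existing = {frozenset((i, j)) for i, j, *_ in code}
    # Backward edges: from the rightmost vertex to a vertex on the path.
    backward = [(right, v) for v in path[:-1]
                if frozenset((right, v)) not in existing]
    # Forward edges: from any vertex on the path to a brand-new vertex,
    # deepest vertex first.
    forward = [(v, new_vertex) for v in reversed(path)]
    return backward + forward


# Assumed example: a triangle 0-1-2 plus a pendant vertex 3 attached to 1.
example_code = [
    (0, 1, "X", "a", "Y"),
    (1, 2, "Y", "b", "Z"),
    (2, 0, "Z", "c", "X"),
    (1, 3, "Y", "d", "W"),
]
```

Restricting growth to these positions keeps every child in the search tree a valid DFS code of a graph with exactly one more edge. Checking each child against its minimum DFS code (step 4) then prunes duplicates.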
RESULTS
Figure: gSpan vs Gaston on 340 graphs (dense edges), output fragments. Y-axis: output fragments (0 to 12000). X-axis: algorithm at minimum frequency 5%, 10%, 15%. Bar labels in extracted order: 2770, 10027, 736, 1363, 401, 706.
Figure: gSpan vs Gaston on 340 graphs (dense edges), run time. Y-axis: run time (0 to 16). X-axis: minimum frequency 5%, 10%, 15%.
Figure: gSpan vs Gaston on 10000 graphs (sparse edges), output fragments. Y-axis: output fragments (-200 to 2000). X-axis: algorithm at minimum frequency 5%, 10%, 15%, 20%. Bar labels in extracted order: 1795, -1, 460, 460, 225, 225, 126, 126.
Figure: gSpan vs Gaston on 10000 graphs (sparse edges), run time. Y-axis: run time (0 to 35). X-axis: minimum frequency 5%, 10%, 15%, 20%.
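The minimum frequency values on the chart axes can be read as absolute support thresholds. This is a minimal sketch, assuming the percentage is taken over the number of graphs in the data set and rounded up; the poster does not state the rounding rule, and the function name is hypothetical.

```python
def min_support(num_graphs, min_freq_percent):
    """Smallest number of graphs a pattern must occur in.

    Uses integer arithmetic for the ceiling of
    num_graphs * min_freq_percent / 100.
    """
    return -(-num_graphs * min_freq_percent // 100)


# Thresholds for the two data set sizes and frequencies shown in the charts.
dense_thresholds = [min_support(340, p) for p in (5, 10, 15)]
sparse_thresholds = [min_support(10000, p) for p in (5, 10, 15, 20)]
```

Under this reading, a pattern in the 340-graph data set must occur in at least 17 graphs to be frequent at 5%.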