11 Graph Pattern Mining

Embed Size (px)

Citation preview

  • 7/28/2019 11 Graph Pattern Mining

    1/71

    1

    Data Mining:

    Concepts and Techniques Chapter 9

    Graph mining: Part IGraph Pattern Mining

    Jiawei Han and Micheline KamberDepartment of Computer Science

    University of Illinois at Urbana-Champaign

    www.cs.uiuc.edu/~hanj

    2006 Jiawei Han and Micheline Kamber. All rights reserved.

    http://www.cs.uiuc.edu/~hanjhttp://www.cs.uiuc.edu/~hanj
  • 7/28/2019 11 Graph Pattern Mining

    2/71

    2

    Graph Mining

    Graph Pattern Mining

    Mining Frequent Subgraph Patterns

    Impact on Graph Search I: Graph Indexing

    Impact on Graph Search II: Graph SimilaritySearch

    Constrained Graph Pattern Mining

    Graph Classification

    Graph Clustering

    Summary

  • 7/28/2019 11 Graph Pattern Mining

    3/71

    3

    Why Graph Mining?

    Graphs are ubiquitous

    Chemical compounds (Cheminformatics)

    Protein structures, biological pathways/networks (Bioinformactics)

    Program control flow, traffic flow, and workflow analysis

    XML databases, Web, and social network analysis Graph is a general model

    Trees, lattices, sequences, and items are degenerated graphs

    Diversity of graphs

    Directed vs. undirected, labeled vs. unlabeled (edges & vertices),

    weighted, with angles & geometry (topological vs. 2-D/3-D)

    Complexity of algorithms: many problems are of high

    complexity

  • 7/28/2019 11 Graph Pattern Mining

    4/71

    4

    Graph, Graph, Everywhere

    Aspirin Yeast protein interaction network

    fromH.

    JeongetalNature411,

    41(2

    001)

    Internet Co-author network

  • 7/28/2019 11 Graph Pattern Mining

    5/71

    5

    Graph Pattern Mining

    Frequentsubgraphs

    A (sub)graph is f requent if its support(occurrence

    frequency) in a given dataset is no less than a

    minimum supportthreshold

    Applications of graph pattern mining

    Mining biochemical structures

    Program control flow analysis

    Mining XML structures or Web communities

    Building blocks for graph classification, clustering,

    compression, comparison, and correlation analysis

  • 7/28/2019 11 Graph Pattern Mining

    6/71

    6

    Example: Frequent Subgraphs

    GRAPH DATASET

    FREQUENT PATTERNS

    (MIN SUPPORT IS 2)

    (A) (B) (C)

    (1) (2)

  • 7/28/2019 11 Graph Pattern Mining

    7/71

  • 7/28/2019 11 Graph Pattern Mining

    8/718

    Graph Mining Algorithms

    Incomplete beam search Greedy (Subdue)

    Inductive logic programming (WARMR)

    Graph theory-based approaches

    Apriori-based approach

    Pattern-growth approach

  • 7/28/2019 11 Graph Pattern Mining

    9/719

    SUBDUE (Holder et al. KDD94)

    Start with single vertices

    Expand best substructures with a new edge

    Limit the number of best substructures

    Substructures are evaluated based on theirability to compress input graphs

    Using minimum description length (DL)

    Best substructure S in graph G minimizes:

    DL(S) + DL(G\S)

    Terminate until no new substructure is discovered

  • 7/28/2019 11 Graph Pattern Mining

    10/7110

    WARMR (Dehaspe et al. KDD98)

    Graphs are represented by Datalog facts

    atomel(C, A1, c), bond (C, A1, A2, BT),

    atomel(C, A2, c) : a carbon atom bound to a

    carbon atom with bond type BT

    WARMR: the first general purpose ILP system

    Level-wise search

    Simulate Apriori for frequent pattern discovery

  • 7/28/2019 11 Graph Pattern Mining

    11/7111

    Frequent Subgraph Mining Approaches

    Apriori-based approach

    AGM/AcGM: Inokuchi, et al. (PKDD00)

    FSG: Kuramochi and Karypis (ICDM01)

    PATH#: Vanetik and Gudes (ICDM02,

    ICDM04) FFSM: Huan, et al. (ICDM03)

    Pattern growth approach

    MoFa, Borgelt and Berthold (ICDM02) gSpan: Yan and Han (ICDM02)

    Gaston: Nijssen and Kok (KDD04)

  • 7/28/2019 11 Graph Pattern Mining

    12/7112

    Properties of Graph Mining Algorithms

    Search order

    breadth vs. depth

    Generation of candidate subgraphs

    apriori vs. pattern growth

    Elimination of duplicate subgraphs

    passive vs. active

    Support calculation

    embedding store or not

    Discover order of patterns

    path tree graph

  • 7/28/2019 11 Graph Pattern Mining

    13/7113

    Apriori-Based Approach

    G

    G1

    G2

    Gn

    k-edge (k+1)-edge

    G

    G

    JOIN

  • 7/28/2019 11 Graph Pattern Mining

    14/71

    14

    Apriori-Based, Breadth-First Search

    AGM (Inokuchi, et al. PKDD00) generates new graphs with one more node

    Methodology: breadth-search, joining two graphs

    FSG (Kuramochi and Karypis ICDM01)

    generates new graphs with one more edge

  • 7/28/2019 11 Graph Pattern Mining

    15/71

    15

    PATH (Vanetik and Gudes ICDM02, 04)

    Apriori-based approach

    Building blocks: edge-disjoint path

    A graph with 3 edge-disjoint

    paths

    construct frequent paths construct frequent graphs with

    2 edge-disjoint paths construct graphs with k+1

    edge-disjoint paths fromgraphs with k edge-disjoint

    paths repeat

  • 7/28/2019 11 Graph Pattern Mining

    16/71

    16

    FFSM (Huan, et al. ICDM03)

    Represent graphs using canonical adjacency matrix(CAM)

    Join two CAMs or extend a CAM to generate a newgraph

    Store the embeddings of CAMs

    All of the embeddings of a pattern in the database

    Can derive the embeddings of newly generatedCAMs

  • 7/28/2019 11 Graph Pattern Mining

    17/71

    17

    Pattern Growth Method

    G

    G1

    G2

    Gn

    k-edge

    (k+1)-edge

    (k+2)-edge

    duplicategraph

  • 7/28/2019 11 Graph Pattern Mining

    18/71

    18

    MoFa (Borgelt and Berthold ICDM02)

    Extend graphs by adding a new edge

    Store embeddings of discovered frequent graphs

    Fast support calculation

    Also used in other later developed algorithms

    such as FFSM and GASTON

    Expensive Memory usage

    Local structural pruning

  • 7/28/2019 11 Graph Pattern Mining

    19/71

    19

    GSPAN (Yan and Han ICDM02)

    Right-Most Extension

    Theorem: Completeness

    The Enumeration of Graphsusing Right-most Extension isCOMPLETE

  • 7/28/2019 11 Graph Pattern Mining

    20/71

    20

    DFS Code

    Flatten a graph into a sequence using depth first

    search

    0

    1

    2

    3 4

    e0: (0,1)

    e1: (1,2)

    e2: (2,0)

    e3: (2,3)

    e4: (3,1)

    e5: (2,4)

  • 7/28/2019 11 Graph Pattern Mining

    21/71

  • 7/28/2019 11 Graph Pattern Mining

    22/71

    22

    DFS Code Extension

    Let a be the minimum DFS code of a graph G and b be

    a non-minimum DFS code ofG. For any DFS code dgenerated from b by one right-most extension,

    (i) d is not a minimum DFS code,

    (ii) min_dfs(d) cannot be extended from b, and

    (iii) min_dfs(d) is either less than a or can beextended from a.

    THEOREM [ RIGHT-EXTENSION ]

    The DFS code of a graph extended from aNon-minimum DFS code is NOT MINIMUM

  • 7/28/2019 11 Graph Pattern Mining

    23/71

    23

    GASTON (Nijssen and Kok KDD04)

    Extend graphs directly Store embeddings

    Separate the discovery of different types of

    graphs

    path tree graph

    Simple structures are easier to mine and

    duplication detection is much simpler

  • 7/28/2019 11 Graph Pattern Mining

    24/71

    24

    Graph Pattern Explosion Problem

    If a graph is frequent, all of its subgraphs are

    frequent the Apriori property

    An n-edge frequent graph may have 2n

    subgraphs

    Among 422 chemical compounds which are

    confirmed to be active in an AIDS antiviral

    screen dataset, there are 1,000,000 frequent

    graph patterns if the minimum support is 5%

  • 7/28/2019 11 Graph Pattern Mining

    25/71

    25

    Closed Frequent Graphs

    Motivation: Handling graph pattern explosion

    problem

    Closed frequent graph

    A frequent graph G is closedif there exists no

    supergraph of G that carries the same supportas G

    If some of Gs subgraphs have the same

    support, it is unnecessary to output these

    subgraphs (nonclosed graphs)

    Lossless compression: still ensures that the

    mining result is complete

  • 7/28/2019 11 Graph Pattern Mining

    26/71

    26

    CLOSEGRAPH (Yan & Han, KDD03)

    A Pattern-Growth Approach

    G

    G1

    G2

    Gn

    k-edge

    (k+1)-edge

    At what condition, can westop searching their children

    i.e., early termination?

    If G and G are frequent, G is asubgraph of G. Ifin any part

    of the graph in the datasetwhere G occurs, G also

    occurs, then we need not growG, since none of Gs children willbe closed except those of G.

  • 7/28/2019 11 Graph Pattern Mining

    27/71

    27

    Handling Tricky Exception Cases

    (graph 1)

    a

    c

    b

    d

    (pattern 2)

    (pattern 1)

    (graph 2)

    a

    c

    b

    d

    a b

    a

    c d

  • 7/28/2019 11 Graph Pattern Mining

    28/71

    28

    Experimental Result

    The AIDS antiviral screen compound dataset

    from NCI/NIH

    The dataset contains 43,905 chemical

    compounds Among these 43,905 compounds, 423 of them

    belongs to CA, 1081 are of CM, and the

    remaining are in class CI

  • 7/28/2019 11 Graph Pattern Mining

    29/71

    29

    Discovered Patterns

    20% 10%

    5%

  • 7/28/2019 11 Graph Pattern Mining

    30/71

    30

    Performance (1): Run Time

    Minimum support (in %)

    Run

    timeper

    pattern

    (msec

    )

  • 7/28/2019 11 Graph Pattern Mining

    31/71

    31

    Performance (2): Memory Usage

    Minimum support (in %)

    Me

    moryusa

    ge(GB)

  • 7/28/2019 11 Graph Pattern Mining

    32/71

    32

    Number of Patterns: Frequent vs. Closed

    CA

    1.0E+02

    1.0E+03

    1.0E+04

    1.0E+05

    1.0E+06

    0.05 0.06 0.07 0.08 0.1

    frequent graphs

    closed frequent graphs

    Minimum support

    Num

    berofpatterns

  • 7/28/2019 11 Graph Pattern Mining

    33/71

    33

    Runtime: Frequent vs. Closed

    CA

    1

    10

    100

    1000

    10000

    0.05 0.06 0.07 0.08 0.1

    FSG

    Gspan

    CloseGraph

    Minimum support

    R

    untime

    (sec)

  • 7/28/2019 11 Graph Pattern Mining

    34/71

    34

    Do the Odds Beat the Curse of Complexity?

    Potentially exponential number of frequent patterns

    The worst case complexty vs. the expected probability

    Ex.: Suppose Walmart has 104 kinds of products

    The chance to pick up one product 10-4

    The chance to pick up a particular set of 10 products: 10-40

    What is the chance this particular set of 10 products to be

    frequent 103 times in 109 transactions?

    Have we solved the NP-hard problem of subgraph isomorphism

    testing?

    No. But the real graphs in bio/chemistry is not so bad

    A carbon has only 4 bounds and most proteins in a network

    have distinct labels

  • 7/28/2019 11 Graph Pattern Mining

    35/71

    35

    Graph Mining

    Graph Pattern Mining

    Mining Frequent Subgraph Patterns

    Impact on Graph Search I: Graph Indexing

    Impact on Graph Search II: Graph SimilaritySearch

    Constrained Graph Pattern Mining

    Graph Classification

    Graph Clustering

    Summary

  • 7/28/2019 11 Graph Pattern Mining

    36/71

    36

    Graph Search

    Querying graph databases: Given a graph database and a query graph,

    find all the graphs containing this query graph

    query graph graph database

  • 7/28/2019 11 Graph Pattern Mining

    37/71

    37

    Scalability Issue

    Sequential scan Disk I/Os

    Subgraph isomorphism testing

    An indexing mechanism is needed

    DayLight: Daylight.com (commercial)

    GraphGrep: Dennis Shasha, et al. PODS'02

    Grace: Srinath Srinivasa, et al. ICDE'03

  • 7/28/2019 11 Graph Pattern Mining

    38/71

    38

    Indexing Strategy

    Graph (G)

    Substructure

    Query graph (Q)

    If graph G contains query

    graph Q, G should contain

    any substructure of Q

    Remarks

    Index substructures of a query graph to prunegraphs that do not contain these substructures

  • 7/28/2019 11 Graph Pattern Mining

    39/71

    39

    Indexing Framework

    Two steps in processing graph queries

    Step 1. Index Construction

    Enumerate structures in the graph database,

    build an inverted index between structures

    and graphs

    Step 2. Query Processing

    Enumerate structures in the query graph

    Calculate the candidate graphs containing

    these structures

    Prune the false positive answers by

    performing subgraph isomorphism test

  • 7/28/2019 11 Graph Pattern Mining

    40/71

    40

    Cost Analysis

    QUERY RESPONSE TIME

    testingmisomorphisioqindex TTCT _

    REMARK: make |Cq| as small as possible

    fetch index number of candidates

  • 7/28/2019 11 Graph Pattern Mining

    41/71

    41

    Path-based Approach

    GRAPH DATABASE

    PATHS

    0-length: C, O, N, S

    1-length: C-C, C-O, C-N, C-S, N-N, S-O

    2-length: C-C-C, C-O-C, C-N-C, ...

    3-length: ...

    (a) (b) (c)

    Built an inverted index between paths and graphs

  • 7/28/2019 11 Graph Pattern Mining

    42/71

  • 7/28/2019 11 Graph Pattern Mining

    43/71

    43

    Problems: Path-based Approach

    GRAPH DATABASE

    (a) (b) (c)QUERY GRAPH

    Only graph (c) contains this query

    graph. However, if we only indexpaths: C, C-C, C-C-C, C-C-C-C, we

    cannot prune graph (a) and (b).

    G

  • 7/28/2019 11 Graph Pattern Mining

    44/71

    44

    gIndex: Indexing Graphs by Data Mining

    Our methodology on graph index:

    Identify frequent structures in the database, the

    frequent structures are subgraphs that appear

    quite often in the graph database

    Prune redundant frequent structures to

    maintain a small set ofdiscriminative structures

    Create an inverted index betweendiscriminative frequent structures and graphs in

    the database

  • 7/28/2019 11 Graph Pattern Mining

    45/71

    45

    IDEAS: Indexing with Two Constraints

    structure (>106)

    frequent (~105)

    discriminative (~103)

    Wh Di i i ti S b h ?

  • 7/28/2019 11 Graph Pattern Mining

    46/71

    46

    Why Discriminative Subgraphs?

    All graphs contain structures: C, C-C, C-C-C

    Why bother indexing these redundant frequent

    structures? Only index structures that provide more

    information than existing structures

    Sample database

    (a) (b) (c)

    Di i i ti St t

  • 7/28/2019 11 Graph Pattern Mining

    47/71

    47

    Discriminative Structures

    Pinpoint the most useful frequent structures

    Given a set of structures and a new

    structure , we measure the extra indexing

    power provided by ,

    When is small enough, is a discriminative

    structure and should be included in the index

    Index discriminative frequent structures only

    Reduce the index size by an order of

    magnitude

    .,,, 21 xffffxP in

    xnfff ,, 21

    x

    xP

    Wh F t St t ?

  • 7/28/2019 11 Graph Pattern Mining

    48/71

    48

    Why Frequent Structures?

    We cannot index (or even search) all ofsubstructures

    Large structures will likely be indexed well by theirsubstructures

    Size-increasing support threshold

    size

    su

    pport

    minimumsupport threshold

    E i t l S tti

  • 7/28/2019 11 Graph Pattern Mining

    49/71

    49

    Experimental Setting

    The AIDS antiviral screen compound dataset from

    NCI/NIH, containing 43,905 chemical compounds

    Query graphs are randomly extracted from the

    dataset

    GraphGrep: maximum length (edges) of paths is

    set at 10

    gIndex: maximum size (edges) of structures is set

    at 10

  • 7/28/2019 11 Graph Pattern Mining

    50/71

  • 7/28/2019 11 Graph Pattern Mining

    51/71

    51

    Experiments: Answer Set Size

    0

    20

    4060

    80

    100

    120140

    4 8 12 16 20 24

    GraphGrep

    gIndex

    Actual Match

    QUERY SIZE

    #OFCA

    NDIDATES

  • 7/28/2019 11 Graph Pattern Mining

    52/71

    Experiments: Incremental Maintenance

    20

    30

    40

    50

    60

    70

    80

    2K 4K 6k 8k 10k

    From scratch Incremental

    Frequent structures are stable to database updating

    Index can be built based on a small portion of a graph

    database, but be used for the whole database

    Alt ti G h I d i M th d

  • 7/28/2019 11 Graph Pattern Mining

    53/71

    Alternative Graph Indexing Methods

    Graph-structure-based indexing and similarity search

    Structure-based index methods, e.g., g-Index, S-path index Use index to search for similar graph/network structures

    Substructure indexing

    Key problem: What substructures as indexing features?

    gIndex [Yan, Yu & Han, SIGMOD04]: Findfrequent anddiscriminative subgraphs (by graph-pattern mining)

    S-path [Zhao & Han, VLDB10]: Use decomposed shortestpaths as basic indexing features

    53

    Why S Path as Indexing Features?

  • 7/28/2019 11 Graph Pattern Mining

    54/71

    Why S-Path as Indexing Features?

    Neighborhood signatures of vertices are built to maintain

    indexing features: Effective search space pruning ability Processing (Query Decomposition): Decompose the query

    graph into a set of indexed shortest paths in S-Path

    Network

    A global lookup table Neighborhood signature of v3

    Query

    G h Mi i

  • 7/28/2019 11 Graph Pattern Mining

    55/71

    55

    Graph Mining

    Graph Pattern Mining

    Mining Frequent Subgraph Patterns

    Impact on Graph Search I: Graph Indexing

    Impact on Graph Search II: Graph SimilaritySearch

    Constrained Graph Pattern Mining

    Graph Classification Graph Clustering

    Summary

    St t Si il it S h

  • 7/28/2019 11 Graph Pattern Mining

    56/71

    56

    Structure Similarity Search

    (a) caffeine (b) diurobromine (c) viagra

    CHEMICAL COMPOUNDS

    QUERY GRAPH

    S St i htf d M th d

  • 7/28/2019 11 Graph Pattern Mining

    57/71

    57

    Some Straightforward Methods

    Method1: Directly compute the similarity between the

    graphs in the DB and the query graph

    Sequential scan

    Subgraph similarity computation

    Method 2: Form a set of subgraph queries from the

    original query graph and use the exact subgraph

    search

    Costly: If we allow 3 edges to be missed in a 20-edge

    query graph, it may generate 1,140 subgraphs

    I d P i A i t S h

  • 7/28/2019 11 Graph Pattern Mining

    58/71

    58

    Index: Precise vs. Approximate Search

    Precise Search

    Use frequent patterns as indexing features

    Select features in the database space based on their

    selectivity

    Build the index Approximate Search

    Hard to build indices covering similar subgraphs

    explosive number of subgraphs in databases

    Idea: (1) keep the index structure

    (2) select features in the query space

    S bstr ct re Similarit Meas re

  • 7/28/2019 11 Graph Pattern Mining

    59/71

    59

    Substructure Similarity Measure

    Query relaxation measure

    The number of edges that can be relabeled or missed;but the position of these edges are not fixed

    QUERY GRAPH

    Substructure Similarity Measure

  • 7/28/2019 11 Graph Pattern Mining

    60/71

    60

    Substructure Similarity Measure

    Feature-based similarity measure

    Each graph is represented as a feature vector

    X = {x1, x2, , xn}

    Similarity is defined by the distance of their

    corresponding vectors

    Advantages

    Easy to index

    Fast

    Rough measure

    Intuition: Feature Based Similarity Search

  • 7/28/2019 11 Graph Pattern Mining

    61/71

    61

    Intuition: Feature-Based Similarity Search

    Graph (G1)

    Substructure

    Query (Q)

    If graph G containsthe major part of a query

    graph Q, G should share

    a number of common

    features with Q

    Given a relaxation ratio,

    calculate the maximal

    number of features thatcan be missed !

    At least one of them

    should be contained

    Graph (G2)

    Feature-Graph Matrix

  • 7/28/2019 11 Graph Pattern Mining

    62/71

    62

    Feature-Graph Matrix

    G1 G2 G3 G4 G5

    f1 0 1 0 1 1

    f2 0 1 0 0 1

    f3 1 0 1 1 1f4 1 0 0 0 1

    f5 0 0 1 1 0

    Assume a query graph has 5 features and at most

    2 features to miss due to the relaxation threshold

    graphs in database

    features

    Edge Relaxation Feature Misses

  • 7/28/2019 11 Graph Pattern Mining

    63/71

    63

    Edge RelaxationFeature Misses

    If we allow k edges to be relaxed, J is the

    maximum number of features to be hit by k

    edgesit becomes the maximum coverage

    problem

    NP-complete

    A greedy algorithm exists

    We design a heuristic to refine the bound of

    feature misses

    Jk

    J

    k

    111greedy

    Query Processing Framework

  • 7/28/2019 11 Graph Pattern Mining

    64/71

    64

    Query Processing Framework

    Three steps in processing approximate graphqueries

    Step 1. Index Construction Select small structures as features in a

    graph database, and build the feature-

    graph matrix between the features and

    the graphs in the database

    Framework (cont )

  • 7/28/2019 11 Graph Pattern Mining

    65/71

    65

    Framework (cont.)

    Step 2. Feature Miss Estimation

    Determine the indexed features

    belonging to the query graph

    Calculate the upper bound of the number

    of features that can be missed for an

    approximate matching, denoted by J

    On the query graph, not the graphdatabase

    Framework (cont )

  • 7/28/2019 11 Graph Pattern Mining

    66/71

    66

    Framework (cont.)

    Step 3. Query Processing

    Use the feature-graph matrix to

    calculate the difference in the numberof features between graph G and query

    Q, FG FQ

    If FG FQ > J, discard G. The remaininggraphs constitute a candidate answer

    set

    Performance Study

  • 7/28/2019 11 Graph Pattern Mining

    67/71

    67

    Performance Study

    Database

    Chemical compounds of Anti-Aids Drug fromNCI/NIH, randomly select 10,000 compounds

    Query

    Randomly select 30 graphs with 16 and 20edges as query graphs

    Competitive algorithms

    Grafil: Graph Filterour algorithm Edge: use edges only

    All: use all the features

    Comparison of the Three Algorithms

  • 7/28/2019 11 Graph Pattern Mining

    68/71

    68

    Comparison of the Three Algorithms

    edge relaxation

    10

    100

    1000

    10000

    1 2 3 4

    Grafil

    Edge

    All

    #ofcandidates

    Summary: Graph Pattern Mining

  • 7/28/2019 11 Graph Pattern Mining

    69/71

    Summary: Graph Pattern Mining

    Graph mining has wide applications

    Frequent and closed subgraph mining methods

    gSpan and CloseGraph: pattern-growth depth-first

    search approach

    Graph indexing techniques

    Frequent and discriminative subgraphs are high-quality

    indexing features

    Similarity search in graph databases

    Indexing and feature-based matching

    Constraint-based graph pattern mining

    References (1)

  • 7/28/2019 11 Graph Pattern Mining

    70/71

    References (1)

    T. Asai, et al. Efficient substructure discovery from large semi-structured data, SDM'02

    C. Borgelt and M. R. Berthold, Mining molecular fragments: Finding relevant substructures of

    molecules, ICDM'02 M. Deshpande, M. Kuramochi, and G. Karypis, Frequent Sub-structure Based Approaches for Classifying

    Chemical Compounds, ICDM 2003

    M. Deshpande, M. Kuramochi, and G. Karypis. Automated approaches for classifying structures,

    BIOKDD'02

    L. Dehaspe, H. Toivonen, and R. King. Finding frequent substructures in chemical compounds, KDD'98

    C. Faloutsos, K. McCurley, and A. Tomkins, Fast Discovery of 'Connection Subgraphs, KDD'04

    L. Holder, D. Cook, and S. Djoko. Substructure discovery in the subdue system, KDD'94 J. Huan, W. Wang, D. Bandyopadhyay, J. Snoeyink, J. Prins, and A. Tropsha. Mining spatial motifs from

    protein structure graphs, RECOMB04

    J. Huan, W. Wang, and J. Prins. Efficient mining of frequent subgraph in the presence of isomorphism,

    ICDM'03

    H. Hu, X. Yan, Yu, J. Han and X. J. Zhou, Mining Coherent Dense Subgraphs across Massive BiologicalNetworks for Functional Discovery, ISMB'05

    A. Inokuchi, T. Washio, and H. Motoda. An apriori-based algorithm for mining frequent substructuresfrom graph data, PKDD'00

    C. James, D. Weininger, and J. Delany. Daylight Theory Manual Daylight Version 4.82. Daylight

    Chemical Information Systems, Inc., 2003.

    G. Jeh, and J. Widom, Mining the Space of Graph Properties, KDD'04

    M. Koyuturk, A. Grama, and W. Szpankowski. An efficient algorithm for detecting frequent subgraphs inbiological networks, Bioinformatics, 20:I200--I207, 2004.

    References (2)

  • 7/28/2019 11 Graph Pattern Mining

    71/71

    References (2)

    M. Kuramochi and G. Karypis. Frequent subgraph discovery, ICDM'01

    M. Kuramochi and G. Karypis, GREW: A Scalable Frequent Subgraph Discovery Algorithm, ICDM04

    B. McKay. Practical graph isomorphism. Congressus Numerantium, 30:45--87, 1981. S. Nijssen and J. Kok. A quickstart in frequent structure mining can make a difference. KDD'04

    J. Prins, J. Yang, J. Huan, and W. Wang. Spin: Mining maximal frequent subgraphs from graph

    databases. KDD'04

    D. Shasha, J. T.-L. Wang, and R. Giugno. Algorithmics and applications of tree and graph searching,PODS'02

    J. R. Ullmann. An algorithm for subgraph isomorphism, J. ACM, 23:31--42, 1976.

    N. Vanetik, E. Gudes, and S. E. Shimony. Computing frequent graph patterns from semistructured data,ICDM'02

    C. Wang, W. Wang, J. Pei, Y. Zhu, and B. Shi. Scalable mining of large disk-base graph databases, KDD'04

    T. Washio and H. Motoda, State of the art of graph-based data mining, SIGKDD Explorations, 5:59-68,2003

    X. Yan and J. Han, gSpan: Graph-Based Substructure Pattern Mining, ICDM'02

    X. Yan and J. Han, CloseGraph: Mining Closed Frequent Graph Patterns, KDD'03

    X. Yan, P. S. Yu, and J. Han, Graph Indexing: A Frequent Structure-based Approach, SIGMOD'04

    X. Yan, X. J. Zhou, and J. Han, Mining Closed Relational Graphs with Connectivity Constraints, KDD'05

    X. Yan, P. S. Yu, and J. Han, Substructure Similarity Search in Graph Databases, SIGMOD'05

    X. Yan, F. Zhu, J. Han, and P. S. Yu, Searching Substructures with Superimposed Distance, ICDE'06

    M. J. Zaki. Efficiently mining frequent trees in a forest, KDD'02

    P Zh d J H O G h Q O i i i i L N k " VLDB'10