Mining Frequent Closed Graphs on Evolving Data Streams. A. Bifet, G. Holmes, B. Pfahringer and R. Gavaldà. University of Waikato, Hamilton, New Zealand; Laboratory for Relational Algorithmics, Complexity and Learning (LARCA), UPC-Barcelona Tech, Catalonia. San Diego, 24 August 2011. 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2011.

Mining Frequent Closed Graphs on Evolving Data Streams

DESCRIPTION

Graph mining is a challenging task by itself, and even more so when processing data streams which evolve in real-time. Data stream mining faces hard constraints regarding time and space for processing, and also needs to provide for concept drift detection. In this talk we present a framework for studying graph pattern mining on time-varying streams and large datasets.

Page 1: Mining Frequent Closed Graphs on Evolving Data Streams

Mining Frequent Closed Graphs on Evolving Data Streams

A. Bifet, G. Holmes, B. Pfahringer and R. Gavaldà

University of Waikato
Hamilton, New Zealand

Laboratory for Relational Algorithmics, Complexity and Learning (LARCA)
UPC-Barcelona Tech, Catalonia

San Diego, 24 August 2011
17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2011

Page 2: Mining Frequent Closed Graphs on Evolving Data Streams

Mining Evolving Graph Data Streams

Problem
Given a data stream D of graphs, find frequent closed graphs.

[Figure: example stream of three transactions (Ids 1 to 3), each a small molecular graph over the atoms C, S, N and O.]

We provide three algorithms, of increasing power:
  Incremental
  Sliding Window
  Adaptive

2 / 48

Page 3: Mining Frequent Closed Graphs on Evolving Data Streams

Non-incremental Frequent Closed Graph Mining

CloseGraph: Xifeng Yan, Jiawei Han
  right-most extension based on depth-first search
  based on gSpan (ICDM'02)

MoSS: Christian Borgelt, Michael R. Berthold
  breadth-first search
  based on MoFa (ICDM'02)

3 / 48

Page 4: Mining Frequent Closed Graphs on Evolving Data Streams

Outline

1 Streaming Data

2 Frequent Pattern Mining
    Mining Evolving Graph Streams
    ADAGRAPHMINER

3 Experimental Evaluation

4 Summary and Future Work

4 / 48

Page 6: Mining Frequent Closed Graphs on Evolving Data Streams

Mining Massive Data

Source: IDC’s Digital Universe Study (EMC), June 2011

6 / 48

Page 7: Mining Frequent Closed Graphs on Evolving Data Streams

Data Streams

Data Streams
  Sequence is potentially infinite
  High amount of data
  High speed of arrival
  Once an element from a data stream has been processed it is discarded or archived

Tools:
  approximation
  randomization, sampling
  sketching

7 / 48

Page 8: Mining Frequent Closed Graphs on Evolving Data Streams

Data Streams Approximation Algorithms

1011000111 1010101

Sliding Window
We can maintain simple statistics over sliding windows, using $O(\frac{1}{\varepsilon} \log^2 N)$ space, where
  N is the length of the sliding window
  ε is the accuracy parameter

M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. 2002
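
To make the space bound concrete, here is a minimal Python sketch of an exponential-histogram counter in the spirit of the cited work. It is a simplified illustration, not the authors' code: the class name, the per-size bucket budget `k + 1`, and the halving of the oldest bucket in `estimate` are assumptions chosen for readability.

```python
import math
import random


class ExponentialHistogram:
    """Approximate count of 1s among the last `window` bits (simplified sketch)."""

    def __init__(self, window, eps):
        self.window = window
        self.k = math.ceil(1 / eps)   # assumed per-size bucket budget
        self.buckets = []             # (time of newest 1, size), newest first
        self.time = 0

    def add(self, bit):
        self.time += 1
        # Expire the oldest bucket once its newest 1 has left the window.
        while self.buckets and self.buckets[-1][0] <= self.time - self.window:
            self.buckets.pop()
        if bit != 1:
            return
        self.buckets.insert(0, (self.time, 1))
        size = 1
        while True:
            same = [i for i, (_, s) in enumerate(self.buckets) if s == size]
            if len(same) <= self.k + 1:
                break
            # Too many buckets of this size: merge the two oldest ones.
            newer, older = same[-2], same[-1]
            stamp = self.buckets[newer][0]
            del self.buckets[older]
            self.buckets[newer] = (stamp, 2 * size)
            size *= 2

    def estimate(self):
        if not self.buckets:
            return 0
        total = sum(size for _, size in self.buckets)
        # Only the oldest bucket may straddle the window edge; count half of it.
        return total - self.buckets[-1][1] // 2


rng = random.Random(0)
hist = ExponentialHistogram(window=100, eps=0.1)
recent = []
for _ in range(1000):
    bit = rng.randint(0, 1)
    hist.add(bit)
    recent = (recent + [bit])[-100:]
print("estimate:", hist.estimate(), "exact:", sum(recent))
```

Only bucket sizes and timestamps are stored, so the number of buckets grows with log N rather than with N; the price is a bounded error coming from the oldest bucket.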

8 / 48

Page 15: Mining Frequent Closed Graphs on Evolving Data Streams

Pattern Mining

Dataset Example

Document   Patterns
d1         abce
d2         cde
d3         abce
d4         acde
d5         abcde
d6         bcd

10 / 48

Page 16: Mining Frequent Closed Graphs on Evolving Data Streams

Itemset Mining

Document   Patterns
d1         abce
d2         cde
d3         abce
d4         acde
d5         abcde
d6         bcd

Support               Frequent
d1,d2,d3,d4,d5,d6     c
d1,d2,d3,d4,d5        e, ce
d1,d3,d4,d5           a, ac, ae, ace
d1,d3,d5,d6           b, bc
d2,d4,d5,d6           d, cd
d1,d3,d5              ab, abc, abe, be, bce, abce
d2,d4,d5              de, cde

11 / 48

Page 19: Mining Frequent Closed Graphs on Evolving Data Streams

Itemset Mining

Document   Patterns
d1         abce
d2         cde
d3         abce
d4         acde
d5         abcde
d6         bcd

Support   Frequent                      Gen      Closed   Max
6         c                             c        c
5         e, ce                         e        ce
4         a, ac, ae, ace                a        ace
4         b, bc                         b        bc
4         d, cd                         d        cd
3         ab, abc, abe, be, bce, abce   ab, be   abce     abce
3         de, cde                       de       cde      cde
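
The columns of this table can be recomputed by brute force. The following Python sketch does so for the six-document dataset above; the minimum support of 3 is an assumption (the slide does not state it, but 3 is the smallest support listed), and generators are checked against non-empty subsets only, matching the table.

```python
from itertools import combinations

# Toy dataset from the slide: document id -> items it contains.
docs = {"d1": "abce", "d2": "cde", "d3": "abce",
        "d4": "acde", "d5": "abcde", "d6": "bcd"}
MIN_SUP = 3  # assumed threshold, not stated on the slide


def support(itemset):
    """Number of documents containing every item of `itemset`."""
    return sum(1 for items in docs.values() if set(itemset) <= set(items))


# Enumerate all itemsets and keep the frequent ones with their support.
items = sorted(set("".join(docs.values())))
frequent = {}
for r in range(1, len(items) + 1):
    for combo in combinations(items, r):
        s = support(combo)
        if s >= MIN_SUP:
            frequent["".join(combo)] = s


def proper_supersets(p):
    return [q for q in frequent if set(p) < set(q)]


def proper_subsets(p):
    return [q for q in frequent if set(q) < set(p)]


# Closed: no proper superset has the same support.
closed = sorted(p for p in frequent
                if all(frequent[q] < frequent[p] for q in proper_supersets(p)))
# Maximal: no proper superset is frequent at all.
maximal = sorted(p for p in frequent if not proper_supersets(p))
# Generator: no (non-empty) proper subset has the same support.
generators = sorted(p for p in frequent
                    if all(frequent[q] > frequent[p] for q in proper_subsets(p)))

print("closed:    ", closed)
print("maximal:   ", maximal)
print("generators:", generators)
```

Every maximal itemset is closed, and the closed itemsets together with their supports are enough to recover the support of every frequent itemset, which is why closed patterns serve as a compact summary.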

12 / 48

Page 21: Mining Frequent Closed Graphs on Evolving Data Streams

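Graph Dataset

[Table: four transactions, each a small molecular graph drawn over the atoms C, S, N and O, each with weight 1. Atom labels per transaction:]

Transaction Id   Graph (atoms)   Weight
1                C C S N O O     1
2                C C S N O C     1
3                C S N O C       1
4                C C S N N       1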

14 / 48

Page 22: Mining Frequent Closed Graphs on Evolving Data Streams

Graph Coresets

Coreset of a set P with respect to some problem
Small subset that approximates the original set P.
  Solving the problem for the coreset provides an approximate solution for the problem on P.

δ-tolerance Closed Graph
A graph g is δ-tolerance closed if none of its proper frequent supergraphs has a weighted support ≥ (1−δ) · support(g).
  Maximal graph: 1-tolerance closed graph
  Closed graph: 0-tolerance closed graph.
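
The definition can be checked mechanically. The Python sketch below uses sets of labels as stand-ins for graphs (set inclusion plays the role of the subgraph relation) and a hypothetical table of supports; neither comes from the slides.

```python
def is_delta_tolerance_closed(pattern, supports, delta):
    """True if no proper super-pattern has support >= (1 - delta) * support(pattern).

    `supports` maps frozenset patterns (stand-ins for frequent graphs)
    to their supports.
    """
    threshold = (1 - delta) * supports[pattern]
    return all(sup < threshold
               for other, sup in supports.items()
               if pattern < other)          # proper super-patterns only


# Hypothetical supports for a chain of nested patterns.
supports = {
    frozenset("c"): 10,
    frozenset("ce"): 9,
    frozenset("ace"): 6,
    frozenset("abce"): 3,
}

for delta in (0, 0.25, 1):
    kept = sorted("".join(sorted(p)) for p in supports
                  if is_delta_tolerance_closed(p, supports, delta))
    print(delta, kept)
```

With δ = 0 the test reduces to ordinary closedness, with δ = 1 only patterns without any frequent super-pattern survive (the maximal ones), and intermediate values trade summary size against precision.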

15 / 48

Page 24: Mining Frequent Closed Graphs on Evolving Data Streams

Graph Coresets

Relative support of a closed graph
Support of a graph minus the relative support of its closed supergraphs.
  The sum of the closed supergraphs' relative supports of a graph and its relative support is equal to its own support.

(s,δ)-coreset for the problem of computing closed graphs
Weighted multiset of frequent δ-tolerance closed graphs with minimum support s, using their relative support as a weight.
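
The recursive definition of relative support can be sketched in a few lines of Python. As an illustration only, closed graphs are represented by sets of labels (set inclusion stands in for the subgraph relation), and the supports are hypothetical values rather than data from the talk.

```python
def relative_supports(support):
    """Relative support = support minus the relative supports of closed supergraphs.

    `support` maps frozenset patterns (stand-ins for closed graphs) to supports.
    """
    rel = {}
    # Visit larger patterns first so every proper superset is already resolved.
    for g in sorted(support, key=len, reverse=True):
        rel[g] = support[g] - sum(rel[h] for h in rel if g < h)
    return rel


# Hypothetical closed patterns and supports.
support = {
    frozenset("C"): 7,
    frozenset("CS"): 5,
    frozenset("CSN"): 3,
    frozenset("CSO"): 2,
}
rel = relative_supports(support)

# Reading the coreset as weighted transactions recovers each support:
# add up the weights of every pattern that contains g.
for g in support:
    recovered = sum(w for h, w in rel.items() if g <= h)
    print("".join(sorted(g)), "relative:", rel[g], "recovered:", recovered)
```

This is the property that lets a coreset be mined again as if it were a small weighted dataset: the weights of all patterns containing a graph add up to that graph's support.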

16 / 48

Page 27: Mining Frequent Closed Graphs on Evolving Data Streams

Graph Coresets

[Graph drawings shown as atom labels only.]

Graph (atoms)   Relative Support   Support
C C S N         3                  3
C S N O         3                  3
C S N           3                  3

Table: Example of a coreset with minimum support 50% and δ = 1

18 / 48

Page 28: Mining Frequent Closed Graphs on Evolving Data Streams

Graph Coresets

Figure: Number of graphs in a (40%,δ )-coreset for NCI.

19 / 48

Page 30: Mining Frequent Closed Graphs on Evolving Data Streams

INCGRAPHMINER

INCGRAPHMINER(D, min_sup)

Input: A graph dataset D, and min_sup.
Output: The frequent graph set G.

G ← ∅
for every batch b_t of graphs in D
  do C ← CORESET(b_t, min_sup)
     G ← CORESET(G ∪ C, min_sup)
return G

21 / 48

Page 31: Mining Frequent Closed Graphs on Evolving Data Streams

WINGRAPHMINER

WINGRAPHMINER(D, W, min_sup)

Input: A graph dataset D, a size window W and min_sup.
Output: The frequent graph set G.

G ← ∅
for every batch b_t of graphs in D
  do C ← CORESET(b_t, min_sup)
     Store C in sliding window
     if sliding window is full
       then R ← Oldest C stored in sliding window, negate all support values
       else R ← ∅
     G ← CORESET(G ∪ C ∪ R, min_sup)
return G
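
The control flow of the batch loop, including the negated-support trick, can be sketched in Python as follows. This is an illustration under loud assumptions: `toy_coreset` is a hypothetical stand-in that only sums weights of identical patterns and filters by an absolute `min_sup` (it does no closed-graph mining), patterns are plain strings, and "full" is read as "holding more than `window` batches". Passing `window=None` gives the purely incremental loop of INCGRAPHMINER.

```python
from collections import Counter, deque


def toy_coreset(weighted_patterns, min_sup):
    """Hypothetical stand-in for CORESET: sum weights per pattern, keep >= min_sup."""
    totals = Counter()
    for pattern, weight in weighted_patterns:
        totals[pattern] += weight
    return [(p, w) for p, w in totals.items() if w >= min_sup]


def win_graph_miner(batches, min_sup, window=None):
    """Batch loop of the slide; window=None means no forgetting (incremental)."""
    G = []
    stored = deque()                       # per-batch coresets, oldest first
    for batch in batches:
        C = toy_coreset([(g, 1) for g in batch], min_sup)
        R = []
        if window is not None:
            stored.append(C)
            if len(stored) > window:       # assumed meaning of "window is full"
                # Re-insert the oldest coreset with negated supports so that
                # its contribution cancels out of G.
                R = [(p, -w) for p, w in stored.popleft()]
        G = toy_coreset(G + C + R, min_sup)
    return dict(G)


batches = [["A", "A", "B"], ["A", "C"], ["C", "C"]]
print(win_graph_miner(batches, min_sup=1))             # incremental
print(win_graph_miner(batches, min_sup=1, window=2))   # last two batches only
```

Forgetting a batch never requires revisiting the raw graphs: the stored coreset of that batch is merged back in with negative weights, so the same merge step handles both insertion and deletion.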

22 / 48

Page 32: Mining Frequent Closed Graphs on Evolving Data Streams

Algorithm ADaptive Sliding WINdow

Example
W = 101010110111111
W0 = 1   W1 = 01010110111111
(the split point then moves right, one element at a time)

ADWIN: ADAPTIVE WINDOWING ALGORITHM

Initialize Window W
for each t > 0
  do W ← W ∪ {x_t} (i.e., add x_t to the head of W)
     repeat Drop elements from the tail of W
       until |µ_W0 − µ_W1| < ε_c holds
         for every split of W into W = W0 · W1
     Output µ_W
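
A naive Python rendering of this loop is given below as a sketch. It checks every split on each step, so it costs O(W) per check rather than the logarithmic cost of the bucket-based version. The slide does not define ε_c; the Hoeffding-style threshold used here (harmonic mean of the two sub-window lengths, confidence δ divided by the window length) is an assumption, as are the function and parameter names.

```python
import math


def adwin_step(window, x, delta=0.01):
    """Add x as the newest item, then drop oldest items while any split of the
    window into W0 (older) and W1 (newer) shows a significant change in mean.

    `window` is a list ordered oldest first; a new list is returned.
    """
    window = window + [x]
    shrunk = True
    while shrunk and len(window) > 1:
        shrunk = False
        n = len(window)
        total = sum(window)
        head_sum = 0.0
        for i in range(1, n):               # W0 = window[:i], W1 = window[i:]
            head_sum += window[i - 1]
            n0, n1 = i, n - i
            mu0 = head_sum / n0
            mu1 = (total - head_sum) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            # Assumed threshold: Hoeffding-style bound with delta' = delta / n.
            eps_cut = math.sqrt(math.log(4.0 * n / delta) / (2.0 * m))
            if abs(mu0 - mu1) >= eps_cut:
                window = window[1:]         # drop the oldest element, re-check
                shrunk = True
                break
    return window


w = []
for bit in [0] * 200 + [1] * 200:           # abrupt change halfway through
    w = adwin_step(w, bit)
print("window length:", len(w), "mean:", sum(w) / len(w))
```

While the stream is stationary the window keeps growing, which gives a low-variance estimate; after the change the stale prefix is discarded, so the reported mean tracks the new distribution.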

23 / 48

Page 41: Mining Frequent Closed Graphs on Evolving Data Streams

Algorithm ADaptive Sliding WINdow

Example
W = 101010110111111
W0 = 101010110   W1 = 111111
|µ_W0 − µ_W1| ≥ ε_c : CHANGE DETECTED!

23 / 48

Page 43: Mining Frequent Closed Graphs on Evolving Data Streams

Algorithm ADaptive Sliding WINdow

Example
Drop elements from the tail of W
W = 01010110111111   (the oldest element has been dropped)

23 / 48

Page 44: Mining Frequent Closed Graphs on Evolving Data Streams

Algorithm ADaptive Sliding WINdow

Theorem
At every time step we have:

1 (False positive rate bound). If µ_t remains constant within W, the probability that ADWIN shrinks the window at this step is at most δ.

2 (False negative rate bound). Suppose that for some partition of W in two parts W0 W1 (where W1 contains the most recent items) we have |µ_W0 − µ_W1| > 2ε_c. Then with probability 1−δ ADWIN shrinks W to W1, or shorter.

ADWIN tunes itself to the data stream at hand, with no need for the user to hardwire or precompute parameters.

24 / 48

Page 45: Mining Frequent Closed Graphs on Evolving Data Streams

Algorithm ADaptive Sliding WINdow

ADWIN using a Data Stream Sliding Window Model,
  can provide the exact counts of 1's in O(1) time per point.
  tries O(log W) cutpoints
  uses $O(\frac{1}{\varepsilon} \log W)$ memory words
  the processing time per example is O(log W) (amortized and worst-case).

Sliding Window Model

            1010101   101   11   1   1
Content:    4         2     2    1   1
Capacity:   7         3     2    1   1

25 / 48

Page 46: Mining Frequent Closed Graphs on Evolving Data Streams

ADAGRAPHMINER

ADAGRAPHMINER(D, Mode, min_sup)

G ← ∅
Init ADWIN
for every batch b_t of graphs in D
  do C ← CORESET(b_t, min_sup)
     R ← ∅
     if Mode is Sliding Window
       then Store C in sliding window
            if ADWIN detected change
              then R ← Batches to remove in sliding window with negative support
     G ← CORESET(G ∪ C ∪ R, min_sup)
     if Mode is Sliding Window
       then Insert # closed graphs into ADWIN
       else for every g in G update g's ADWIN
return G

26 / 48

Page 47: Mining Frequent Closed Graphs on Evolving Data Streams

ADAGRAPHMINER

ADAGRAPHMINER(D, Mode, min_sup)

G ← ∅
Init ADWIN
for every batch b_t of graphs in D
  do C ← CORESET(b_t, min_sup)
     R ← ∅
     G ← CORESET(G ∪ C ∪ R, min_sup)
     for every g in G update g's ADWIN
return G

26 / 48

Page 48: Mining Frequent Closed Graphs on Evolving Data Streams

ADAGRAPHMINER

ADAGRAPHMINER(D, Mode, min_sup)

G ← ∅
Init ADWIN
for every batch b_t of graphs in D
  do C ← CORESET(b_t, min_sup)
     R ← ∅
     if Mode is Sliding Window
       then Store C in sliding window
            if ADWIN detected change
              then R ← Batches to remove in sliding window with negative support
     G ← CORESET(G ∪ C ∪ R, min_sup)
     if Mode is Sliding Window
       then Insert # closed graphs into ADWIN
return G

26 / 48

Page 50: Mining Frequent Closed Graphs on Evolving Data Streams

Experimental Evaluation

ChemDB dataset
  Public dataset
  4 million molecules
  Institute for Genomics and Bioinformatics at the University of California, Irvine

Open NCI Database
  Public domain
  250,000 structures
  National Cancer Institute

28 / 48

Page 51: Mining Frequent Closed Graphs on Evolving Data Streams

What is MOA?
{M}assive {O}nline {A}nalysis is a framework for online learning from data streams.

It is closely related to WEKA
It includes a collection of offline and online methods as well as tools for evaluation:
  classification
  clustering
Easy to extend
Easy to design and run experiments

29 / 48

Page 52: Mining Frequent Closed Graphs on Evolving Data Streams

WEKA: the bird

30 / 48

Page 53: Mining Frequent Closed Graphs on Evolving Data Streams

MOA: the bird

The Moa (another native NZ bird) is not only flightless, like the Weka, but also extinct.

31 / 48

Page 56: Mining Frequent Closed Graphs on Evolving Data Streams

Classification Experimental Setting

32 / 48

Page 57: Mining Frequent Closed Graphs on Evolving Data Streams

Classification Experimental Setting

Classifiers
  Naive Bayes
  Decision stumps
  Hoeffding Tree
  Hoeffding Option Tree
  Bagging and Boosting
  ADWIN Bagging and Leveraging Bagging

Prediction strategies
  Majority class
  Naive Bayes Leaves
  Adaptive Hybrid

33 / 48

Page 58: Mining Frequent Closed Graphs on Evolving Data Streams

Clustering Experimental Setting

34 / 48

Page 59: Mining Frequent Closed Graphs on Evolving Data Streams

Clustering Experimental Setting

Clusterers
  StreamKM++
  CluStream
  ClusTree
  Den-Stream
  CobWeb

35 / 48

Page 60: Mining Frequent Closed Graphs on Evolving Data Streams

Clustering Experimental Setting

Internal measures               External measures
Gamma                           Rand statistic
C Index                         Jaccard coefficient
Point-Biserial                  Fowlkes and Mallows Index
Log Likelihood                  Hubert Γ statistics
Dunn's Index                    Minkowski score
Tau                             Purity
Tau A                           van Dongen criterion
Tau C                           V-measure
Somers' Gamma                   Completeness
Ratio of Repetition             Homogeneity
Modified Ratio of Repetition    Variation of information
Adjusted Ratio of Clustering    Mutual information
Fagan's Index                   Class-based entropy
Deviation Index                 Cluster-based entropy
Z-Score Index                   Precision
D Index                         Recall
Silhouette coefficient          F-measure

Table: Internal and external clustering evaluation measures.

36 / 48

Page 61: Mining Frequent Closed Graphs on Evolving Data Streams

Cluster Mapping Measure

Hardy Kremer, Philipp Kranen, Timm Jansen, Thomas Seidl, Albert Bifet, Geoff Holmes and Bernhard Pfahringer. An Effective Evaluation Measure for Clustering on Evolving Data Streams. KDD'11

CMM: Cluster Mapping Measure
A novel evaluation measure for stream clustering on evolving streams

$$\mathrm{CMM}(C, \mathcal{CL}) = 1 - \frac{\sum_{o \in F} w(o) \cdot \mathrm{pen}(o, C)}{\sum_{o \in F} w(o) \cdot \mathrm{con}(o, Cl(o))}$$

37 / 48

Page 62: Mining Frequent Closed Graphs on Evolving Data Streams

Extensions of MOA

  Multi-label Classification
  Active Learning
  Regression
  Closed Frequent Graph Mining
  Twitter Sentiment Analysis

Challenges for bigger data streams
Sampling and distributed systems (Map-Reduce, Hadoop, S4)

38 / 48

Page 63: Mining Frequent Closed Graphs on Evolving Data Streams

Open NCI dataset

[Figure: Time, NCI Dataset. x-axis: Instances (10,000 to 250,000); y-axis: Seconds (0 to 120). Series: IncGraphMiner, IncGraphMiner-C, MoSS, closeGraph.]

39 / 48

Page 64: Mining Frequent Closed Graphs on Evolving Data Streams

Open NCI dataset

[Figure: Memory, NCI Dataset. x-axis: Instances (10,000 to 250,000); y-axis: Megabytes (0 to 9000). Series: IncGraphMiner, IncGraphMiner-C, MoSS, closeGraph.]

40 / 48

Page 65: Mining Frequent Closed Graphs on Evolving Data Streams

ChemDB dataset

[Figure: Memory, ChemDB Dataset. x-axis: Instances (10,000 to 3,920,000); y-axis: Megabytes (0 to 50000). Series: IncGraphMiner, IncGraphMiner-C, MoSS, closeGraph.]

41 / 48

Page 66: Mining Frequent Closed Graphs on Evolving Data Streams

ChemDB dataset

[Figure: Time, ChemDB Dataset. x-axis: Instances (10,000 to 3,920,000); y-axis: Seconds (0 to 5000). Series: IncGraphMiner, IncGraphMiner-C, MoSS, closeGraph.]

42 / 48

Page 67: Mining Frequent Closed Graphs on Evolving Data Streams

ADAGRAPHMINER

[Figure: Number of closed graphs over the stream. x-axis: Instances (10,000 to 960,000); y-axis: Number of Closed Graphs (0 to 60). Series: ADAGRAPHMINER, ADAGRAPHMINER-Window, IncGraphMiner.]

43 / 48

Page 69: Mining Frequent Closed Graphs on Evolving Data Streams

Summary

Mining Evolving Graph Data Streams
Given a data stream D of graphs, find frequent closed graphs.

[Figure: example stream of three transactions (Ids 1 to 3), each a small molecular graph over the atoms C, S, N and O.]

We provide three algorithms, of increasing power:
  Incremental
  Sliding Window
  Adaptive

45 / 48

Page 70: Mining Frequent Closed Graphs on Evolving Data Streams

Summary

{M}assive {O}nline {A}nalysis is a framework for online learning from data streams.

http://moa.cs.waikato.ac.nz

It is closely related to WEKA
It includes a collection of offline and online methods as well as tools for evaluation:
  classification
  clustering
  frequent pattern mining
MOA deals with evolving data streams
MOA is easy to use and extend

46 / 48

Page 71: Mining Frequent Closed Graphs on Evolving Data Streams

Future work

Structured massive data mining
  sampling, sketching
  parallel techniques

47 / 48

Page 72: Mining Frequent Closed Graphs on Evolving Data Streams

Thanks!

48 / 48