View
193
Download
3
Category
Tags:
Preview:
DESCRIPTION
Full paper: http://boole.diiga.univpm.it/paper/sac2012.pdf In many experimental domains, especially e-Science, workflow management systems are gaining increasing attention to design and execute in-silico experiments involving data analysis tools. As a by-product, a repository of workflows is generated, that formally describes experimental protocols and the way different tools are combined inside experiments. In this paper we describe the use of the SUBDUE graph clustering algorithm to discover sub-workflows from a repository. Since sub-workflows represent significant usage patterns of tools, the discovered knowledge can be exploited by scientists to learn by-example about design practices, or to retrieve and reuse workflows. Such a knowledge, ultimately, leverages the potential of scientific workflow repositories to become a knowledge-asset. A set of experiments is conducted on the myExperiment repository to assess the effectiveness of the approach.
Citation preview
Mining Usage Patterns from a Repositoryof Scientific Workflows
Claudia Diamantini, Domenico Potena, Emanuele Storti
DII, Universita Politecnica delle Marche, Ancona, Italy
SAC 2012, Riva del Garda, March 26-30
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 1 / 21
Introduction
Process mining techniques allow for extracting knowledge from eventlogs, i.e. traces of running processes:
audit trails of a workflow management systemtransaction logs of an enterprise resource planning system
Main applications:process discovery: what is really happening?conformance check: are we doing what was agreed upon?process extension: how can we redesign the process?
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 2 / 21
Introduction
Process mining techniques allow for extracting knowledge from eventlogs, i.e. traces of running processes:
audit trails of a workflow management systemtransaction logs of an enterprise resource planning system
Main applications:process discovery: what is really happening?conformance check: are we doing what was agreed upon?process extension: how can we redesign the process?
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 2 / 21
Introduction
Process mining techniques allow for extracting knowledge from eventlogs, i.e. traces of running processes:
audit trails of a workflow management systemtransaction logs of an enterprise resource planning system
Main applications:process discovery: what is really happening?conformance check: are we doing what was agreed upon?process extension: how can we redesign the process?
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 2 / 21
Motivation
Little work about how to extract knowledge from a set of models (i.e.:process schemas)
understanding of common patterns of usage (best/worstpractices)support in process designprocess integration (from multiple sources)process optimization/re-engineeringindexing of processes and their retrieval
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 3 / 21
Methodology
Processes schemas have an inherent graph structure: manygraph-mining tools available
Approach1 representation of original processes into graphs2 usage of graph-mining techniques to extract SUBs
Which representation form?Which specific technique?Graph clustering techniques (sub-processes are clusters) with ahierarchical approach (various levels of abstractions, more/lessspecificity)
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 4 / 21
Methodology
Processes schemas have an inherent graph structure: manygraph-mining tools available
Approach1 representation of original processes into graphs2 usage of graph-mining techniques to extract SUBs
Which representation form?Which specific technique?Graph clustering techniques (sub-processes are clusters) with ahierarchical approach (various levels of abstractions, more/lessspecificity)
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 4 / 21
Methodology
Processes schemas have an inherent graph structure: manygraph-mining tools available
Approach1 representation of original processes into graphs2 usage of graph-mining techniques to extract SUBs
Which representation form?Which specific technique?Graph clustering techniques (sub-processes are clusters) with ahierarchical approach (various levels of abstractions, more/lessspecificity)
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 4 / 21
Methodology
Processes schemas have an inherent graph structure: manygraph-mining tools available
Approach1 representation of original processes into graphs2 usage of graph-mining techniques to extract SUBs
Which representation form?Which specific technique?Graph clustering techniques (sub-processes are clusters) with ahierarchical approach (various levels of abstractions, more/lessspecificity)
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 4 / 21
SUBDUE
SUBDUE [Joyner, 2000]graph-based hierarchical clustering algorithmsuited for discrete-valued and structured datasearches for substructures (i.e., subgraphs) that best compressthe input graph, according to MDL
Comments:to compress G with a SUB: to replace every occurrence of SUB inG with a single new nodeSUB best compresses G: num. of bits needed to represent G afterthe compression is minimum
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 5 / 21
SUBDUE
SUBDUE [Joyner, 2000]graph-based hierarchical clustering algorithmsuited for discrete-valued and structured datasearches for substructures (i.e., subgraphs) that best compressthe input graph, according to MDL
Comments:to compress G with a SUB: to replace every occurrence of SUBin G with a single new nodeSUB best compresses G: num. of bits needed to represent G afterthe compression is minimum
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 5 / 21
SUBDUE
SUBDUE [Joyner, 2000]graph-based hierarchical clustering algorithmsuited for discrete-valued and structured datasearches for substructures (i.e., subgraphs) that best compressthe input graph, according to MDL
Comments:to compress G with a SUB: to replace every occurrence of SUB inG with a single new nodeSUB best compresses G: num. of bits needed to represent Gafter the compression is minimum
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 5 / 21
SUBDUE (cont’d)
1 searches for the best SUB (i.e., that minimizes the DescriptionLength of the graph)
2 compresses the graph by using the best SUB
Subsequent SUBs may be defined in terms of previously definedSUBs. Iterations of this basic step results in a lattice
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 6 / 21
SUBDUE (cont’d)
1 searches for the best SUB (i.e., that minimizes the DescriptionLength of the graph)
2 compresses the graph by using the best SUB
Subsequent SUBs may be defined in terms of previously definedSUBs. Iterations of this basic step results in a lattice
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 6 / 21
SUBDUE (cont’d)
1 searches for the best SUB (i.e., that minimizes the DescriptionLength of the graph)
2 compresses the graph by using the best SUB
Subsequent SUBs may be defined in terms of previously definedSUBs. Iterations of this basic step results in a lattice
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 6 / 21
SUBDUE (cont’d)
1 searches for the best SUB (i.e., that minimizes the DescriptionLength of the graph)
2 compresses the graph by using the best SUB
Subsequent SUBs may be defined in terms of previously definedSUBs. Iterations of this basic step results in a lattice
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 6 / 21
SUBDUE (cont’d)
1 searches for the best SUB (i.e., that minimizes the DescriptionLength of the graph)
2 compresses the graph by using the best SUB
Subsequent SUBs may be defined in terms of previously definedSUBs. Iterations of this basic step results in a lattice
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 6 / 21
SUBDUE (cont’d)
1 searches for the best SUB (i.e., that minimizes the DescriptionLength of the graph)
2 compresses the graph by using the best SUB
Subsequent SUBs may be defined in terms of previously definedSUBs. Iterations of this basic step results in a lattice
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 6 / 21
SUBDUE (cont’d)
1 searches for the best SUB (i.e., that minimizes the DescriptionLength of the graph)
2 compresses the graph by using the best SUB
Subsequent SUBs may be defined in terms of previously definedSUBs. Iterations of this basic step results in a lattice
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 6 / 21
SUBDUE (cont’d)
Traditional cluster evaluation is not applicable to hierarchic domains
quality = interClusterDistanceintraClusterDistance
Desirable features of a good clustering:few and big clusters (large coverage, good generality)minimal/no overlap (better defined concepts)
New indexescompleteness: % of original nodes/edges still present in the finallatticerepresentativeness (of a SUB): % of input graphs holding theSUB at least oncesignificance: qualitative measure of the meaningfulness of acluster w.r.t. the domain and application
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 7 / 21
SUBDUE (cont’d)
Traditional cluster evaluation is not applicable to hierarchic domains
quality = interClusterDistanceintraClusterDistance
Desirable features of a good clustering:few and big clusters (large coverage, good generality)minimal/no overlap (better defined concepts)
New indexescompleteness: % of original nodes/edges still present in the finallatticerepresentativeness (of a SUB): % of input graphs holding theSUB at least oncesignificance: qualitative measure of the meaningfulness of acluster w.r.t. the domain and application
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 7 / 21
SUBDUE (cont’d)
Traditional cluster evaluation is not applicable to hierarchic domains
quality = interClusterDistanceintraClusterDistance
Desirable features of a good clustering:few and big clusters (large coverage, good generality)minimal/no overlap (better defined concepts)
New indexescompleteness: % of original nodes/edges still present in the finallatticerepresentativeness (of a SUB): % of input graphs holding theSUB at least oncesignificance: qualitative measure of the meaningfulness of acluster w.r.t. the domain and application
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 7 / 21
SUBDUE (cont’d)
Traditional cluster evaluation is not applicable to hierarchic domains
quality = interClusterDistanceintraClusterDistance
Desirable features of a good clustering:few and big clusters (large coverage, good generality)minimal/no overlap (better defined concepts)
New indexescompleteness: % of original nodes/edges still present in the finallatticerepresentativeness (of a SUB): % of input graphs holding theSUB at least oncesignificance: qualitative measure of the meaningfulness of acluster w.r.t. the domain and application
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 7 / 21
Representation
Approach1 representation of original processes into graphs2 usage of graph-mining techniques to extract SUBs
Common operators:sequencesplit-AND (parallel split), split-XOR (exclusive choice)join-AND (synchronization), join-XOR (simple merge)
How to map the process to the graph? Several choicesEmanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 8 / 21
Representation models: A rule
lowest level of compactnessevery operator is represented by a nodecalled operator, linked to another nodespecifying its type (AND/OR/SEQ) and toits operandsjoin/split are distinguishable by thenumber of in/out-going arcs
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 9 / 21
Representation models: B rule
medium level of compactnessthe closest to the original representationjoin/split are replaced by different nodes,one for each kind of operator
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 10 / 21
Representation models: C rule
highest level of compactnessimplicit representation of operatorsmultiple alternative graphs in case ofSPLIT-XOR or JOIN-XORarcs are labeled to preserve informationabout domain/range nodes
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 11 / 21
Evaluation
Commentsthe three representations hold the same information, with no losseasy extension with more attributes (actor name/role, info abouttime/resources, ...):
attribute name → edgevalue → a node
Need of experimentations to assess which one performs best w.r.t.the quality of clustering.
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 12 / 21
Evaluation
Commentsthe three representations hold the same information, with no losseasy extension with more attributes (actor name/role, info abouttime/resources, ...):
attribute name → edgevalue → a node
Need of experimentations to assess which one performs best w.r.t.the quality of clustering.
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 12 / 21
Evaluation
Commentsthe three representations hold the same information, with no losseasy extension with more attributes (actor name/role, info abouttime/resources, ...):
attribute name → edgevalue → a node
Need of experimentations to assess which one performs best w.r.t.the quality of clustering.
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 12 / 21
myExperiment
Dataset:258 processes (XScufl language for Taverna framework)e-Science WF for distributed computation over scientific dataapplications: local tools, Web Services, scripts
Setting:WF extraction/parsingpreprocessingtranslation into graphs (A,B,Crules)SUBDUE execution
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 13 / 21
Experimental results
Output lattice (fragment)
Red boxes = top level SUBs
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 14 / 21
Experimental results
A rule B rule C ruleSize (nodes) 5618 2871 2414SUBs (num) 671 611 580top SUBs 2.24% 52.37% 52.93%Completeness 95.41% 90.90% 90.27%Representativeness (first 10 top SUBs) 99.22% 31.78% 23.26%Significance of top level clusters – + ++Execution time (hh:mm) 03:38 00:06 00:03
Considerations:A model: low significance, high execution timeNo definitively best choice between B and C:
B model: higher representativeness → indexingC model: higher significance → usage pattern discovery
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 15 / 21
Experimental results
A rule B rule C ruleSize (nodes) 5618 2871 2414SUBs (num) 671 611 580top SUBs 2.24% 52.37% 52.93%Completeness 95.41% 90.90% 90.27%Representativeness (first 10 top SUBs) 99.22% 31.78% 23.26%Significance of top level clusters – + ++Execution time (hh:mm) 03:38 00:06 00:03
Considerations:A model: low significance, high execution timeNo definitively best choice between B and C:
B model: higher representativeness → indexingC model: higher significance → usage pattern discovery
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 15 / 21
Experimental results
A rule B rule C ruleSize (nodes) 5618 2871 2414SUBs (num) 671 611 580top SUBs 2.24% 52.37% 52.93%Completeness 95.41% 90.90% 90.27%Representativeness (first 10 top SUBs) 99.22% 31.78% 23.26%Significance of top level clusters – + ++Execution time (hh:mm) 03:38 00:06 00:03
Considerations:A model: low significance, high execution timeNo definitively best choice between B and C:
B model: higher representativeness → indexingC model: higher significance → usage pattern discovery
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 15 / 21
Experimental results
A rule B rule C ruleSize (nodes) 5618 2871 2414SUBs (num) 671 611 580top SUBs 2.24% 52.37% 52.93%Completeness 95.41% 90.90% 90.27%Representativeness (first 10 top SUBs) 99.22% 31.78% 23.26%Significance of top level clusters – + ++Execution time (hh:mm) 03:38 00:06 00:03
Considerations:A model: low significance, high execution timeNo definitively best choice between B and C:
B model: higher representativeness → indexingC model: higher significance → usage pattern discovery
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 15 / 21
Experimental results
A rule B rule C ruleSize (nodes) 5618 2871 2414SUBs (num) 671 611 580top SUBs 2.24% 52.37% 52.93%Completeness 95.41% 90.90% 90.27%Representativeness (first 10 top SUBs) 99.22% 31.78% 23.26%Significance of top level clusters – + ++Execution time (hh:mm) 03:38 00:06 00:03
Considerations:A model: low significance, high execution timeNo definitively best choice between B and C:
B model: higher representativeness → indexingC model: higher significance → usage pattern discovery
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 15 / 21
Experimental results
A rule B rule C ruleSize (nodes) 5618 2871 2414SUBs (num) 671 611 580top SUBs 2.24% 52.37% 52.93%Completeness 95.41% 90.90% 90.27%Representativeness (first 10 top SUBs) 99.22% 31.78% 23.26%Significance of top level clusters – + ++Execution time (hh:mm) 03:38 00:06 00:03
Considerations:A model: low significance, high execution timeNo definitively best choice between B and C:
B model: higher representativeness → indexingC model: higher significance → usage pattern discovery
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 15 / 21
Experimental results
A rule B rule C ruleSize (nodes) 5618 2871 2414SUBs (num) 671 611 580top SUBs 2.24% 52.37% 52.93%Completeness 95.41% 90.90% 90.27%Representativeness (first 10 top SUBs) 99.22% 31.78% 23.26%Significance of top level clusters – + ++Execution time (hh:mm) 03:38 00:06 00:03
Considerations:A model: low significance, high execution timeNo definitively best choice between B and C:
B model: higher representativeness → indexingC model: higher significance → usage pattern discovery
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 15 / 21
Experimental results
A rule B rule C ruleSize (nodes) 5618 2871 2414SUBs (num) 671 611 580top SUBs 2.24% 52.37% 52.93%Completeness 95.41% 90.90% 90.27%Representativeness (first 10 top SUBs) 99.22% 31.78% 23.26%Significance of top level clusters – + ++Execution time (hh:mm) 03:38 00:06 00:03
Considerations:A model: low significance, high execution timeNo definitively best choice between B and C:
B model: higher representativeness → indexingC model: higher significance → usage pattern discovery
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 15 / 21
Use Case
Scenario: a scientist wants to use web service X for DNA alignment
Baseline: browse myExperiment to find every workflow with X:too many resultsgiven a workflow: common practice or exception?
New approach: search for X in the lattice’s SUBs:1 find all usage patterns for X (i.e., top SUBs containing X)2 analyse their representativeness3 choose one of them or browse more in depth the lattice
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 16 / 21
Use Case
Scenario: a scientist wants to use web service X for DNA alignment
Baseline: browse myExperiment to find every workflow with X:too many resultsgiven a workflow: common practice or exception?
New approach: search for X in the lattice’s SUBs:1 find all usage patterns for X (i.e., top SUBs containing X)2 analyse their representativeness3 choose one of them or browse more in depth the lattice
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 16 / 21
Use Case
Scenario: a scientist wants to use web service X for DNA alignment
Baseline: browse myExperiment to find every workflow with X:too many resultsgiven a workflow: common practice or exception?
New approach: search for X in the lattice’s SUBs:1 find all usage patterns for X (i.e., top SUBs containing X)2 analyse their representativeness3 choose one of them or browse more in depth the lattice
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 16 / 21
Use Case
Scenario: a scientist wants to use web service X for DNA alignment
Baseline: browse myExperiment to find every workflow with X:too many resultsgiven a workflow: common practice or exception?
New approach: search for X in the lattice’s SUBs:1 find all usage patterns for X (i.e., top SUBs containing X)2 analyse their representativeness3 choose one of them or browse more in depth the lattice
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 16 / 21
Use Case
Scenario: a scientist wants to use web service X for DNA alignment
Baseline: browse myExperiment to find every workflow with X:too many resultsgiven a workflow: common practice or exception?
New approach: search for X in the lattice’s SUBs:1 find all usage patterns for X (i.e., top SUBs containing X)2 analyse their representativeness3 choose one of them or browse more in depth the lattice
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 16 / 21
Use Case
Scenario: a scientist wants to use web service X for DNA alignment
Baseline: browse myExperiment to find every workflow with X:too many resultsgiven a workflow: common practice or exception?
New approach: search for X in the lattice’s SUBs:1 find all usage patterns for X (i.e., top SUBs containing X)2 analyse their representativeness3 choose one of them or browse more in depth the lattice
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 16 / 21
Use case (cont’d)
Applications:reference during process design (learn-by-example)process retrieval: find every myExperiment WF containing such apattern
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 17 / 21
Use case (cont’d)
Applications:reference during process design (learn-by-example)process retrieval: find every myExperiment WF containing such apattern
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 17 / 21
Discussion
A graph-based clustering technique (SUBDUE) to recognize the mostfrequent/common subprocesses among a set of workflows/processschemas.
Commentsseveral representation alternatives for application of SUBDUEevaluations on specialized e-Science domainuseful to find typical patterns and schemas of usage for tools/tasks
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 18 / 21
Conclusion
Applications:recognition of reference processes (common/best/worstpractices)organize a process repository (by indexing processes throughsubstructures)enterprise integration: find similarities (differences, overlapping,complementariness) in BP in different companies
Future work:extending experimentations, especially with (real) BusinessProcessesmanagement of heterogeneity (syntactic/semantic, level ofgranularity)
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 19 / 21
Mining Usage Patterns from a Repositoryof Scientific Workflows
Claudia Diamantini, Domenico Potena, Emanuele Storti
DII, Universita Politecnica delle Marche, Ancona, Italy
SAC 2012, Riva del Garda, March 26-30
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 20 / 21
SUBDUE
Subdue uses a variant of beam search (heuristic)Best SUB minimizes the Description Length of the graph
1 Search the best SUBInit : set of SUBs consisting of all uniquely labeled verticesextend each SUB in all possible ways by a single edge and a vertexorder SUBs according to MDL principlekeep only the top n SUBsEnd : exhaustion of search space or user constraints
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21
SUBDUE
Subdue uses a variant of beam search (heuristic)Best SUB minimizes the Description Length of the graph
1 Search the best SUBInit : set of SUBs consisting of all uniquely labeled verticesextend each SUB in all possible ways by a single edge and a vertexorder SUBs according to MDL principlekeep only the top n SUBsEnd : exhaustion of search space or user constraints
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21
SUBDUE
Subdue uses a variant of beam search (heuristic)Best SUB minimizes the Description Length of the graph
1 Search the best SUBInit : set of SUBs consisting of all uniquely labeled verticesextend each SUB in all possible ways by a single edge and a vertexorder SUBs according to MDL principlekeep only the top n SUBsEnd : exhaustion of search space or user constraints
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21
SUBDUE
Subdue uses a variant of beam search (heuristic)Best SUB minimizes the Description Length of the graph
1 Search the best SUBInit : set of SUBs consisting of all uniquely labeled verticesextend each SUB in all possible ways by a single edge and a vertexorder SUBs according to MDL principlekeep only the top n SUBsEnd : exhaustion of search space or user constraints
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21
SUBDUE
Subdue uses a variant of beam search (heuristic)Best SUB minimizes the Description Length of the graph
1 Search the best SUBInit : set of SUBs consisting of all uniquely labeled verticesextend each SUB in all possible ways by a single edge and a vertexorder SUBs according to MDL principlekeep only the top n SUBsEnd : exhaustion of search space or user constraints
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21
SUBDUE
Subdue uses a variant of beam search (heuristic)Best SUB minimizes the Description Length of the graph
1 Search the best SUBInit : set of SUBs consisting of all uniquely labeled verticesextend each SUB in all possible ways by a single edge and a vertexorder SUBs according to MDL principlekeep only the top n SUBsEnd : exhaustion of search space or user constraints
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21
SUBDUE
Subdue uses a variant of beam search (heuristic)Best SUB minimizes the Description Length of the graph
1 Search the best SUBInit : set of SUBs consisting of all uniquely labeled verticesextend each SUB in all possible ways by a single edge and a vertexorder SUBs according to MDL principlekeep only the top n SUBsEnd : exhaustion of search space or user constraints
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21
SUBDUE
Subdue uses a variant of beam search (heuristic)Best SUB minimizes the Description Length of the graph
1 Search the best SUBInit : set of SUBs consisting of all uniquely labeled verticesextend each SUB in all possible ways by a single edge and a vertexorder SUBs according to MDL principlekeep only the top n SUBsEnd : exhaustion of search space or user constraints
2 Compress the graph with the best SUB
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21
SUBDUE
Subdue uses a variant of beam search (heuristic)Best SUB minimizes the Description Length of the graph
1 Search the best SUBInit : set of SUBs consisting of all uniquely labeled verticesextend each SUB in all possible ways by a single edge and a vertexorder SUBs according to MDL principlekeep only the top n SUBsEnd : exhaustion of search space or user constraints
2 Compress the graph with the best SUB
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21
SUBDUE
Subdue uses a variant of beam search (heuristic)Best SUB minimizes the Description Length of the graph
1 Search the best SUBInit : set of SUBs consisting of all uniquely labeled verticesextend each SUB in all possible ways by a single edge and a vertexorder SUBs according to MDL principlekeep only the top n SUBsEnd : exhaustion of search space or user constraints
2 Compress the graph with the best SUB
Subsequent SUBs may be defined in terms of previously defined SUBs
Iterations of this basic step results in a lattice
Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21
Recommended