Works13 Presentation by Sonja Holl. The work presents how to model optimizations made to workflows as Research Objects.
Member of the Helmholtz Association (Mitglied der Helmholtz-Gemeinschaft)
On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects
17. November 2013 | Sonja Holl*, Daniel Garijo+, Khalid Belhajjame$, Olav Zimmermann*, Renato De Giovanni#, Matthias Obst~, Carole Goble$
* Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich, Germany
+ Ontology Engineering Group, Facultad de Informática, Universidad Politécnica de Madrid, Spain
$ School of Computer Science, University of Manchester, UK
# Reference Center on Environmental Information, Campinas SP, Brazil
~ Department of Biological and Environmental Sciences, University of Gothenburg, Sweden
8th Workshop On Workflows in Support of Large-Scale Science
Scientific Workflows
• Popular choice to design, manage, and execute in silico experiments
• Sharing and reuse via workflow repositories
Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 2
Ecological Niche Modeling
Model species adaptation to environmental changes (BioVeL Project)
Ecological Niche Modeling Workflow
Workflow steps: createModel, testModel, calcAUC
Inputs: Environmental Layer, Occurrence Data, Geographic Mask, Parameter
Output: AUC
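The calcAUC step reports the area under the ROC curve. A minimal sketch of one standard way to compute it (the pairwise Mann-Whitney formulation), assuming binary presence/absence labels and model scores. The function name `auc`, the label encoding, and the toy data are illustrative assumptions and are not taken from the workflow itself:

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 for presence, 0 for absence (hypothetical encoding).
    scores: model suitability scores, higher means "more likely present".
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one presence and one absence point")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up test points, for illustration only.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
print(auc(toy_labels, toy_scores))
```

An AUC of 0.5 corresponds to a model that ranks presence and absence points no better than chance, and 1.0 to a perfect ranking.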
Figure: life cycle of an in silico experiment, with the stages Planning, Designing workflow (from scratch), Reusing workflow, Execution, and Sharing & Analysis, and the label REFINE.
Ecological Niche Modeling Workflow
Workflow steps: createModel, testModel, calcAUC
Inputs: Environmental Layer, Occurrence Data, Geographic Mask
Parameters: Gamma, Cost, NumberOfPseudoAbsences
Algorithms: SVM, Maxent, GARP
Output: AUC
Ecological Niche Modeling Workflow
Select Algorithms: SVM, Maxent, GARP
Select Parameters: Gamma, Cost, NumberOfPseudoAbsences
Figure: the workflow (createModel, testModel, calcAUC) surrounded by scattered candidate values (for example 1, 12, 100, -3.2, gaussian, 1.5, 6.7) and the label BLAST.
Common strategies to handle this challenge
• Default parameters & applications
• Trial and error
• Parameter sweeps
But:
• Increasing complexity of scientific workflows
• Rising number of parameters
• Time-consuming and compute-intensive
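To make the cost of a parameter sweep concrete, here is a small sketch that enumerates a full-factorial grid. The parameter names come from the slides; the candidate values and the helper `sweep` are made up for illustration:

```python
from itertools import product
from math import prod

# Hypothetical grid: names from the slides, candidate values invented.
grid = {
    "Gamma": [0.5, 1.0, 2.0, 4.0, 8.0],
    "Cost": [1, 2, 4, 8, 16],
    "NumberOfPseudoAbsences": [100, 200, 400, 800],
}


def sweep(grid):
    """Yield every parameter combination of a full-factorial sweep."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


# The number of workflow executions is the product of the list lengths.
n_runs = prod(len(values) for values in grid.values())
print(n_runs)
```

Each additional parameter multiplies the number of workflow executions, which is why a sweep quickly becomes impractical for workflows whose single run is already expensive.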
Figure: life cycle of an in silico experiment, with the stages Planning, Designing workflow (from scratch), Reusing workflow, Optimization, Execution, and Sharing & Analysis, and the label REFINE.
Intelligent automated optimization techniques
Goal:
• Automated way to find workflow settings that optimize the output
• Define workflow output(s) as fitness value
• Use fitness value for evaluation (e.g. AUC or correlation coefficient)
• Use heuristic search algorithm to find the best settings
How does it work?
Figure: Taverna Optimization Layer, with the labels WMS, Framework, API, Plugins, Parameter Optimization, and Component Optimization.
• Development of an optimization framework that extends the Taverna workflow management system
• Abstracts the optimization process (e.g. parallel execution, security)
• Developer API allows rapid adaptation of new optimization methods
• Optimization plugins can be added independently
Genetic Algorithm Parameter Optimization Plugin
(1) Define sub-workflow
(2) Specify input parameters (constraints)
(3) Select fitness output parameters (e.g. AUC)
(4) Define optimization method parameters (population size, termination criteria)
Display the optimization result
Figure: Taverna Optimization Framework & Plugin, with Best Fitness values 0.34, 0.42, 0.48, 0.49 and the labels 1, 2, ..., x.
Status quo
• Workflow optimization starts from scratch each time
• Optimization meta-data are lost
Idea: Capture optimization meta-data next to traditional provenance data
⇒ learn from/extend prior optimization runs
⇒ improve and accelerate optimization process
Research Objects
• Aligned with W3C standards
• Aggregates various resources
• Describes scientific processes in machine readable format
• Specified by several ontologies
Figure: a Research Object linked to its resources via ore:aggregates.
Optimization Research Object Ontology
ro:ResearchObject
opt:OptimizationResearchObject
• opt:SearchSpace: Describes the dependencies and parameter constraints
• opt:Algorithm: Describes the optimization algorithm and its parameters
• opt:Fitness: Describes the fitness functions
• opt:Workflow: The workflow that was optimized
• opt:Generation: Defines the population size and generation number for an Optimization Run
• opt:TerminationCondition: Describes the termination condition defined by the user
• opt:OptimizationRun: Represents one result set: sub-workflow, parameters and obtained fitness values
Relations shown in the diagram: rdfs:subClassOf, rdf:Property, ore:aggregates
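As a rough illustration of how such a research object could be written down, the sketch below builds plain (subject, predicate, object) triples in Python. It assumes that the optimization research object aggregates its parts via ore:aggregates, as the diagram labels suggest. The `ex:` identifiers and the helper `describe_optimization_ro` are hypothetical, and the prefixes are left unexpanded because the slides do not give the full namespace IRIs:

```python
# Prefixed names as they appear on the slide; full IRIs are not assumed here.
OPT = "opt:"
ORE_AGGREGATES = "ore:aggregates"
RDF_TYPE = "rdf:type"


def describe_optimization_ro(ro_id, resources):
    """Return (subject, predicate, object) triples for one research object.

    resources maps a hypothetical resource id to its opt: class name.
    """
    triples = [(ro_id, RDF_TYPE, OPT + "OptimizationResearchObject")]
    for res_id, class_name in resources.items():
        triples.append((ro_id, ORE_AGGREGATES, res_id))   # RO aggregates the part
        triples.append((res_id, RDF_TYPE, OPT + class_name))  # part is typed
    return triples


# Hypothetical example identifiers (the "ex:" names are invented).
triples = describe_optimization_ro(
    "ex:svm-opt-ro",
    {
        "ex:searchspace": "SearchSpace",
        "ex:algorithm": "Algorithm",
        "ex:fitness": "Fitness",
        "ex:workflow": "Workflow",
    },
)
for triple in triples:
    print(*triple)
```

In practice an RDF library would handle namespaces and serialization; the tuples here only show the shape of the description.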
Algorithm
• Genetic Algorithm
• Mutation rate: 0.1
• Crossover rate: 0.7
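A minimal sketch of a genetic algorithm using the two rates above. Everything else is an assumption made for illustration: the toy fitness function stands in for a workflow execution, and the parameter bounds, tournament selection, uniform crossover, random-reset mutation, elitism, and population size are not taken from the plugin:

```python
import random

MUTATION_RATE = 0.1   # from the slide
CROSSOVER_RATE = 0.7  # from the slide

# Hypothetical ranges for a (gamma, cost) individual.
BOUNDS = [(0.0, 10.0), (0.0, 16.0)]


def toy_fitness(individual):
    """Stand-in for running the workflow; peaks at gamma=2, cost=8."""
    gamma, cost = individual
    return 1.0 / (1.0 + (gamma - 2.0) ** 2 + ((cost - 8.0) / 4.0) ** 2)


def random_individual(rng):
    return [rng.uniform(lo, hi) for lo, hi in BOUNDS]


def tournament(pop, fits, rng, k=3):
    """Pick k random individuals and return the fittest one."""
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: fits[i])]


def evolve(pop_size=6, generations=20, seed=0):
    """Return the best fitness seen in each evaluated generation."""
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        fits = [toy_fitness(ind) for ind in pop]
        best_idx = max(range(pop_size), key=lambda i: fits[i])
        best_per_generation.append(fits[best_idx])
        next_pop = [list(pop[best_idx])]  # elitism: keep the best unchanged
        while len(next_pop) < pop_size:
            a = tournament(pop, fits, rng)
            b = tournament(pop, fits, rng)
            if rng.random() < CROSSOVER_RATE:
                # uniform crossover: each gene comes from either parent
                child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            else:
                child = list(a)
            for i, (lo, hi) in enumerate(BOUNDS):
                if rng.random() < MUTATION_RATE:
                    child[i] = rng.uniform(lo, hi)  # random-reset mutation
            next_pop.append(child)
        pop = next_pop
    return best_per_generation


history = evolve()
print(history[0], history[-1])
```

Because the best individual is carried over unchanged, the best fitness per generation never decreases in this sketch.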
Search Space
Gamma:
• Double
• 0 - 10
• Cost/2 < Gamma (fictional)
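One way to picture such a search space in code is a typed range per parameter plus a dependency check. This is only a sketch: the `DoubleRange` class, the range chosen for Cost, and the rejection-sampling strategy are assumptions, and only the Gamma range and the (fictional) dependency come from the slide:

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DoubleRange:
    """A closed interval of floating-point values (hypothetical helper)."""
    low: float
    high: float

    def sample(self, rng):
        return rng.uniform(self.low, self.high)

    def contains(self, value):
        return self.low <= value <= self.high


GAMMA = DoubleRange(0.0, 10.0)  # from the slide
COST = DoubleRange(0.0, 16.0)   # hypothetical range, not given on the slide


def satisfies_dependency(gamma, cost):
    """The (fictional) dependency from the slide: Cost/2 < Gamma."""
    return cost / 2 < gamma


def sample_setting(rng, max_tries=1000):
    """Rejection sampling: redraw until the dependency holds."""
    for _ in range(max_tries):
        gamma, cost = GAMMA.sample(rng), COST.sample(rng)
        if satisfies_dependency(gamma, cost):
            return {"Gamma": gamma, "Cost": cost}
    raise RuntimeError("no valid setting found")


print(sample_setting(random.Random(1)))
```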
Optimization Run
• Origin of result
• Parameter setting
• Fitness value
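The three items above map naturally onto a small record type. The sketch below is illustrative only: the class layout and the `origin` identifier are invented, while the parameter names and values are taken from the example result shown later in the deck:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class OptimizationRun:
    origin: str                                      # where the result came from
    parameters: dict = field(default_factory=dict)   # parameter setting
    fitness: float = 0.0                             # obtained fitness value


run = OptimizationRun(
    origin="example-run-1",  # made-up identifier
    parameters={"Gamma": 2.36, "Cost": 8, "NumberOfPseudoAbsences": 363},
    fitness=0.9207,
)

# Serialize to JSON so the record can be stored or shared.
serialized = json.dumps(asdict(run), indent=2)
print(serialized)
```

Serializing to JSON is just one convenient choice for the sketch; the deck itself describes these records with an ontology rather than a JSON schema.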
Genetic Algorithm Parameter Optimization Plugin
(1) Define sub-workflow
(2) Specify input parameters (constraints)
(3) Select fitness output parameters (e.g. AUC)
(4) Define optimization parameters (population size, termination criteria)
Display the optimization result
Figure: Taverna Optimization Framework & Plugin, with Best Fitness values 0.34, 0.42, 0.48, 0.49 and the labels 1, 2, ..., x.
Fitness per iteration (Iterations 1 to 6):
Generation 1: 0.05, 0.22, 0.27, 0.19, 0.31, 0.34
Generation 2: 0.05, 0.22, 0.34, 0.19, 0.31, 0.33
Generation 3: 0.05, 0.22, 0.34, 0.19, 0.31, 0.46
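For illustration, the per-iteration fitness values from this slide can be reduced to the best result per generation. The sketch assumes the values are listed in iteration order; the helper name `best_per_generation` is invented:

```python
# Fitness values per generation as read from the slide, assuming they are
# listed in iteration order (Iteration 1 to 6).
fitness_by_generation = {
    1: [0.05, 0.22, 0.27, 0.19, 0.31, 0.34],
    2: [0.05, 0.22, 0.34, 0.19, 0.31, 0.33],
    3: [0.05, 0.22, 0.34, 0.19, 0.31, 0.46],
}


def best_per_generation(history):
    """Map generation number to (best iteration, best fitness)."""
    result = {}
    for generation, values in sorted(history.items()):
        best_index = max(range(len(values)), key=values.__getitem__)
        result[generation] = (best_index + 1, values[best_index])
    return result


for generation, (iteration, fitness) in best_per_generation(fitness_by_generation).items():
    print(f"Generation {generation}: iteration {iteration}, fitness {fitness}")
```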
Example Result
Name                     Value
Gamma                    2.36
Cost                     8
NumberOfPseudoAbsences   363
Fitness                  0.9207
Benefits of sharing and exploiting Optimization Research Objects
• What is the optimal setting? - Reuse optimized settings
• What ranges have been explored? - Adopt used parameter ranges
• What algorithm settings were used? - Reuse algorithm settings
• Are there similar optimizations? - Reuse existing results
• Resume the optimization
• Embed optimization provenance into workflow infrastructures to be reused by other scientists
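Reusing existing results or resuming an optimization could, for example, mean seeding the first population with the best settings from earlier runs. A minimal sketch under that assumption; the function `seed_population` and the in-memory form of the prior runs are hypothetical:

```python
def seed_population(prior_runs, pop_size, new_individual):
    """Start a new optimization from the best prior runs.

    prior_runs: list of (parameters, fitness) pairs, e.g. read from a
        shared Optimization Research Object (hypothetical in-memory form).
    new_individual: zero-argument callable producing a fresh setting,
        used to fill up the population if there are too few prior runs.
    """
    ranked = sorted(prior_runs, key=lambda run: run[1], reverse=True)
    seeds = [dict(params) for params, _ in ranked[:pop_size]]
    while len(seeds) < pop_size:
        seeds.append(new_individual())
    return seeds


# Made-up prior runs for illustration.
prior = [
    ({"Gamma": 1.0}, 0.30),
    ({"Gamma": 2.36}, 0.92),
    ({"Gamma": 5.0}, 0.10),
]
population = seed_population(prior, 4, lambda: {"Gamma": 0.0})
print(population)
```

Seeding is only one possible reuse strategy; the explored parameter ranges or the algorithm settings could be reused in the same spirit.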
Conclusion
• Scientific workflows are hard to configure
• Optimization can help but meta-data get lost
• Extend Research Objects
• Build new Optimization Research Object Ontology
• Reuse of optimization meta-data to speed up optimization
• Shareable with the community in workflow infrastructures
• Outlook: How to learn from similar workflows?
Links
http://purl.org/net/ro-optimization
http://purl.org/net/svm-opt-research-object
Questions?
Thank you!