Mitglied der Helmholtz-Gemeinschaft On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects 17. November 2013 | Sonja Holl*, Daniel Garijo + , Khalid Belhajjame $ , Olav Zimmermann*, Renato De Giovanni # , Matthias Obst ~ , Carole Goble $ *Jülich Supercomputing Centre (JSC),Forschungszentrum Juelich, Germany + Ontology Engineering Group, Facultad de Informática Universidad Politécnica de Madrid, Spain $ School of Computer Science University of Manchester, UK # Reference Center on Environmental Information Campinas SP, Brazil ~ Department of Biological and Environmental Sciences University of Gothenburg, Sweden 8th Workshop On Workflows in Support of Large-Scale Science

On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects


DESCRIPTION

Works13 Presentation by Sonja Holl. The work presents how to model optimizations made to workflows as Research Objects.


Page 1: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

17 November 2013 | Sonja Holl*, Daniel Garijo+, Khalid Belhajjame$, Olav Zimmermann*, Renato De Giovanni#, Matthias Obst~, Carole Goble$

* Jülich Supercomputing Centre (JSC), Forschungszentrum Juelich, Germany
+ Ontology Engineering Group, Facultad de Informática, Universidad Politécnica de Madrid, Spain
$ School of Computer Science, University of Manchester, UK
# Reference Center on Environmental Information, Campinas SP, Brazil
~ Department of Biological and Environmental Sciences, University of Gothenburg, Sweden

8th Workshop On Workflows in Support of Large-Scale Science

Page 2: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

Scientific Workflows

• Popular choice to design, manage, and execute in silico experiments

• Sharing and reuse via workflow repositories

Page 3: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

Ecological Niche Modeling

Modeling species adaptation to environmental changes (BioVeL Project)

[Figure: five numbered panels (1 to 5); image content not recoverable from the extracted text]

Page 4: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

Ecological Niche Modeling Workflow

[Figure: workflow diagram. Steps: createModel, testModel, calcAUC. Data labels: Environmental Layer, Occurrence Data, Geographic Mask, Parameter, AUC]

Page 5: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

[Figure: life cycle of an in silico experiment. Labels: Planning, Designing workflow (from scratch), Reusing workflow, Execution, Sharing & Analysis, REFINE]

Page 6: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

Ecological Niche Modeling Workflow

[Figure: workflow diagram. Steps: createModel, testModel, calcAUC. Data labels: Environmental Layer, Occurrence Data, Geographic Mask, AUC. Parameters: Gamma, Cost, NumberOfPseudoAbsences. Algorithms: SVM, Maxent, GARP]

Page 7: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

Ecological Niche Modeling Workflow

[Figure: the same workflow diagram (createModel, testModel, calcAUC; Environmental Layer, Occurrence Data, Geographic Mask, AUC; Gamma, Cost, NumberOfPseudoAbsences; SVM, Maxent, GARP), overlaid with a cloud of example parameter values and the labels "Select Algorithms", "Select Parameters", and "BLAST". The individual values are not recoverable from the extracted text.]

Page 8: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

Common strategies to handle this challenge

• Default parameters & applications
• Trial and error
• Parameter sweeps

But:
• Increasing complexity of scientific workflows
• Rising number of parameters
• Time-consuming and compute-intensive

Page 9: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

[Figure: life cycle of an in silico experiment, now with an Optimization phase. Labels: Planning, Designing workflow (from scratch), Reusing workflow, Execution, Optimization, Sharing & Analysis, REFINE]

Page 10: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

Intelligent automated optimization techniques

Goal:
• Automated way to find workflow settings that optimize the output

• Define workflow output(s) as fitness value
• Use fitness value for evaluation (e.g. AUC or correlation coefficient)
• Use a heuristic search algorithm to find the best settings
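The idea above (treat a workflow output as a fitness value and let a heuristic search look for good settings) can be sketched with a small genetic algorithm. This is an illustrative sketch only, not the plugin presented in the talk: the `fitness` function is a toy stand-in for a workflow execution, the `BOUNDS` for `cost` are an assumption, and all function names are hypothetical. Crossover is always applied here for brevity, and the run stops after a fixed number of generations.

```python
import random

# Hypothetical search space: parameter name -> (lower, upper) bounds.
BOUNDS = {"gamma": (0.0, 10.0), "cost": (1.0, 16.0)}


def fitness(params):
    """Toy stand-in for a workflow run that returns a fitness value such as AUC.

    A real setup would execute the workflow; this function simply peaks at
    gamma=2.0, cost=8.0 so that the search has something to find.
    """
    return 1.0 / (1.0 + (params["gamma"] - 2.0) ** 2 + (params["cost"] - 8.0) ** 2)


def random_individual(rng):
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in BOUNDS.items()}


def crossover(a, b, rng):
    # Uniform crossover: each parameter is copied from one of the two parents.
    return {name: (a if rng.random() < 0.5 else b)[name] for name in BOUNDS}


def mutate(individual, rng, rate=0.1):
    # With probability `rate`, resample a parameter within its bounds.
    return {
        name: rng.uniform(*BOUNDS[name]) if rng.random() < rate else value
        for name, value in individual.items()
    }


def optimize(population_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        best_per_generation.append(fitness(ranked[0]))
        parents = ranked[: population_size // 2]
        # Elitism: carry the best individual over, fill the rest with offspring.
        children = [ranked[0]]
        while len(children) < population_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    return max(population, key=fitness), best_per_generation


if __name__ == "__main__":
    best, history = optimize()
    print(best, history[-1])
```

Because the best individual is carried over unchanged, the best fitness per generation never decreases in this sketch.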

Page 11: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

How does it work?

[Figure: architecture diagram. Labels: Taverna WMS, Optimization Layer, Framework, API, Plugins (Parameter Optimization, Component Optimization)]

• Development of an optimization framework that extends the Taverna workflow management system

• Abstracts the optimization process (e.g. parallel execution, security)

• Developer API allows rapid adaptation of new optimization methods

• Optimization plugins can be added independently
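The split between a framework that runs workflows and plugins that decide what to try next can be sketched as follows. This is a hypothetical illustration of the separation of concerns, not the actual Taverna developer API: `OptimizationPlugin`, `RandomSearchPlugin`, `run_framework`, and their methods are all invented names, and the framework here evaluates settings sequentially where a real one could run them in parallel.

```python
import random
from abc import ABC, abstractmethod


class OptimizationPlugin(ABC):
    """Hypothetical plugin contract; NOT the real Taverna API."""

    @abstractmethod
    def propose(self, history):
        """Return the next list of parameter settings to execute."""

    @abstractmethod
    def finished(self, history):
        """Return True once the termination criterion is met."""


class RandomSearchPlugin(OptimizationPlugin):
    """A trivial plugin: proposes random settings within the given bounds."""

    def __init__(self, bounds, batch_size=4, max_batches=5, seed=0):
        self.bounds = bounds
        self.batch_size = batch_size
        self.max_evaluations = batch_size * max_batches
        self.rng = random.Random(seed)

    def propose(self, history):
        return [
            {name: self.rng.uniform(lo, hi) for name, (lo, hi) in self.bounds.items()}
            for _ in range(self.batch_size)
        ]

    def finished(self, history):
        return len(history) >= self.max_evaluations


def run_framework(plugin, evaluate):
    """Framework side: executes proposed settings and records (setting, fitness)."""
    history = []
    while not plugin.finished(history):
        for setting in plugin.propose(history):
            history.append((setting, evaluate(setting)))
    # Return the (setting, fitness) pair with the highest fitness.
    return max(history, key=lambda item: item[1])
```

A new optimization method would only need to implement `propose` and `finished`; execution details stay on the framework side.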

Page 12: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

Taverna Optimization Framework & Plugin

Genetic Algorithm Parameter Optimization Plugin

(1) Define sub-workflow
(2) Specify input parameters (constraints)
(3) Select fitness output parameters (e.g. AUC)
(4) Define optimization method parameters (population size, termination criteria)

Display the optimization result

[Figure: generations 1, 2, ..., x with Best Fitness values 0.34, 0.42, 0.48, 0.49]

Page 13: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

Status quo

• Workflow optimization starts from scratch each time
• Optimization meta-data are lost

Idea: Capture optimization meta-data next to traditional provenance data

⇒ learn from/extend prior optimization runs
⇒ improve and accelerate optimization process

Page 14: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

Research Objects

• Aligned with W3C standards
• Aggregates various resources
• Describes scientific processes in machine readable format
• Specified by several ontologies

[Figure label: ore:aggregates]

Page 15: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

Taverna Optimization Framework & Plugin

Genetic Algorithm Parameter Optimization Plugin

(1) Define sub-workflow
(2) Specify input parameters (constraints)
(3) Select fitness output parameters (e.g. AUC)
(4) Define optimization parameters (population size, termination criteria)

Display the optimization result

[Figure: generations 1, 2, ..., x with Best Fitness values 0.34, 0.42, 0.48, 0.49]

Page 16: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

Optimization Research Object Ontology

ro:ResearchObject

opt:OptimizationResearchObject

• opt:SearchSpace: Describes the dependencies and parameter constraints
• opt:Algorithm: Describes the optimization algorithm and its parameters
• opt:Fitness: Describes the fitness functions
• opt:Workflow: The workflow that was optimized
• opt:Generation: Defines the population size and generation number for an Optimization Run
• opt:TerminationCondition: Describes the termination condition defined by the user
• opt:OptimizationRun: Represents one result set: sub-workflow, parameters and obtained fitness values

Legend: rdfs:subClassOf, rdf:Property, ore:aggregates
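To make the roles of these classes concrete, here is a loose in-memory mirror of the ontology as Python dataclasses. It is a sketch under assumptions, not a serialization of the ontology: the field names are hypothetical, the workflow name and termination condition strings are placeholders, and opt:Fitness and opt:Generation are omitted for brevity. The algorithm settings, the Gamma range, and the first run reuse values shown on other slides of this deck; the second run is made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Algorithm:  # loosely mirrors opt:Algorithm
    name: str
    parameters: Dict[str, float]


@dataclass
class SearchSpace:  # loosely mirrors opt:SearchSpace
    # parameter name -> (type name, lower bound, upper bound)
    ranges: Dict[str, Tuple[str, float, float]]
    constraints: List[str] = field(default_factory=list)


@dataclass
class OptimizationRun:  # loosely mirrors opt:OptimizationRun
    parameters: Dict[str, float]
    fitness: float


@dataclass
class OptimizationResearchObject:  # loosely mirrors opt:OptimizationResearchObject
    workflow: str
    algorithm: Algorithm
    search_space: SearchSpace
    termination_condition: str
    runs: List[OptimizationRun] = field(default_factory=list)

    def best_run(self) -> OptimizationRun:
        return max(self.runs, key=lambda run: run.fitness)


ro = OptimizationResearchObject(
    workflow="ecological-niche-modeling",  # placeholder name
    algorithm=Algorithm("Genetic Algorithm", {"mutation_rate": 0.1, "crossover_rate": 0.7}),
    search_space=SearchSpace({"Gamma": ("Double", 0.0, 10.0)}, ["Cost/2 < Gamma"]),
    termination_condition="fixed number of generations",  # placeholder
    runs=[
        OptimizationRun({"Gamma": 2.36, "Cost": 8, "NumberOfPseudoAbsences": 363}, 0.9207),
        OptimizationRun({"Gamma": 5.0, "Cost": 4, "NumberOfPseudoAbsences": 100}, 0.81),  # made up
    ],
)
print(ro.best_run())
```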

Page 17: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

Algorithm

• Genetic Algorithm
• Mutation rate: 0.1
• Crossover rate: 0.7

Page 18: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

Search Space

Gamma:
• Double
• 0 - 10
• Cost/2 < Gamma (fictional)
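A search-space description like this one combines a range with a dependency between parameters. A minimal sketch of how such a description could be checked and sampled, assuming a range for Cost (the slide gives none) and using hypothetical function names:

```python
import random

GAMMA_RANGE = (0.0, 10.0)  # from the slide
COST_RANGE = (0.0, 16.0)   # assumption: the slide does not state a range for Cost


def is_valid(gamma, cost):
    """Check the Gamma range and the (fictional) dependency Cost/2 < Gamma."""
    return GAMMA_RANGE[0] <= gamma <= GAMMA_RANGE[1] and cost / 2 < gamma


def sample_valid(rng, max_tries=1000):
    """Rejection sampling: draw settings until one satisfies the constraints."""
    for _ in range(max_tries):
        gamma = rng.uniform(*GAMMA_RANGE)
        cost = rng.uniform(*COST_RANGE)
        if is_valid(gamma, cost):
            return gamma, cost
    raise RuntimeError("no valid setting found")


if __name__ == "__main__":
    print(sample_valid(random.Random(0)))
```

Rejection sampling is the simplest choice here; an optimizer could instead repair invalid settings or penalize them in the fitness value.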

Page 19: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

Optimization Run

• Origin of result
• Parameter setting
• Fitness value

Page 20: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

Taverna Optimization Framework & Plugin

Genetic Algorithm Parameter Optimization Plugin

(1) Define sub-workflow
(2) Specify input parameters (constraints)
(3) Select fitness output parameters (e.g. AUC)
(4) Define optimization parameters (population size, termination criteria)

Display the optimization result

[Figure: generations 1, 2, ..., x with Best Fitness values 0.34, 0.42, 0.48, 0.49]

Generation 1 Iteration 1: Fitness: 0.05

Page 21: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

Taverna Optimization Framework & Plugin

Genetic Algorithm Parameter Optimization Plugin

(1) Define sub-workflow
(2) Specify input parameters (constraints)
(3) Select fitness output parameters (e.g. AUC)
(4) Define optimization parameters (population size, termination criteria)

Display the optimization result

[Figure: generations 1, 2, ..., x with Best Fitness values 0.34, 0.42, 0.48, 0.49]

Generation 1 Iteration 1: Fitness: 0.05
Generation 1 Iteration 2: Fitness: 0.22
Generation 1 Iteration 3: Fitness: 0.27
Generation 1 Iteration 4: Fitness: 0.19
Generation 1 Iteration 5: Fitness: 0.31
Generation 1 Iteration 6: Fitness: 0.34

Page 22: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

Taverna Optimization Framework & Plugin

Genetic Algorithm Parameter Optimization Plugin

(1) Define sub-workflow
(2) Specify input parameters (constraints)
(3) Select fitness output parameters (e.g. AUC)
(4) Define optimization parameters (population size, termination criteria)

Display the optimization result

[Figure: generations 1, 2, ..., x with Best Fitness values 0.34, 0.42, 0.48, 0.49]

Generation 1 Iteration 1: Fitness: 0.05
Generation 1 Iteration 2: Fitness: 0.22
Generation 1 Iteration 3: Fitness: 0.27
Generation 1 Iteration 4: Fitness: 0.19
Generation 1 Iteration 5: Fitness: 0.31
Generation 1 Iteration 6: Fitness: 0.34

Generation 2 Iteration 1: Fitness: 0.05
Generation 2 Iteration 2: Fitness: 0.22
Generation 2 Iteration 3: Fitness: 0.34
Generation 2 Iteration 4: Fitness: 0.19
Generation 2 Iteration 5: Fitness: 0.31
Generation 2 Iteration 6: Fitness: 0.33

Generation 3 Iteration 1: Fitness: 0.05
Generation 3 Iteration 2: Fitness: 0.22
Generation 3 Iteration 3: Fitness: 0.34
Generation 3 Iteration 4: Fitness: 0.19
Generation 3 Iteration 5: Fitness: 0.31
Generation 3 Iteration 6: Fitness: 0.46
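The slides mention termination criteria without naming one. As one possible criterion (an assumption, not necessarily what the plugin uses), the run could stop once the best fitness per generation stops improving noticeably. The sketch below applies such a check to the Best Fitness values shown in the figure; `stagnated`, `patience`, and `min_delta` are hypothetical names.

```python
def stagnated(best_history, patience=1, min_delta=0.02):
    """Return True if the best fitness improved by less than `min_delta`
    over the last `patience` generations."""
    if len(best_history) <= patience:
        return False  # not enough generations to judge yet
    recent_gain = best_history[-1] - best_history[-1 - patience]
    return recent_gain < min_delta


# Best Fitness values as shown in the figure.
best_fitness = [0.34, 0.42, 0.48, 0.49]

if __name__ == "__main__":
    for n in range(1, len(best_fitness) + 1):
        print(n, stagnated(best_fitness[:n]))
```

With these values the check only fires at the last step, where the gain drops to about 0.01.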

Page 23: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

Example Result

Name                     Value
Gamma                    2.36
Cost                     8
NumberOfPseudoAbsences   363
Fitness                  0.9207

Page 24: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

Benefits of sharing and exploiting Optimization Research Objects

• What is the optimal setting? - Reuse optimized settings
• What ranges have been explored? - Adopt used parameter ranges
• What algorithm settings were used? - Reuse algorithm settings
• Are there similar optimizations? - Reuse existing results
• Resume the optimization

• Embed optimization provenance into workflow infrastructures to be reused by other scientists
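The questions above map onto simple queries over stored optimization runs. A minimal sketch, assuming the runs are available as plain dictionaries: the first entry is the example result from this deck, the other two are made up for illustration, and all function names are hypothetical.

```python
prior_runs = [
    # Example result from the deck.
    {"Gamma": 2.36, "Cost": 8, "NumberOfPseudoAbsences": 363, "Fitness": 0.9207},
    # Made-up runs, for illustration only.
    {"Gamma": 5.10, "Cost": 4, "NumberOfPseudoAbsences": 120, "Fitness": 0.8100},
    {"Gamma": 0.75, "Cost": 12, "NumberOfPseudoAbsences": 500, "Fitness": 0.7450},
]


def settings_of(run):
    """Strip the fitness value, leaving only the parameter setting."""
    return {name: value for name, value in run.items() if name != "Fitness"}


def optimal_setting(runs):
    """What is the optimal setting? Return the setting with the highest fitness."""
    return settings_of(max(runs, key=lambda run: run["Fitness"]))


def explored_ranges(runs):
    """What ranges have been explored? Return (min, max) per parameter."""
    names = settings_of(runs[0]).keys()
    return {
        name: (min(run[name] for run in runs), max(run[name] for run in runs))
        for name in names
    }


def seed_population(runs, size):
    """Resume the optimization: start from the `size` best known settings."""
    ranked = sorted(runs, key=lambda run: run["Fitness"], reverse=True)
    return [settings_of(run) for run in ranked[:size]]


if __name__ == "__main__":
    print(optimal_setting(prior_runs))
    print(explored_ranges(prior_runs))
```

Seeding a new run with known good settings is one way prior results could accelerate a later optimization instead of starting from scratch.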

Page 25: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

Conclusion

• Scientific workflows are hard to configure
• Optimization can help but meta-data get lost
• Extend Research Objects
• Build new Optimization Research Object Ontology
• Reuse of optimization meta-data to speed up optimization
• Shareable with the community in workflow infrastructures

• Outlook: How to learn from similar workflows?

Page 26: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

Links

http://purl.org/net/ro-optimization
http://purl.org/net/svm-opt-research-object

Page 27: On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

Questions?

Thank you!