PhD Thesis: Conservation of Computational Scientific Execution Environments for Workflow-based Experiments Using Ontologies

Date: 22/01/16

Idafen Santana-Pérez

Supervisors: María S. Pérez-Hernández, Oscar Corcho

Introduction

[Diagrams: scientific experiments (hypothesis, convince audience, repeatable); in vivo/in vitro versus in silico experiments; repeatability]

Terminology

• Preservation
• Conservation
• Replicability
• Reproducibility

Experiment components

[Table: experiment components (data, scientific procedure, equipment) for in vivo/in vitro and in silico experiments]

Research Methodology

• State of the Art
• Open Research Problems
• Hypothesis & Goals
• Evaluation

Open Research Problems


• Computational Infrastructures are usually a predefined element of a Computational Scientific Workflow.

• Execution Environments are poorly described.

• Current reproducibility approaches for computational experiments consider only data and procedure.


Outline


1. Introduction and motivation

2. Hypothesis and goals

3. Execution environment representation

4. Experiment reproduction

5. Evaluation

6. Conclusions and future work

Hypothesis

It is possible to describe the main properties of the Execution Environment of a Computational Scientific Experiment and, based on this description, derive a reproduction process for generating an equivalent environment using virtualization techniques.

• Hypothesis 1: Semantic technologies are expressive enough to describe the Execution Environment of a Computational Scientific Experiment.


• Hypothesis 2: An algorithmic process can be developed that, based on the description of the main capabilities of an Execution Environment, is able to define an equivalent infrastructure for executing the original Computational Scientific Experiment obtaining equivalent results.


• Hypothesis 3: Virtualization techniques are capable of supporting the reproduction of an Execution Environment by creating and customizing computational resources, such as Virtual Machines, that fulfil the requirements of the former experiment.


Goals

• Goal 1 (H1): Create a model able to conceptualize the set of relevant capabilities that describe a Computational Infrastructure.

• Goal 2 (H1): Design a framework that provides means for populating these models, collecting information from the materials of a Computational Scientific Experiment and generating structured information.

• Goal 3 (H2): Propose an algorithm that, based on the description of a former Computational Infrastructure, is able to define an equivalent infrastructure specification.

• Goal 4 (H3): Integrate a system able to deploy virtual machines on several Virtualized Infrastructure providers, meeting a certain hardware specification, and to install and configure the proper software stack, based on the deployment plan specified by the aforementioned algorithms.

Restrictions and assumptions

• Restrictions
  • Performance
  • Common software components
  • Web services
  • Data-related aspects

• Assumptions
  • Reproducibility is more important than performance
  • Scientific workflows are a widely accepted approach
  • Virtualization solutions are a mature technology
  • Equivalent environment and results


Representation

• Describing execution environments

[Diagram labels: former equipment, annotate, semantic annotations, reproduce, equivalent execution environment, cloud]

• Semantic models for describing the main aspects related to the execution of a workflow
  • Workflow
  • Software
  • Hardware
  • Computational resources

• Increasing the understanding of the underlying components

• Making this knowledge explicit
  • Easy to extend and integrate

• NeOn methodology
  • Scenario-based methodology for building ontologies

• Standard technology: RDF & OWL

• WICUS ontology network
  • Workflow Infrastructure Conservation Using Semantics
  • http://purl.org/net/wicus
  • 5 ontologies
    • WICUS Workflow Execution Requirements ontology
    • WICUS Software Stack ontology
    • WICUS Hardware Specs ontology
    • WICUS Scientific Virtual Appliance ontology
    • WICUS Ontology: links the previous ontologies
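As a rough illustration of what a semantic description of an execution environment could look like, the sketch below emits RDF statements in N-Triples syntax using only the Python standard library. The class and property names (`Workflow`, `requires`, `SoftwareStack`, `composedOf`), the `#` suffix on the namespace URLs, and the `example.org` identifiers are hypothetical placeholders chosen for this example; they are not taken from the WICUS ontologies themselves.

```python
# Illustrative sketch only: the term names below are hypothetical placeholders,
# not the actual vocabulary of the WICUS ontologies.
WICUS_REQS = "http://purl.org/net/wicus-reqs#"    # assumed namespace form
WICUS_STACK = "http://purl.org/net/wicus-stack#"  # assumed namespace form
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/experiment#"             # made-up base for instances


def triple(subject, predicate, obj):
    """Format one statement as an N-Triples line (all three terms are IRIs)."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def describe_requirement(workflow, requirement, stacks):
    """Describe a workflow, one requirement, and the software stacks it needs."""
    lines = [
        triple(EX + workflow, RDF_TYPE, WICUS_REQS + "Workflow"),
        triple(EX + workflow, WICUS_REQS + "requires", EX + requirement),
    ]
    for stack in stacks:
        lines.append(triple(EX + stack, RDF_TYPE, WICUS_STACK + "SoftwareStack"))
        lines.append(triple(EX + requirement, WICUS_REQS + "composedOf", EX + stack))
    return lines


if __name__ == "__main__":
    for line in describe_requirement("montage", "montage_req", ["java", "pegasus"]):
        print(line)
```

Keeping the description as plain triples is what makes it easy to extend and to link across the separate ontologies of the network.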

WICUS ontology network

• WICUS Workflow Execution Requirements ontology: http://purl.org/net/wicus-reqs
• WICUS Software Stack ontology: http://purl.org/net/wicus-stack
• WICUS Scientific Virtual Appliance ontology: http://purl.org/net/wicus-sva
• WICUS Hardware Specs ontology: http://purl.org/net/wicus-hwspecs
• WICUS ontology network: http://purl.org/net/wicus

Outline

1. Introduction and motivation

2. Hypothesis and goals

3. Execution environment representation

4. Experiment reproduction
   A. Parsing tools and semantic annotations
   B. Specification process
   C. Enactment and execution

5. Evaluation

6. Conclusions and future work

WICUS system

• Overview, inputs and outputs


Parsing tools and semantic annotations

• Workflow Specification File
• Workflow Parser and Annotator
• Workflow Annotations
• WMS Annotations
• Software Components Registry
• Software Components Annotator
• Software Components Catalog
• Workflow & Configuration Annotations
• Scientific Virtual Appliance Catalog

Specification process


• Infrastructure Specification Algorithm (ISA)

1. Get WF requirements
2. Get <Req, Stacks>
3. Get <Req, D-Graph>
4. Get available SVA
5. Get <SVA, Stacks>
6. Calculate Req-SVA compatibility
7. Get max compatible Req-SVA
8. Clean Req D-Graph
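The ISA steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: "D-Graph" is read here as the dependency graph of software stacks, compatibility is assumed to be the number of needed stacks already installed on an appliance, and "cleaning" is assumed to mean dropping those pre-installed stacks from what remains to be deployed. The function names and the dictionary-based inputs are hypothetical.

```python
def transitive_stacks(stack, depends_on):
    """Return the stack plus every stack reachable through its dependencies."""
    seen, todo = set(), [stack]
    while todo:
        current = todo.pop()
        if current not in seen:
            seen.add(current)
            todo.extend(depends_on.get(current, ()))
    return seen


def specify_infrastructure(req_stacks, depends_on, sva_stacks):
    """Pick one appliance per requirement and list the stacks still to install.

    req_stacks: requirement -> stacks it needs directly        (<Req, Stacks>)
    depends_on: stack -> stacks it depends on                   (dependency graph)
    sva_stacks: appliance -> stacks already installed on it     (<SVA, Stacks>)
    """
    plan = {}
    for req, stacks in req_stacks.items():
        # Build the full set of stacks the requirement needs (<Req, D-Graph>).
        needed = set()
        for stack in stacks:
            needed |= transitive_stacks(stack, depends_on)
        # Assumed compatibility score: needed stacks the appliance already has.
        scores = {sva: len(needed & set(installed))
                  for sva, installed in sva_stacks.items()}
        # Highest score wins; ties are broken alphabetically for determinism.
        best = max(sorted(scores), key=scores.get)
        # "Clean" the graph: drop whatever the chosen appliance already provides.
        plan[req] = (best, sorted(needed - set(sva_stacks[best])))
    return plan


if __name__ == "__main__":
    print(specify_infrastructure(
        {"REQ1": ["S1"]},
        {"S1": ["S2", "S3"], "S2": ["S4"]},
        {"SVA1": ["S4"], "SVA2": ["S3", "S4"], "SVA3": []},
    ))
```

In this toy run the appliance that already provides the most needed stacks is selected, so only the remaining stacks have to be installed on top of it.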

Infrastructure Specification Algorithm

[Diagram sequence: example run of the ISA on a workflow with requirements REQ1 to REQ3, software stacks S1 to S15, and Scientific Virtual Appliances SVA1 to SVA3; the final frames show only SVA1 and SVA3.]

Specification process

• Abstract Deployment Plan
  • Provider-independent representation format
  • Based on the WICUS stack ontology
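To make the idea of a provider-independent plan more concrete, the sketch below models an abstract deployment plan as plain Python data and renders it into a shell script that an enactment tool could run on a virtual machine. The `DeploymentStep` and `AbstractDeploymentPlan` classes, their fields, and the script names are hypothetical and chosen for this example only; the actual plan format in the thesis is based on the WICUS stack ontology.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class DeploymentStep:
    """One installation or configuration step (hypothetical structure)."""
    script: str
    parameters: dict = field(default_factory=dict)


@dataclass
class AbstractDeploymentPlan:
    """Provider-independent plan: a target appliance plus ordered steps."""
    appliance: str
    steps: list


def render_shell_script(plan):
    """Translate the abstract plan into a plain shell script."""
    lines = ["#!/bin/sh", "set -e", f"# target appliance: {plan.appliance}"]
    for step in plan.steps:
        # Parameters are passed as environment variables, safely quoted.
        env = " ".join(f"{key}={shlex.quote(str(value))}"
                       for key, value in sorted(step.parameters.items()))
        lines.append(f"{env} sh {shlex.quote(step.script)}".strip())
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    plan = AbstractDeploymentPlan("SVA1", [
        DeploymentStep("install_java.sh", {"VERSION": "1.7"}),
        DeploymentStep("install_wms.sh"),
    ])
    print(render_shell_script(plan))
```

Because the plan itself carries no provider-specific detail, the same data could in principle be rendered for different enactment back ends.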


Enactment and Execution

• PRECIP
  • Pegasus Repeatable Experiments for the Cloud in Python (PRECIP)
  • API for running experiments in Clouds
  • OpenStack and AWS EC2 API
  • Running remote commands and file transfer
  • No pre-installed components in the VM images

• Vagrant
  • Local virtualization
  • Virtualization tools
    • VirtualBox
    • VMware
  • Vagrantfiles
  • Shared folder

Summary


Evaluation

• Workflows reproduced
  • 3 scientific domains
  • 3 workflow management systems
  • 6 different workflows

Domain: Seismic, Astronomy, Bio

WMS: dispel4py, Pegasus, Makeflow

Name: xcorr, InternalExtinction, Montage, Epigenomics, SoyKB, BLAST

Years shown on the slide (in extraction order): 2003, 2014, 2014, 2015, 2011, 2011

• Experimental setup
  • AWS EC2
    • Public Cloud provider
    • De facto standard
  • FutureGrid
    • Academic Cloud facility
    • OpenStack Havana
    • India server: 1024 cores, 3072 GB RAM
  • Vagrant
    • Local virtualization solution
    • VirtualBox
    • Ubuntu 12.04.5
    • 4 cores at 2 GHz, 8 GB RAM
  • Enactment tools: PRECIP, Vagrant

Evaluation

76Conservation of Computational Scientific Execution Environments for Workflow-based Experiments Using Ontologies

Domain Seismic Astronomy Bio

WMS dispel4py Pegasus Makeflow

Name xcorr InternalExtinction Montage Epigenomics SoyKB BLAST

Results

FORMEREQUIPMENT

ANNOTATE REPRODUCE

CLOUD

EQUIVALENT EXECUTION ENVIRONMENTSEMANTIC

ANNOTATIONS

COMPARE

Page 77: PhD Thesis: Conservation of Computational Scientific Execution Environments for Workflow-based Experiments Using Ontologies

Evaluation

Page 78: PhD Thesis: Conservation of Computational Scientific Execution Environments for Workflow-based Experiments Using Ontologies

Evaluation

Results

• Non-deterministic
• Standard and error output
• Generated files equivalent

Page 79: PhD Thesis: Conservation of Computational Scientific Execution Environments for Workflow-based Experiments Using Ontologies

Evaluation

Results

• Same results
• Results from Int. Extinction may vary

Page 80: PhD Thesis: Conservation of Computational Scientific Execution Environments for Workflow-based Experiments Using Ontologies

Evaluation

Results

• Genomic data
• Exact match
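An exact-match comparison of generated files could be sketched as follows. This is only an illustration of the COMPARE step under stated assumptions, not the procedure used in the thesis: the function names `file_digest` and `compare_outputs`, the choice of SHA-256, and the idea that each run leaves its outputs in its own directory are all hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_outputs(original_dir, reproduced_dir):
    """Map each relative file path to True if both runs produced identical bytes.

    A file that is missing from either run, or whose digest differs, maps to False.
    Directory layout is an assumption made for this sketch.
    """
    original, reproduced = Path(original_dir), Path(reproduced_dir)
    names = {p.relative_to(original) for p in original.rglob("*") if p.is_file()}
    names |= {p.relative_to(reproduced) for p in reproduced.rglob("*") if p.is_file()}
    report = {}
    for name in sorted(names):
        a, b = original / name, reproduced / name
        report[str(name)] = (
            a.is_file() and b.is_file() and file_digest(a) == file_digest(b)
        )
    return report
```

A byte-level digest only suits outputs expected to match exactly; non-deterministic outputs would need a domain-specific notion of equivalence instead.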

Page 81: PhD Thesis: Conservation of Computational Scientific Execution Environments for Workflow-based Experiments Using Ontologies

Evaluation

Page 82: PhD Thesis: Conservation of Computational Scientific Execution Environments for Workflow-based Experiments Using Ontologies

Outline

1. Introduction and motivation

2. Hypothesis and goals

3. Execution environment representation

4. Experiment reproduction

5. Evaluation

6. Conclusions and future work

Page 83: PhD Thesis: Conservation of Computational Scientific Execution Environments for Workflow-based Experiments Using Ontologies

Conclusions

Hypothesis 1: Semantic technologies are expressive enough to describe the Execution Environment of a Computational Scientific Experiment.

• Goal 1
  • WICUS ontology network
• Goal 2
  • Parsing and annotations modules

Page 84: PhD Thesis: Conservation of Computational Scientific Execution Environments for Workflow-based Experiments Using Ontologies

Conclusions

Hypothesis 2: An algorithmic process can be developed that, based on the description of the main capabilities of an Execution Environment, is able to define an equivalent infrastructure for executing the original Computational Scientific Experiment, obtaining equivalent results.

• Goal 3
  • Infrastructure Specification Algorithm
  • Abstract Deployment Plan
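The general idea behind deriving an infrastructure specification and a deployment plan from a description of requirements can be sketched as below. This is not the Infrastructure Specification Algorithm of the thesis, whose details are not given on the slide. It is a minimal illustration, assuming that requirements reduce to an operating system, a minimum number of cores and amount of RAM, and a dependency graph between software components. The image catalog, field names, and function names are all hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical catalog of available VM images (illustrative values only).
IMAGES = [
    {"id": "small-ubuntu", "os": "Ubuntu 12.04", "cores": 1, "ram_gb": 2},
    {"id": "large-ubuntu", "os": "Ubuntu 12.04", "cores": 4, "ram_gb": 8},
]


def select_image(images, os_name, min_cores, min_ram_gb):
    """Return the smallest image that satisfies the OS and hardware requirements."""
    candidates = [
        img
        for img in images
        if img["os"] == os_name
        and img["cores"] >= min_cores
        and img["ram_gb"] >= min_ram_gb
    ]
    if not candidates:
        raise ValueError("no image satisfies the requirements")
    return min(candidates, key=lambda img: (img["cores"], img["ram_gb"]))


def deployment_plan(dependencies):
    """Order components so that each one is installed after its dependencies.

    `dependencies` maps a component name to the set of components it requires.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    # Hypothetical software stack for a workflow.
    stack = {
        "workflow": {"wms"},
        "wms": {"java", "python"},
        "java": set(),
        "python": set(),
    }
    print(select_image(IMAGES, "Ubuntu 12.04", 2, 4)["id"])
    print(deployment_plan(stack))
```

The plan produced here stays abstract in the sense that it names what to install and in which order, without committing to a particular provider or scripting language.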

Page 85: PhD Thesis: Conservation of Computational Scientific Execution Environments for Workflow-based Experiments Using Ontologies

Conclusions

Hypothesis 3: Virtualization techniques are capable of supporting the reproduction of an Execution Environment by creating and customizing computational resources, such as Virtual Machines, that fulfil the requirements of the former experiment.

• Goal 4
  • Script Generator for PRECIP and Vagrant scripts
  • AWS EC2, FutureGrid, and Vagrant
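To give a feel for what generating a Vagrant script involves, the sketch below renders a minimal Vagrantfile from a small specification. It is not the Script Generator developed in the thesis. The function name `render_vagrantfile`, the box name `ubuntu/precise64`, and the provisioning command are assumptions chosen for illustration; the 4 cores and 8 GB of RAM mirror the local Vagrant setup listed in the experimental setup.

```python
def render_vagrantfile(box, cpus, memory_mb, provision_commands):
    """Render a minimal Vagrantfile for the VirtualBox provider as a string."""
    lines = [
        'Vagrant.configure("2") do |config|',
        f'  config.vm.box = "{box}"',
        '  config.vm.provider "virtualbox" do |vb|',
        f"    vb.cpus = {cpus}",
        f"    vb.memory = {memory_mb}",
        "  end",
    ]
    for command in provision_commands:
        # Escape backslashes and double quotes so the command survives
        # inside a double-quoted Ruby string.
        escaped = command.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'  config.vm.provision "shell", inline: "{escaped}"')
    lines.append("end")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Box name and command are illustrative assumptions, not taken from the thesis.
    print(render_vagrantfile("ubuntu/precise64", 4, 8192, ["apt-get update"]))
```

Keeping the renderer a pure function of the specification means the same abstract plan could, in principle, be passed to a different renderer for another target such as PRECIP.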

Page 86: PhD Thesis: Conservation of Computational Scientific Execution Environments for Workflow-based Experiments Using Ontologies

Conclusions

• Other approaches
  • Sharing VM
  • Exhaustive trace of the execution components
  • Semantic description for business processes

Page 87: PhD Thesis: Conservation of Computational Scientific Execution Environments for Workflow-based Experiments Using Ontologies

Dissemination

• Journals

• Idafen Santana-Perez, Rafael Ferreira da Silva, Mats Rynge, Ewa Deelman, María S. Pérez-Hernández, Oscar Corcho, “Reproducibility of execution environments in computational science using Semantics and Clouds”, Future Generation Computer Systems, Available online 8 January 2016, ISSN 0167-739X, http://dx.doi.org/10.1016/j.future.2015.12.017 (impact factor: 2.786)

• Santana-Perez, Idafen and Pérez-Hernández, María, “Towards Reproducibility in Scientific Workflows: An Infrastructure-Based Approach”, Scientific Programming, vol. 2015, Article ID 243180, 11 pages, 2015. doi:10.1155/2015/243180 (impact factor: 0.559)

Page 88: PhD Thesis: Conservation of Computational Scientific Execution Environments for Workflow-based Experiments Using Ontologies

Dissemination

• Conferences & workshops

• Doug James et al. (including Santana-Perez, Idafen), “Standing Together for Reproducibility in Large-Scale Computing: Report on reproducibility@XSEDE”, reproducibility@XSEDE workshop, 2014.

• Santana-Perez, Idafen, Ferreira da Silva, Rafael, Rynge, Mats, Deelman, Ewa, Pérez-Hernández, María, Corcho, Oscar, “A Semantic-Based Approach to Attain Reproducibility of Computational Environments in Scientific Workflows: A Case Study”, 1st International Workshop on Reproducibility in Parallel Computing (REPPAR14), in conjunction with Euro-Par 2014 (August 25-29), Porto, Portugal.

• Santana-Perez, Idafen and Pérez-Hernández, María, “A Semantic Scheduler Architecture for Federated Hybrid Clouds”, 2012 IEEE 5th International Conference on Cloud Computing (CLOUD), pp. 384-391, 24-29 June 2012.

Page 89: PhD Thesis: Conservation of Computational Scientific Execution Environments for Workflow-based Experiments Using Ontologies

Future work

Page 90: PhD Thesis: Conservation of Computational Scientific Execution Environments for Workflow-based Experiments Using Ontologies

Future work

• Incentives for scientists to produce reproducible results
• Define roles and responsibilities
• Infrastructure management plan

Page 91: PhD Thesis: Conservation of Computational Scientific Execution Environments for Workflow-based Experiments Using Ontologies

Future work

• Publish descriptions as Linked Data
• Link them with other resources describing scientific workflows

Page 92: PhD Thesis: Conservation of Computational Scientific Execution Environments for Workflow-based Experiments Using Ontologies

Future work

• Multi-node infrastructures

Page 93: PhD Thesis: Conservation of Computational Scientific Execution Environments for Workflow-based Experiments Using Ontologies

Future work

• Completeness of annotations

Page 94: PhD Thesis: Conservation of Computational Scientific Execution Environments for Workflow-based Experiments Using Ontologies

Conservation of Computational Scientific Execution Environments for Workflow-based Experiments Using Ontologies

Idafen Santana-Pérez

Supervisors: María S. Pérez-Hernández, Oscar Corcho

Date: 22/01/16

Experimental materials available online:
http://w3id.org/idafensp/ro/wicuspegasusmontage
http://w3id.org/idafensp/ro/wicuspegasusepigenomics
http://w3id.org/idafensp/ro/wicuspegasussoykb
http://w3id.org/idafensp/ro/wicusdispel4pyastro
http://w3id.org/idafensp/ro/wicusdispel4pyxcorr
http://w3id.org/idafensp/ro/wicusmakeflowblast