A Semantic-Based Approach to Attain Reproducibility of Computational Environments in Scientific Workflows: A Case Study

Idafen Santana-Perez¹, Rafael Ferreira da Silva², Mats Rynge², Ewa Deelman², María S. Pérez-Hernández¹, Oscar Corcho¹

¹ Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain
² Univ. of Southern California, Information Sciences Institute, Marina Del Rey, CA, USA

REPPAR'14, 1st International Workshop on Reproducibility in Parallel Computing. Porto, August 2014


Slides from our presentation at the 1st International Workshop on Reproducibility in Parallel Computing (REPPAR'14), held in conjunction with Euro-Par 2014 (August 25-29).


Index

•  Introduction
•  Reproducibility tools
   •  WICUS
   •  Pegasus & PRECIP
•  Reproducibility process
   •  Annotation
   •  Infrastructure Specification Algorithm
•  Use case
•  Conclusion & Future Work

2 A Semantic-Based Approach to Attain Reproducibility...


Introduction

•  Experiments in empirical science
   •  Primary component of the scientific method
   •  Main method for validating a hypothesis
   •  Repeatable procedure

•  Scientific publications
   •  Announce a result
   •  Convince readers that the result is correct

•  Computational science
   •  In silico science
   •  Computational scientific workflow: “a precise, executable description of a scientific procedure” [De Roure, 2011]

•  “Reproducibility in principle underpins the scientific method” [Goble, 2012]

•  “It's about capturing, preserving, reusing and curating” [Goble, 2012]


Introduction

•  Reproducibility in Scientific Experiments


[Figure: a scientific experiment involves input data, a scientific procedure, and equipment; experiments can be carried out in vivo/in vitro or in silico]


[Figure: the former equipment is annotated with semantic annotations, which are then used to reproduce an equivalent execution environment in the Cloud]


Semantics

•  Vocabularies for documenting the main resources involved in the execution of a workflow (WF):
   •  Software
   •  Hardware
   •  Computational resources
   •  Workflow

•  Increasing the understanding of the underlying components

•  Making this knowledge explicit
   •  Standard technology: RDF & OWL
   •  Easy to extend and integrate
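As an illustration of making this knowledge explicit with RDF, the minimal sketch below describes one software component as RDF triples and serializes them as N-Triples using only the Python standard library. Only the base URI http://purl.org/net/wicus-stack comes from these slides; the `#` separator, the class `SoftwareComponent`, the properties `hasVersion` and `dependsOn`, and the `example.org` resources are hypothetical placeholders, not terms checked against the published WICUS ontologies.

```python
from typing import List, Tuple

# Hypothetical namespaces: only the base URI of WSTACK appears in the slides.
WSTACK = "http://purl.org/net/wicus-stack#"
EX = "http://example.org/workflow#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

Triple = Tuple[str, str, str]


def iri(value: str) -> str:
    """Format an IRI term for N-Triples."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Format a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def describe_component(name: str, version: str, depends_on: List[str]) -> List[Triple]:
    """Build triples for one software component (vocabulary terms are hypothetical)."""
    subject = iri(EX + name)
    triples = [
        (subject, iri(RDF_TYPE), iri(WSTACK + "SoftwareComponent")),
        (subject, iri(RDFS_LABEL), literal(name)),
        (subject, iri(WSTACK + "hasVersion"), literal(version)),
    ]
    for dep in depends_on:
        triples.append((subject, iri(WSTACK + "dependsOn"), iri(EX + dep)))
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize triples as N-Triples: one statement per line, each ending with ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    # Invented example component and dependency, for illustration only.
    print(to_ntriples(describe_component("example_tool", "1.0", ["example_lib"])))
```

A full RDF library such as rdflib would handle serialization and escaping more completely; plain string formatting simply keeps the sketch dependency-free.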


Semantics

•  WICUS ontology network
   •  Workflow Infrastructure Conservation Using Semantics
   •  http://purl.org/net/wicus
   •  5 ontologies:
      •  WICUS Software Stack ontology
      •  WICUS Hardware Specs ontology
      •  WICUS Scientific Virtual Appliance ontology
      •  WICUS Workflow Execution Requirements ontology
      •  WICUS Ontology: links the previous ontologies
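The role of the linking WICUS ontology can be pictured with plain Python data classes. This is a minimal sketch under an assumed shape: a workflow requirement points to software stacks and minimum hardware, and a Scientific Virtual Appliance (SVA) pairs an image with hardware and the stacks it already provides. Every class, field, and function name here is hypothetical and does not reproduce the ontologies' actual vocabulary.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class HardwareSpec:
    # Hypothetical analogue of the Hardware Specs ontology.
    cpu_cores: int
    memory_gb: int


@dataclass
class SoftwareStack:
    # Hypothetical analogue of the Software Stack ontology: a named set of components.
    name: str
    components: List[str] = field(default_factory=list)


@dataclass
class ScientificVirtualAppliance:
    # Hypothetical analogue of the SVA ontology: an image plus the stacks it ships with.
    image_id: str
    hardware: HardwareSpec
    stacks: Set[str] = field(default_factory=set)


@dataclass
class WorkflowRequirement:
    # Hypothetical analogue of the Workflow Execution Requirements ontology.
    workflow: str
    stacks: List[SoftwareStack]
    min_hardware: HardwareSpec


def hardware_satisfies(offered: HardwareSpec, required: HardwareSpec) -> bool:
    """True if the offered hardware meets every minimum in the requirement."""
    return (offered.cpu_cores >= required.cpu_cores
            and offered.memory_gb >= required.memory_gb)


def missing_stacks(req: WorkflowRequirement, sva: ScientificVirtualAppliance) -> List[str]:
    """Names of required stacks that the appliance does not already provide."""
    return [stack.name for stack in req.stacks if stack.name not in sva.stacks]
```

For example, a requirement for stacks `os` and `tools` checked against an appliance that only provides `os` yields `["tools"]` as the stack still to be deployed.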


WICUS ontology network

•  WICUS Software Stack ontology
   •  http://purl.org/net/wicus-stack


WICUS ontology network

•  WICUS Hardware Specs ontology
   •  http://purl.org/net/wicus-hwspecs


WICUS ontology network


•  WICUS Scientific Virtual Appliance ontology
   •  http://purl.org/net/wicus-sva


WICUS ontology network

•  WICUS Workflow Execution Requirements ontology
   •  http://purl.org/net/wicus-reqs


WICUS ontology network

•  WICUS ontology network
   •  http://purl.org/net/wicus


PRECIP

•  Pegasus Repeatable Experiments for the Clouds In Python

•  Experiment management control API
•  Works with commercial and academic Clouds
•  Tag-based system
•  No pre-installed software needed on the VM image
•  Create VMs, transfer files, and run commands remotely
•  Linux
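The tag-based style can be illustrated without a cloud account. The sketch below is a hypothetical, in-memory `FakeExperiment` class that only records what would happen: VMs are provisioned with tags, and file transfers and commands are addressed to every VM that carries all the requested tags. It is not PRECIP's API; consult the PRECIP documentation for its real class and method names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Instance:
    # A simulated VM: an id, its tags, and a log of the actions applied to it.
    instance_id: str
    tags: Set[str]
    log: List[str] = field(default_factory=list)


class FakeExperiment:
    """Hypothetical stand-in for a tag-based experiment manager (no real cloud calls)."""

    def __init__(self) -> None:
        self.instances: Dict[str, Instance] = {}
        self._counter = 0

    def provision(self, image: str, tags: List[str], count: int = 1) -> List[str]:
        """Create `count` simulated instances of `image`, each carrying `tags`."""
        ids = []
        for _ in range(count):
            self._counter += 1
            instance_id = f"vm-{self._counter}"
            self.instances[instance_id] = Instance(instance_id, set(tags), [f"boot {image}"])
            ids.append(instance_id)
        return ids

    def _select(self, tags: List[str]) -> List[Instance]:
        # An instance matches when it carries every requested tag.
        wanted = set(tags)
        return [inst for inst in self.instances.values() if wanted <= inst.tags]

    def put(self, tags: List[str], local: str, remote: str) -> int:
        """Record a file transfer to every matching instance; return the match count."""
        targets = self._select(tags)
        for inst in targets:
            inst.log.append(f"put {local} -> {remote}")
        return len(targets)

    def run(self, tags: List[str], command: str) -> int:
        """Record a remote command on every matching instance; return the match count."""
        targets = self._select(tags)
        for inst in targets:
            inst.log.append(f"run {command}")
        return len(targets)

    def deprovision(self) -> None:
        """Forget all simulated instances."""
        self.instances.clear()
```

Addressing VMs by role tags (for example `master` and `worker`) means a script does not need to know instance addresses in advance, which is what makes such scripts portable across providers.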


Pegasus

•  Pegasus WMS
   •  Million-task workflows
   •  Records:
      •  Data about the execution
      •  Intermediate results
   •  Replica Catalog
   •  DAX (Directed Acyclic Graph in XML)
   •  Transformation Catalog
   •  HTCondor for executing individual tasks
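To make the DAX idea concrete, the following sketch parses a heavily simplified, DAX-like XML document and derives two things: the distinct transformations (executables) the workflow needs, and a valid execution order of its jobs. The `adag`/`job`/`child`/`parent` element names follow the DAX format as we understand it, but the sample document is invented, omits the schema namespace and file declarations, and the code is not a Pegasus parser. It needs Python 3.9+ for `graphlib`.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# Invented, simplified DAX-like document (no namespace, no <uses> file entries).
SAMPLE_DAX = """
<adag name="example">
  <job id="j1" name="project"/>
  <job id="j2" name="project"/>
  <job id="j3" name="combine"/>
  <child ref="j3">
    <parent ref="j1"/>
    <parent ref="j2"/>
  </child>
</adag>
"""


def parse_dax(text: str) -> Tuple[Dict[str, str], Dict[str, Set[str]]]:
    """Return (job id -> transformation name, job id -> ids of its parent jobs)."""
    root = ET.fromstring(text.strip())
    jobs = {job.get("id"): job.get("name") for job in root.iter("job")}
    parents: Dict[str, Set[str]] = {job_id: set() for job_id in jobs}
    for child in root.iter("child"):
        parents[child.get("ref")].update(p.get("ref") for p in child.iter("parent"))
    return jobs, parents


def required_transformations(jobs: Dict[str, str]) -> List[str]:
    """Distinct executables the workflow needs, i.e. what must be deployed."""
    return sorted(set(jobs.values()))


def execution_order(parents: Dict[str, Set[str]]) -> List[str]:
    """A topological order in which every job appears after all of its parents."""
    return list(TopologicalSorter(parents).static_order())
```

The list of required transformations is the kind of information a Transformation Catalog maps to concrete executables, and therefore what an environment must provide before the workflow can run.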


Reproducibility process


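[Figure: overview of the process in nine numbered steps, involving the Pegasus DAX (XML) and Transformation Catalog, a DAX annotator and a TC annotator, workflow (WF) and WMS annotations, a software components catalog, the Infrastructure Specification Algorithm, an SVA catalog, workflow and configuration annotations, and the generated PRECIP script]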


Reproducibility process

•  Infrastructure Specification Algorithm
   •  Goal: obtain a specification defining which VMs need to be created, which software components must be deployed on them, and how those components are configured.
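One way to picture such a specification is one entry per VM listing a base image, the software components to deploy, and their configuration, with components installed in dependency order. The sketch below assumes that shape; `VMSpec`, `deployment_order`, `plan_steps`, and the example values are hypothetical, and the ordering uses the standard library's topological sorter (Python 3.9+).

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Dict, List, Set


@dataclass
class VMSpec:
    # Hypothetical per-VM entry of an infrastructure specification.
    image: str
    components: Set[str]
    config: Dict[str, str] = field(default_factory=dict)


def deployment_order(vm: VMSpec, depends_on: Dict[str, Set[str]]) -> List[str]:
    """Order the VM's components so each one comes after the components it depends on.

    Only dependencies that are themselves listed in vm.components are considered;
    any other dependency is assumed to be provided by the base image already.
    """
    graph = {c: depends_on.get(c, set()) & vm.components for c in vm.components}
    return list(TopologicalSorter(graph).static_order())


def plan_steps(vm: VMSpec, depends_on: Dict[str, Set[str]]) -> List[str]:
    """Readable plan: create the VM, deploy components in order, then apply configuration."""
    steps = [f"create VM from {vm.image}"]
    steps += [f"deploy {c}" for c in deployment_order(vm, depends_on)]
    steps += [f"set {key}={value}" for key, value in sorted(vm.config.items())]
    return steps
```

For instance, a VM with components `app` and `lib`, where `app` depends on `lib`, yields a plan that creates the VM, deploys `lib`, deploys `app`, and then applies the configuration entries.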


Reproducibility process

•  Infrastructure Specification Algorithm


[Figure: algorithm flowchart with the following steps]
   •  GET WF REQUIREMENTS
   •  GET <REQ,STACKS>
   •  GET <REQ,D-GRAPH>
   •  GET AVAILABLE SVA
   •  GET <SVA,STACKS>
   •  CALCULATE REQ-SVA COMPATIBILITY
   •  GET MAX COMPATIBLE REQ-SVA
   •  CLEAN REQ D-GRAPH
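The flowchart can be read as a greedy matching step. The sketch below is one possible interpretation, not the authors' implementation: a requirement's needed stacks are its stacks plus their transitive dependencies (the REQ D-GRAPH), compatibility is assumed to be the number of needed stacks an SVA already provides, the most compatible SVA is selected, and cleaning the graph removes the stacks that SVA covers, leaving the ones still to deploy in dependency order. The scoring rule, the tie-breaking, and all names are assumptions.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Optional, Set, Tuple


def dependency_closure(stacks: Set[str], d_graph: Dict[str, Set[str]]) -> Set[str]:
    """All stacks needed, following dependency edges transitively."""
    needed: Set[str] = set()
    pending = list(stacks)
    while pending:
        stack = pending.pop()
        if stack not in needed:
            needed.add(stack)
            pending.extend(d_graph.get(stack, set()))
    return needed


def compatibility(needed: Set[str], provided: Set[str]) -> int:
    """Hypothetical score: how many needed stacks the SVA already provides."""
    return len(needed & provided)


def specify(requirements: Dict[str, Set[str]],
            d_graph: Dict[str, Set[str]],
            svas: Dict[str, Set[str]]) -> Dict[str, Tuple[Optional[str], List[str]]]:
    """For each requirement: the most compatible SVA and the stacks still to deploy."""
    result: Dict[str, Tuple[Optional[str], List[str]]] = {}
    for req, stacks in requirements.items():
        needed = dependency_closure(stacks, d_graph)
        # Highest score wins; ties go to the alphabetically first SVA name.
        best = max(sorted(svas), key=lambda s: compatibility(needed, svas[s]), default=None)
        provided = svas[best] if best is not None else set()
        remaining = needed - provided  # "clean" the requirement's dependency graph
        graph = {s: d_graph.get(s, set()) & remaining for s in remaining}
        result[req] = (best, list(TopologicalSorter(graph).static_order()))
    return result
```

With a requirement for `tools` and `wms`, a graph in which `tools` needs `libs` and `wms` needs `scheduler`, and two SVAs providing `{"libs"}` and `{"wms", "scheduler"}`, the second SVA is chosen and `libs` followed by `tools` remain to be deployed.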


Infrastructure Specification Algorithm

[Figures: step-by-step example of the algorithm on a requirements dependency graph, first with nodes S1 to S6 and then with nodes S1 to S15]


Use case

•  Montage Workflow
   •  Astronomy workflow
   •  Constructs large image mosaics of the sky
   •  Montage software distribution: 59 binaries

•  Target IaaS Cloud providers
   •  Amazon EC2
   •  FutureGrid


Research Object (RO) available at http://pegasus.isi.edu/publications/reppar


Use case

•  Goal


[Figure: the Montage workflow environment is annotated with semantic annotations, which are used to reproduce an equivalent execution environment; target providers are AWS and FutureGrid (FG)]


Use case

•  Annotations


Use case

•  Annotations: Workflow


Use case

•  Annotations: Software


Use case

•  Annotations: Computational Resources


Use case

•  Results
   •  2 PRECIP scripts (AWS and FG): each creates a VM, deploys and configures the software, and executes the WF
   •  Successfully executed
   •  Same results as the original WF
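A simple way to check that a reproduced run gives the same results as the original is to compare checksums of the output files. The sketch below hashes every file under two directories with SHA-256 and reports missing, extra, or differing files. The directory layout is an assumption, and the slides do not state how results were compared; byte-for-byte equality is only one possible criterion, and outputs that embed timestamps would need a domain-specific comparison.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def digest_tree(root: Path) -> Dict[str, str]:
    """Map each file path (relative to root, POSIX style) to its SHA-256 hex digest."""
    digests: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_digests(original: Dict[str, str], reproduced: Dict[str, str]) -> List[str]:
    """Describe every difference between two digest maps; an empty list means identical."""
    problems: List[str] = []
    for name in sorted(set(original) | set(reproduced)):
        if name not in reproduced:
            problems.append(f"missing: {name}")
        elif name not in original:
            problems.append(f"extra: {name}")
        elif original[name] != reproduced[name]:
            problems.append(f"differs: {name}")
    return problems


def compare_outputs(original_dir: Path, reproduced_dir: Path) -> List[str]:
    """Compare the output trees of the original and the reproduced workflow runs."""
    return diff_digests(digest_tree(original_dir), digest_tree(reproduced_dir))
```

A call such as `compare_outputs(Path("original/outputs"), Path("reproduced/outputs"))` (hypothetical paths) returns an empty list when both runs produced identical files.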


Conclusions

•  Semantic modelling approach to conserve computational resources

•  PRECIP for reproducing the execution environment on the Cloud

•  WICUS annotations + PRECIP scripting capabilities
   •  Applied these ideas to Montage on AWS EC2 and FG

•  Assumption: the binaries are available


Future Work

•  Apply to other workflows
•  Improve the annotation process
•  Extend the WICUS ontology network (new release coming soon)
   •  Software variants
   •  Low-level libraries
   •  Incompatibilities
   •  User policies


Questions
