Conservation of Scientific Workflow Infrastructures by Using Semantics December 2012 Idafen Santana-Pérez [email protected] Ontology Engineering Group Universidad Politécnica de Madrid Madrid, Spain




Index

• Introduction
  • Overview
  • Terminology
• SoA (State of the Art)
• Open Issues
• Goals
• Approach


Overview

• Experiments in empirical science
  • Primary component of the scientific method
  • Main method for validating a hypothesis
  • Repeatable procedure

• Scientific publications
  • Announce a result
  • Convince readers that the result is correct

• Computational science
  • In silico science
  • Computational scientific workflow: “a precise, executable description of a scientific procedure” [De Roure, 2011]

• “Reproducibility in principle underpins the scientific method” [Goble, 2012]

• “It's about capturing, preserving, reusing and curating” [Goble, 2012]


Overview

• Experiment components:
  • Data
  • Method
  • Apparatus/Equipment

| Classic Experiments | Components | Computational Experiments |
|---|---|---|
| Measurements, real life items, individuals, etc. | Data | Files, DBs, Web Services providing data, etc. |
| Step-by-step process description, in vivo/vitro WFs. | Method | Computational WF |
| Bunsen burners, Petri dishes, microscopes, etc. | Equipment | Infrastructures: software tools, hardware resources, WSs, etc. |


Terminology

• Reproducibility
  • The ability to replicate or reproduce experimental results.
  • The reproducibility of a method/test can be defined as the closeness of the agreement between independent results obtained with the same method on the identical subject(s) (or object, or test material) but under different conditions [Slezák, 2011]

• Replicability: exact reproduction
  • “A lab will generally repeat an experiment several times and look for results before they get published. But, once that paper is published, people tend to look for reproducibility in other ways, testing the consequences of a finding, extending it to new contexts or different populations. Almost nobody goes back and repeats something that's already been published, though.” [Timmer, 2012]
  • Replicability is the poor cousin of reproducibility [Drummond, 2009]


Terminology

• Conservation
  • Action of prolonging the existence of significant objects.
  • Researching, recording and retaining all information related to the object.
  • Documenting

• Preservation
  • Keep it in a perfect/unaltered condition.
  • Preserving the integrity and authenticity.

• Restoration
  • Return something to an earlier condition

• Reconstruction
  • Forming again, with improvements or removal of defects


Terminology

[Figure: diagram built up over several slides, relating Preservation, Conservation, Restoration and Reconstruction. Inspired by [Goble, 2012]]

State of the Art

• WF preservation
  • Reproducible Research System [Mesirov, 2012]
    • Reproducible Research Environment
    • Reproducible Research Publisher
  • Research Object [Page, 2012]
  • Provenance: W3C PROV Model

• WF & Virtualization/Cloud
  • VGrADS
  • Tavaxy (Taverna+Galaxy)
  • Pegasus on EC2 [Juve, 2009]
  • Wrangler [Juve, 2011]

• WF & semantics
  • OWL-S & OWL-WS [Beco, 2005]
  • Wings [Gil, 2011]
  • RO semantics: LOD, VoID, OAI-ORE, AO/OAC, SIOC, OPM/PROV, Memento…


State of the Art

• Projects
  • Commit-nl: e-infrastructure virtualization for e-science applications (RDF for describing infrastructures)
  • SHIWA & ER-flow

• Software preservation
  • Brian Matthews et al. [Matthews, 2010] [Matthews, 2011]
  • VM for legacy applications

• WF & virtualization
  • SHARE: a web portal for creating and sharing executable research papers [Van Gorp, 2011]
  • SIGMOD Repeatability Challenge: “Use a virtual machine (VM) as the environment for experiments.” [Bonnet, 2011]
  • External services?
  • Who Preserves?


Open Issues

• Infrastructure conservation and reproducibility
  • Infrastructure is a predefined element of the WF
  • It is not described as part of the WF specification (briefly mentioned in the papers)

• Models and methods for describing hardware and software requirements of a WF
  • Current approaches for WF conservation/preservation take into account only the WF definition (method) and the data used and produced, but not the infrastructure
  • In vitro WFs provide an exhaustive definition of the equipment and resources used in each experiment.


Goals

• Minimum information model (MIM) to sufficiently describe an infrastructure in order to reproduce it in the future
• Models for representing:
  • Execution requirements of a workflow
  • Infrastructure dependencies
• Framework to provide:
  • Means for populating these models
    • Using collaborative annotation (e.g. scientists and systems staff)
    • Automated or semi-automated analysis of traces and profiles to extract the requirements
  • Algorithms that are able to generate an equivalent infrastructure specification
    • Based on the MIM
    • Infrastructure providers' resource availability
    • User policies
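The last goal, generating an equivalent infrastructure specification from a requirements model, provider availability and user policies, could be sketched roughly as follows. This is only an illustration and not the model proposed in the talk: the `Requirements` and `Offering` classes, their fields, and the choice of a single cost cap as the user policy are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    """Hypothetical minimum information about what a workflow needs."""
    min_cpus: int
    min_memory_gb: float
    software: frozenset = frozenset()


@dataclass(frozen=True)
class Offering:
    """Hypothetical resource currently offered by an infrastructure provider."""
    provider: str
    name: str
    cpus: int
    memory_gb: float
    hourly_cost: float
    software: frozenset = frozenset()


def equivalent_specification(req, offerings, max_hourly_cost):
    """Pick the cheapest offering that satisfies the requirements and policy.

    Returns (offering, software_still_to_install), or None if nothing fits.
    """
    candidates = [
        o for o in offerings
        if o.cpus >= req.min_cpus
        and o.memory_gb >= req.min_memory_gb
        and o.hourly_cost <= max_hourly_cost  # assumed user policy: a cost cap
    ]
    if not candidates:
        return None
    best = min(candidates, key=lambda o: o.hourly_cost)
    # Whatever the chosen offering lacks must be deployed afterwards.
    return best, req.software - best.software
```

Choosing the cheapest candidate is just one possible policy; a real framework would presumably let users rank candidates by other criteria as well.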


Approach


1. Several kinds of processes, using different tools and with different requirements.

2. Virtualization techniques make it possible to create flexible, dynamic and on-demand infrastructures.

3. Create VMs fulfilling the WF hardware requirements and, if necessary, dynamically deploy the required software stack.
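Step 3 implies working out which software still has to be deployed on a VM and in what order. A minimal sketch of that idea, assuming a hypothetical dependency map and a set of packages already present in the VM image (neither comes from the talk), could use a topological sort from the Python standard library:

```python
from graphlib import TopologicalSorter


def deployment_order(required, depends_on, preinstalled):
    """Return the packages to install, dependencies first.

    Packages already present in the image (and whatever they depend on)
    are assumed to be satisfied and are skipped.
    """
    needed = set()
    pending = list(required)
    while pending:
        package = pending.pop()
        if package in needed or package in preinstalled:
            continue
        needed.add(package)
        pending.extend(depends_on.get(package, ()))
    # Map each package to the dependencies that must be installed before it.
    graph = {
        package: {dep for dep in depends_on.get(package, ()) if dep in needed}
        for package in needed
    }
    return list(TopologicalSorter(graph).static_order())
```

A circular dependency makes `static_order()` raise `graphlib.CycleError`, which seems a reasonable failure mode for a stack that cannot be deployed.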


Approach

1. Integrate several virtual infrastructure providers.

2. Scheduler
   1. Define the machines to be created
   2. Software stack to be deployed on them and its configuration

3. Information System.


Approach

1. Framework for handling all the information.

2. Ontology repository: VM, cloud, MIM, etc.

3. Cataloging the available resources: VM images and applications.

4. Capturing knowledge from different user roles.
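To make the catalog idea concrete, the sketch below stores resource descriptions as subject-predicate-object triples and asks which VM images already provide a set of applications. It is a plain-Python stand-in for an RDF store, and every identifier in it (`ex:VMImage`, `ex:hasInstalled`, the image and application names) is hypothetical rather than taken from the ontologies mentioned in the talk.

```python
# Triples of (subject, predicate, object); all identifiers are hypothetical.
CATALOG = [
    ("img:ubuntu-base", "rdf:type", "ex:VMImage"),
    ("img:ubuntu-base", "ex:hasInstalled", "app:java"),
    ("img:bio-stack", "rdf:type", "ex:VMImage"),
    ("img:bio-stack", "ex:hasInstalled", "app:java"),
    ("img:bio-stack", "ex:hasInstalled", "app:blast"),
    ("app:java", "rdf:type", "ex:Application"),
    ("app:blast", "rdf:type", "ex:Application"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def images_providing(triples, applications):
    """List the VM images that have every requested application installed."""
    images = {s for s, _, _ in match(triples, p="rdf:type", o="ex:VMImage")}
    return sorted(
        image for image in images
        if all(match(triples, s=image, p="ex:hasInstalled", o=app)
               for app in applications)
    )
```

In a real deployment the same question would more likely be expressed as a query against a triple store; the pattern-matching function here only mimics that behaviour on a small scale.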


References (I)

• [De Roure, 2011] Towards the Preservation of Scientific Workflows. David De Roure, Khalid Belhajjame, Paolo Missier, José Manuel Gómez-Pérez, Raúl Palma, José Enrique Ruiz, Kristina Hettne, Marco Roos, Graham Klyne, Carole Goble. In: iPRES 2011 - 8th International Conference on Preservation of Digital Objects; 01 Nov 2011-04 Nov 2011; Singapore. 2011.

• [Timmer, 2012] Timmer, J., Scientific reproducibility, for fun and profit, 2012: http://arstechnica.com/science/2012/08/scientificreproducibility-for-fun-and-profit/

• [Drummond, 2009] Drummond, C. 2009. Replicability is not reproducibility: nor is it good science. Proc. Eval. Methods Mach. Learn. Workshop 26th ICML, Montreal, Quebec, Canada.

• [Goble, 2012] The Reality of Reproducibility of Computational Science, Carole Goble, eScience 2012 Keynote: http://www.ci.uchicago.edu/escience2012/pdf/The_Reality_of_Reproducability_in_Computational_Science.pdf

• [Mesirov, 2012] Accessible Reproducible Research, Jill P. Mesirov, Science 22 January 2010: 327 (5964), 415-416.

• [Page, 2012] From workflows to Research Objects: an architecture for preserving the semantics of science. Page K, Palma R, Holubowicz P, Klyne G, Soiland-Reyes S, Cruickshank D, González-Cabero R et al. Linked Science, 2012

• [PROV Model] In progress: http://dvcs.w3.org/hg/prov/raw-file/tip/primer/Primer.html

• [Juve, 2009] Juve, G.; Deelman, E.; Vahi, K.; Mehta, G.; Berriman, B.; Berman, B.P.; Maechling, P., "Scientific workflow applications on Amazon EC2," E-Science Workshops, 2009 5th IEEE International Conference on, pp. 59-66, 9-11 Dec. 2009

• [Juve, 2011] Gideon Juve and Ewa Deelman. 2011. Automating Application Deployment in Infrastructure Clouds. In Proceedings of the 2011 IEEE Third International Conference on Cloud Computing Technology and Science (CLOUDCOM '11)


References (II)


• [Gil, 2011] Gil, Y.; Ratnakar, V.; Jihie Kim; Moody, J.; Deelman, E.; González-Calero, P.A.; Groth, P., "Wings: Intelligent Workflow-Based Design of Computational Experiments," Intelligent Systems, IEEE, vol. 26, no. 1, pp. 62-72, Jan.-Feb. 2011

• [OWL-S] http://www.w3.org/Submission/OWL-S/

• [Beco, 2005] Beco, S, Cantalupo, B, Giammarino, L, Matskanis, N and Surridge, M (2005) OWL-WS: A Workflow Ontology for Dynamic Grid Service Composition. In: Proceedings of 1st IEEE International Conference on e-Science and Grid Computing.

• [Matthews, 2010] B. Matthews et al. A framework for software preservation. International Journal of Digital Curation, 5(1), 2010.

• [Matthews, 2011] http://www.cdlib.org/services/uc3/iPres/presentations/ConwaySoftware.pdf

• [Van Gorp, 2011] Pieter Van Gorp, Steffen Mazanek: SHARE: a web portal for creating and sharing executable research papers. Procedia CS 4: 589-597 (2011)

• [Bonnet, 2011] Philippe Bonnet, Stefan Manegold, Matias Bjørling, Wei Cao, Javier Gonzalez, Joel Granados, Nancy Hall, Stratos Idreos, Milena Ivanova, Ryan Johnson, David Koop, Tim Kraska, René Müller, Dan Olteanu, Paolo Papotti, Christine Reilly, Dimitris Tsirogiannis, Cong Yu, Juliana Freire, and Dennis Shasha. 2011. Repeatability and workability evaluation of SIGMOD 2011. SIGMOD Rec. 40, 2 (September 2011)

• [Slezák, 2011] Slezák, P. & Waczulíková, I. (2011). Reproducibility and Repeatability. Physiology Research, 60, 203-205


…end
