Conservation of Scientific Workflow Infrastructures by
Using Semantics
December 2012
Idafen Santana-Pérez
Ontology Engineering Group
Universidad Politécnica de Madrid
Madrid, Spain
Index
• Introduction
• Overview
• Terminology
• State of the Art (SoA)
• Open Issues
• Goals
• Approach
Overview
• Experiments in empirical science
  • Primary component of the scientific method
  • Main method for validating a hypothesis
  • Repeatable procedure
• Scientific publications
  • Announce a result
  • Convince readers that the result is correct
• Computational science
  • In silico science
  • Computational scientific workflow: “a precise, executable description of a scientific procedure” [De Roure, 2011]
• “Reproducibility in principle underpins the scientific method” [Goble, 2012]
• “It's about capturing, preserving, reusing and curating” [Goble, 2012]
Overview
• Experiment components:
  • Data
  • Method
  • Apparatus/Equipment
Classic Experiments | Components | Computational Experiments
Measurements, real life items, individuals, etc. | Data | Files, DBs, Web Services providing data, etc.
Step-by-step process description, in vivo/vitro WFs. | Method | Computational WF
Bunsen burners, Petri dishes, microscopes, etc. | Equipment | Infrastructures: software tools, hardware resources, WSs, etc.
Terminology
• Reproducibility
  • The ability to replicate or reproduce experimental results.
  • The reproducibility of a method/test can be defined as the closeness of the agreement between independent results obtained with the same method on the identical subject(s) (or object, or test material) but under different conditions [Slezák, 2011]
• Replicability: exact reproduction
  • “A lab will generally repeat an experiment several times and look for results before they get published. But, once that paper is published, people tend to look for reproducibility in other ways, testing the consequences of a finding, extending it to new contexts or different populations. Almost nobody goes back and repeats something that's already been published, though.” [Timmer, 2012]
  • Replicability is the poor cousin of reproducibility [Drummond, 2009]
Terminology
• Conservation
  • Action of prolonging the existence of significant objects.
  • Researching, recording and retaining all information related to the object.
  • Documenting
• Preservation
  • Keep it in a perfect/unaltered condition.
  • Preserving the integrity and authenticity.
• Restoration
  • Return something to an earlier condition
• Reconstruction
  • Forming again, with improvements or removal of defects
Terminology
[Figure: diagram built up over several slides, relating Preservation, Restoration, Conservation and Reconstruction. Inspired by [Goble, 2012]]
State of the Art
• WF preservation
  • Reproducible Research System [Mesirov, 2012]
    • Reproducible Research Environment
    • Reproducible Research Publisher
  • Research Object [Page, 2012]
  • Provenance: W3C PROV Model
• WF & Virtualization/Cloud
  • VGrADS
  • Tavaxy (Taverna+Galaxy)
  • Pegasus on EC2 [Juve, 2009]
  • Wrangler [Juve, 2011]
• WF & semantics
  • OWL-S & OWL-WS [Beco, 2005]
  • Wings [Gil, 2011]
  • RO semantics: LOD, VoID, OAI-ORE, AO/OAC, SIOC, OPM/PROV, Memento…
State of the Art
• Projects
  • Commit-nl: e-infrastructure virtualization for e-science applications (RDF for describing infrastructures)
  • SHIWA & ER-flow
• Software preservation
  • Brian Matthews et al. [Matthews, 2010] [Matthews, 2011]
  • VM for legacy applications
• WF & virtualization
  • SHARE: a web portal for creating and sharing executable research papers [Van Gorp, 2011]
  • SIGMOD Repeatability Challenge: “Use a virtual machine (VM) as the environment for experiments.” [Bonnet, 2011]
    • External services?
    • Who preserves?
Open Issues
• Infrastructure conservation and reproducibility
  • Infrastructure is a predefined element of the WF
  • It is not described as part of the WF specification (briefly mentioned in the papers)
• Models and methods for describing hardware and software requirements of a WF
  • Current approaches for WF conservation/preservation take into account only the WF definition (method) and the data used and produced, but not the infrastructure
• In vitro WFs provide an exhaustive definition of the equipment and resources used in each experiment.
Goals
• Minimum information model (MIM) to sufficiently describe an infrastructure in order to reproduce it in the future
• Models for representing:
  • Execution requirements of a workflow
  • Infrastructure dependencies
• Framework to provide:
  • Means for populating these models
    • Using collaborative annotation (e.g. scientists and systems staff)
    • Automated or semi-automated analysis of traces and profiles to extract the requirements
  • Algorithms that are able to generate an equivalent infrastructure specification
    • Based on the MIM
    • Infrastructure providers' resource availability
    • User policies
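As an illustration of what a machine-readable infrastructure description could look like, the following is a minimal sketch. The class and field names are assumptions made for this example; the slides do not define the model's vocabulary, and a semantic (ontology-based) encoding would replace the plain JSON used here.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class HardwareRequirements:
    # Hypothetical fields; the actual model vocabulary is not given in the slides.
    min_cpu_cores: int
    min_memory_gb: float
    min_disk_gb: float


@dataclass
class SoftwareDependency:
    name: str
    version: str
    depends_on: list = field(default_factory=list)  # names of other packages


@dataclass
class WorkflowInfrastructure:
    workflow_id: str
    hardware: HardwareRequirements
    software: list  # list of SoftwareDependency


def to_json(description: WorkflowInfrastructure) -> str:
    """Serialize the description so it can be archived with the WF and its data."""
    return json.dumps(asdict(description), indent=2, sort_keys=True)


def from_json(text: str) -> WorkflowInfrastructure:
    """Rebuild the description from its archived form."""
    raw = json.loads(text)
    return WorkflowInfrastructure(
        workflow_id=raw["workflow_id"],
        hardware=HardwareRequirements(**raw["hardware"]),
        software=[SoftwareDependency(**s) for s in raw["software"]],
    )
```

The round trip through `to_json` and `from_json` is the point of the sketch: a description that survives archiving unchanged can later drive the generation of an equivalent infrastructure.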
Approach
1. Several kinds of processes, using different tools and with different requirements.
2. Virtualization techniques allow the creation of flexible, dynamic, on-demand infrastructures.
3. Create VMs fulfilling the WF hardware requirements and, if necessary, dynamically deploy the required software stack.
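Step 3 can be sketched as two small functions: one picks a VM that fulfils the hardware requirements, the other works out which software still has to be deployed on it. `VMOffer`, `pick_vm` and `software_to_deploy` are hypothetical names for this example, and "smallest fitting offer" is an assumed selection rule, not one stated in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMOffer:
    # Hypothetical description of a VM type offered by a provider.
    name: str
    cpu_cores: int
    memory_gb: float
    preinstalled: frozenset  # software already present in the image


def pick_vm(offers, min_cores, min_memory_gb):
    """Return the smallest offer that fulfils the hardware requirements, or None."""
    fitting = [
        o for o in offers
        if o.cpu_cores >= min_cores and o.memory_gb >= min_memory_gb
    ]
    return min(fitting, key=lambda o: (o.cpu_cores, o.memory_gb), default=None)


def software_to_deploy(offer, required):
    """Packages that must be deployed dynamically because the image lacks them."""
    return sorted(set(required) - offer.preinstalled)
```

For example, given offers of 1, 2 and 8 cores, a workflow needing 2 cores and 3 GB would be placed on the 2-core offer, and only the packages missing from that image would be installed.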
Approach
1. Integrate several virtual infrastructure providers.
2. Scheduler
   1. Define the machines to be created
   2. Software stack to be deployed on them and its configuration
3. Information System.
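The scheduler's role could be sketched roughly as follows. This is an assumption-laden toy: the provider names, the dictionary standing in for the information system, and the first-fit placement rule are all invented for illustration.

```python
def schedule(tasks, providers):
    """Assign each WF task to the first provider with enough free cores.

    tasks: list of dicts with 'name', 'cores' and 'software' keys
    providers: dict mapping provider name to free cores
               (a stand-in for what the information system would report)
    Returns a list of machine definitions; raises if a task cannot be placed.
    """
    free = dict(providers)  # copy, so the caller's data is not mutated
    plan = []
    for task in tasks:
        for provider in sorted(free):
            if free[provider] >= task["cores"]:
                free[provider] -= task["cores"]
                plan.append({
                    "machine": f"{provider}-{task['name']}",
                    "provider": provider,
                    "cores": task["cores"],
                    "software": list(task["software"]),
                })
                break
        else:
            # No provider had enough free capacity for this task.
            raise RuntimeError(f"no provider can host task {task['name']!r}")
    return plan
```

Each entry of the returned plan names a machine to be created and the software stack to deploy on it, matching the two scheduler outputs listed above.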
Approach
1. Framework for handling all the information.
2. Ontology repository: VM, cloud, MIM, etc.
3. Cataloging the available resources: VM images and applications.
4. Capturing knowledge from different user roles.
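A catalog of this kind could be approximated with subject-predicate-object statements. The sketch below uses plain Python tuples rather than a real RDF store, and every identifier and predicate in it (`ex:hasInstalled`, `image:base-linux`, and so on) is hypothetical.

```python
class Catalog:
    """Tiny in-memory triple store of (subject, predicate, object) tuples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return the triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


catalog = Catalog()
# Hypothetical identifiers and predicates, for illustration only.
catalog.add("image:base-linux", "rdf:type", "ex:VMImage")
catalog.add("image:base-linux", "ex:hasInstalled", "app:java")
catalog.add("app:blast", "ex:requires", "app:java")
catalog.add("app:blast", "ex:annotatedBy", "role:scientist")

# Which cataloged images already provide Java?
images_with_java = [
    s for s, _, _ in catalog.match(predicate="ex:hasInstalled", obj="app:java")
]
```

Keeping annotations from different user roles as ordinary statements (here `ex:annotatedBy`) lets the same query mechanism serve both resource lookup and provenance of the knowledge itself.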
References (I)
• [De Roure, 2011] Towards the Preservation of Scientific Workflows. David De Roure, Khalid Belhajjame, Paolo Missier, José Manuel Gómez-Pérez, Raúl Palma, José Enrique Ruiz, Kristina Hettne, Marco Roos, Graham Klyne, Carole Goble. In: iPRES 2011 - 8th International Conference on Preservation of Digital Objects; 01 Nov 2011-04 Nov 2011; Singapore. 2011.
• [Timmer, 2012] Timmer, J., Scientific reproducibility, for fun and profit, 2012: http://arstechnica.com/science/2012/08/scientificreproducibility-for-fun-and-profit/
• [Drummond, 2009] Drummond, C. 2009. Replicability is not reproducibility: nor is it good science. Proc. Eval. Methods Mach. Learn. Workshop 26th ICML, Montreal, Quebec, Canada.
• [Goble, 2012] The Reality of Reproducibility of Computational Science, Carole Goble, eScience 2012 Keynote: http://www.ci.uchicago.edu/escience2012/pdf/The_Reality_of_Reproducability_in_Computational_Science.pdf
• [Mesirov, 2012] Accessible Reproducible Research, Jill P. Mesirov, Science 22 January 2010: 327 (5964), 415-416.
• [Page, 2012] From workflows to Research Objects: an architecture for preserving the semantics of science. Page K, Palma R, Holubowicz P, Klyne G, Soiland-Reyes S, Cruickshank D, González-Cabero R et al. Linked Science, 2012
• [PROV Model] In progress: http://dvcs.w3.org/hg/prov/raw-file/tip/primer/Primer.html
• [Juve, 2009] Juve, G.; Deelman, E.; Vahi, K.; Mehta, G.; Berriman, B.; Berman, B.P.; Maechling, P., "Scientific workflow applications on Amazon EC2," E-Science Workshops, 2009 5th IEEE International Conference on, pp. 59-66, 9-11 Dec. 2009
• [Juve, 2011] Gideon Juve and Ewa Deelman. 2011. Automating Application Deployment in Infrastructure Clouds. In Proceedings of the 2011 IEEE Third International Conference on Cloud Computing Technology and Science (CLOUDCOM '11)
References (II)
• [Gil, 2011] Gil, Y.; Ratnakar, V.; Jihie Kim; Moody, J.; Deelman, E.; González-Calero, P.A.; Groth, P., "Wings: Intelligent Workflow-Based Design of Computational Experiments," Intelligent Systems, IEEE, vol. 26, no. 1, pp. 62-72, Jan.-Feb. 2011
• [OWL-S] http://www.w3.org/Submission/OWL-S/
• [Beco, 2005] Beco, S, Cantalupo, B, Giammarino, L, Matskanis, N and Surridge, M (2005) OWL-WS: A Workflow Ontology for Dynamic Grid Service Composition. In: Proceedings of 1st IEEE International Conference on e-Science and Grid Computing
• [Matthews, 2010] B. Matthews et al. A framework for software preservation. International Journal of Digital Curation, 5(1), 2010.
• [Matthews, 2011] http://www.cdlib.org/services/uc3/iPres/presentations/ConwaySoftware.pdf
• [Van Gorp, 2011] Pieter Van Gorp, Steffen Mazanek: SHARE: a web portal for creating and sharing executable research papers. Procedia CS 4: 589-597 (2011)
• [Bonnet, 2011] Philippe Bonnet, Stefan Manegold, Matias Bjørling, Wei Cao, Javier Gonzalez, Joel Granados, Nancy Hall, Stratos Idreos, Milena Ivanova, Ryan Johnson, David Koop, Tim Kraska, René Müller, Dan Olteanu, Paolo Papotti, Christine Reilly, Dimitris Tsirogiannis, Cong Yu, Juliana Freire, and Dennis Shasha. 2011. Repeatability and workability evaluation of SIGMOD 2011. SIGMOD Rec. 40, 2 (September 2011)
• [Slezák, 2011] Slezák, P. & Waczulíková, I. (2011). Reproducibility and Repeatability. Physiological Research, 60, 203-205
…end