Conservation of Scientific Workflow Infrastructures by Using Semantics - 2012

December 2012
Idafen Santana-Pérez
isantana@fi.upm.es
Ontology Engineering Group
Universidad Politécnica de Madrid
Madrid, Spain

Index
  • Introduction
  • Overview
  • Terminology
  • State of the Art (SoA)
  • Open Issues
  • Goals
  • Approach

Overview
  • Experiments in empirical science
      - Primary component of the scientific method
      - Main method for validating a hypothesis
      - Repeatable procedure
  • Scientific publications
      - Announce a result
      - Convince readers that the result is correct
  • Computational science
      - In silico science
      - Computational scientific workflow: a precise, executable description of a scientific procedure [De Roure, 2011]
      - Reproducibility in principle underpins the scientific method [Goble, 2012]
      - It's about capturing, preserving, reusing and curating [Goble, 2012]

Overview (cont.)
Experiment components: data, method, and apparatus/equipment.

Component | Classic experiments                                 | Computational experiments
Data      | Measurements, real-life items, individuals, etc.    | Files, DBs, web services providing data, etc.
Method    | Step-by-step process description, in vivo/vitro WFs | Computational WF
Equipment | Bunsen burners, Petri dishes, microscopes, etc.     | Infrastructures: software tools, hardware resources, WSs, etc.

Terminology
  • Reproducibility: the ability to replicate or reproduce experimental results.
      - The reproducibility of a method/test can be defined as the closeness of the agreement between independent results obtained with the same method on the identical subject(s) (or object, or test material) but under different conditions [Slezák, 2011].
  • Replicability: exact reproduction.
      - A lab will generally repeat an experiment several times and look for results before they get published. But, once that paper is published, people tend to look for reproducibility in other ways, testing the consequences of a finding, extending it to new contexts or different populations. Almost nobody goes back and repeats something that's already been published, though. [Timmer, 2012]
      - Replicability is the poor cousin of reproducibility [Drummond, 2009]

Terminology (cont.)
  • Conservation: the action of prolonging the existence of significant objects.
      - Researching, recording and retaining all information related to the object.
      - Documenting.
  • Preservation: keeping the object in a perfect/unaltered condition.
      - Preserving its integrity and authenticity.
  • Restoration: returning something to an earlier condition.
  • Reconstruction: forming something again, with improvements or removal of defects.

Terminology (cont.)
[Figures on five slides, inspired by [Goble, 2012]]
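
The Overview slides list three components of an experiment (data, method, and apparatus/equipment), where the equipment of a computational experiment is its infrastructure. As a minimal sketch, not taken from the original slides, the Python below models a record of a computational experiment that keeps these three components side by side and reports which ones are left undescribed. The class names, fields, and example values (`Infrastructure`, `ComputationalExperiment`, `sequences.fasta`, `alignment-workflow`, `example-aligner`) are hypothetical, and a component is assumed to count as described simply when it is non-empty.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Infrastructure:
    """Equipment of a computational experiment (hypothetical fields)."""

    software_tools: list[str] = field(default_factory=list)
    hardware: dict[str, str] = field(default_factory=dict)
    web_services: list[str] = field(default_factory=list)

    def is_described(self) -> bool:
        # Assumption: counts as described if at least one kind of resource is recorded.
        return bool(self.software_tools or self.hardware or self.web_services)


@dataclass
class ComputationalExperiment:
    """Data, method and equipment of one experiment (hypothetical model)."""

    data: list[str]  # e.g. files, DBs, web services providing data
    method: str      # identifier of the computational workflow
    equipment: Infrastructure = field(default_factory=Infrastructure)

    def missing_components(self) -> list[str]:
        """Return the names of the components this record leaves empty."""
        missing = []
        if not self.data:
            missing.append("data")
        if not self.method:
            missing.append("method")
        if not self.equipment.is_described():
            missing.append("equipment")
        return missing


# Hypothetical record that keeps the data and the workflow but no infrastructure.
record = ComputationalExperiment(data=["sequences.fasta"], method="alignment-workflow")
print(record.missing_components())  # ['equipment']

# Describing the equipment closes the gap.
record.equipment = Infrastructure(
    software_tools=["example-aligner"],
    hardware={"cpu_cores": "4", "memory": "8 GB"},
)
print(record.missing_components())  # []
```

A real infrastructure description would need far richer, ontology-based fields; the emptiness check here only illustrates the difference between recording the method and data of an experiment and also recording its equipment.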

State of the Art
  • WF preservation
      - Reproducible Research System [Mesirov, 2012]: Reproducible Research Environment and Reproducible Research Publisher
      - Research Object [Page, 2012]
      - Provenance: W3C PROV Model
  • WF & virtualization/cloud
      - VGrADS
      - Tavaxy (Taverna + Galaxy)
      - Pegasus on EC2 [Juve, 2009]
      - Wrangler [Juve, 2011]
  • WF & semantics
      - OWL-S & OWL-WS [Beco, 2005]
      - Wings [Gil, 2011]
      - RO semantics: LOD, VoID, OAI-ORE, AO/OAC, SIOC, OPM/PROV, Memento

State of the Art (cont.)
  • Projects
      - Commit-nl: e-infrastructure virtualization for e-science applications (RDF for describing infrastructures)
      - SHIWA & ER-flow
  • Software preservation
      - Brian Matthews et al. [Matthews, 2010] [Matthews, 2011]
      - VM for legacy applications
  • WF & virtualization
      - SHARE: a web portal for creating and sharing executable research papers [Van Gorp, 2011]
      - SIGMOD Repeatability Challenge: use a virtual machine (VM) as the environment for experiments [Bonnet, 2011]
      - External services? Who preserves?

Open Issues
  • Infrastructure conservation and reproducibility
      - Infrastructure is a predefined element of the WF.
      - It is not described as part of the WF specification (it is only briefly mentioned in the papers).
  • Models and methods for describing the hardware and software requirements of a WF
  • Current approaches to WF conservation/preservation take into account only the WF definition (method) and the data used and produced, but not the infrastructure.
  • By contrast, in vitro WFs provide an exhaustive definition of the equipment and resources used in each experiment.

Goals
  • A minimum information model (MIM) that describes an infrastructure sufficiently to reproduce it in the future
  • Models for representing:
      - Execution requirements of a workflow
      - Infrastructure dependencies
  • A framework to provide:
      - Means for populating these models, using:
          - collaborative annotation (e.g. by scientists and systems staff)
          - automated or semi-automated analysis of traces and profiles to extract the requirements
      - Algorithms able to generate an equivalent infrastructure specification, based on:
          - the MIM
          - infrastructure providers' resource availability
          - user policies

Approach
[Figure]

Approach (cont.)
  1. Several kinds of processes, using different tools and with different requirements.
  2. Virtualization techniques make it possible to create flexible, dynamic and on-demand infrastructures.
  3. Create VMs that fulfil the WF hardware requirements and, if necessary, dynamically deploy the required software stack.

Approach (cont.)
  1. Integrate several virtual infrastructure providers.
  2. Scheduler:
      2.1 Define the machines to be created.
      2.2 Define the software stack to be deployed on them and its configuration.
  3. Information system.

Approach (cont.)
  1. Framework for handling all the information.
  2. Ontology repository: VM, cloud, MIM, etc.
  3. Cataloging of the available resources: VM images and applications.
  4. Capturing knowledge from different user roles.

References (I)
[De Roure, 2011] De Roure, D., Belhajjame, K., Missier, P., Gómez-Pérez, J. M., Palma, R., Ruiz, J. E., Hettne, K., Roos, M., Klyne, G., Goble, C. Towards the Preservation of Scientific Workflows. In: iPRES 2011, 8th International Conference on Preservation of Digital Objects, 1-4 Nov 2011, Singapore, 2011.
[Timmer, 2012] Timmer, J. Scientific reproducibility, for fun and profit. Ars Technica, 2012. http://arstechnica.com/science/2012/08/scientificreproducibility-for-fun-and-profit/
[Drummond, 2009] Drummond, C. Replicability is not reproducibility: nor is it good science.
  In: Proc. Eval. Methods Mach. Learn. Workshop, 26th ICML, Montreal, Quebec, Canada, 2009.
[Goble, 2012] Goble, C. The Reality of Reproducibility of Computational Science. Keynote, eScience 2012. http://www.ci.uchicago.edu/escience2012/pdf/The_Reality_of_Reproducability_in_Computational_Science.pdf
[Mesirov, 2012] Mesirov, J. P. Accessible Reproducible Research. Science, 327(5964):415-416, 22 January 2010.
[Page, 2012] Page, K., Palma, R., Holubowicz, P., Klyne, G., Soiland-Reyes, S., Cruickshank, D., González-Cabero, R., et al. From workflows to Research Objects: an architecture for preserving the semantics of science. Linked Science, 2012.
[PROV Model] W3C PROV Model Primer (in progress). http://dvcs.w3.org/hg/prov/raw-file/tip/primer/Primer.html
[Juve, 2009] Juve, G., Deelman, E., Vahi, K., Mehta, G., Berriman, B., Berman, B. P., Maechling, P. Scientific workflow applications on Amazon EC2. In: 2009 5th IEEE International Conference on E-Science Workshops, pp. 59-66, 9-11 Dec. 2009.
[Juve, 2011] Juve, G., Deelman, E. Automating Application Deployment in Infrastructure Clouds. In: Proceedings of the 2011 IEEE Third International Conference on Cloud Computing Technology and Science (CLOUDCOM '11), 2011.

References (II)
[Gil, 2011] Gil, Y., Ratnakar, V., Kim, J., Moody, J., Deelman, E., Gonzalez-Calero, P. A., Groth, P. Wings: Intelligent Workflow-Based Design of Computational Experiments. IEEE Intelligent Systems, 26(1):62-72, Jan.-Feb. 2011.
[OWL-S] OWL-S: Semantic Markup for Web Services. W3C Member Submission. http://www.w3.org/Submission/OWL-S/
[Beco, 2005] Beco, S., Cantalupo, B., Giammarino, L., Matskanis, N., Surridge, M. OWL-WS: A Workflow Ontology for Dynamic Grid Service Composition. In: Proceedings of the 1st IEEE International Conference on e-Science and Grid Computing, 2005.
[Matthews, 2010] Matthews, B., et al. A framework for software preservation. International Journal of Digital Curation, 5(1), 2010.
[Matthews, 2011] http://www.cdlib.org/services/uc3/iPres/presentations/ConwaySoftware.pdf
[Van Gorp, 2011] Pieter Van Gorp, Steffen