Recording application executions enriched with domain semantics of computations and data
Master of Science Thesis
Michał Pelczar
Krakow, 30.9.2008
Outline
• Background
• Objectives
• Provenance model
• Information building
• Feasibility study
• QUaTRO
• State of the art
• Research outline
• Publications
Background
• E-Science
  – Advanced computing technologies supporting scientists
  – Global collaboration in key areas of science
• Semantic Web provides data scalability
  – XML, RDF, RDFS, OWL
  – Ontology serves as taxonomy
• Grid computing provides computation scalability
• Virtual experiments influence the pace of scientific discoveries
Provenance
• metadata that pertains to the derivation history of a data product starting from its original sources
• the seven W’s: Who, What, Where, Why, When, Which, hoW
• Reproducibility of scientific results
• Guarantee of data reliability and quality
• Regulatory mechanism for sensitive data protection
• Means of efficiency optimization
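The definition above, a derivation history leading back to original sources, can be sketched as a walk over a small derivation graph. This is only an illustration: the product names are hypothetical (loosely borrowed from the drug resistance case study), and the dictionary-based graph is an assumption, not the data model used in the thesis.

```python
def original_sources(product, derived_from):
    """Return all products reachable from `product` that have no recorded inputs.

    `derived_from` maps a data product to the list of products it was
    derived from (hypothetical structure, for illustration only).
    """
    seen = set()
    sources = set()
    stack = [product]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        inputs = derived_from.get(node, [])
        if not inputs:
            # No recorded inputs: treat this product as an original source.
            sources.add(node)
        stack.extend(inputs)
    return sources


# Hypothetical derivation history of a drug ranking result.
derived_from = {
    "drug_ranking": ["subtype", "rule_set"],
    "subtype": ["alignment"],
    "alignment": ["nucleotide_sequence"],
}

sources = original_sources("drug_ranking", derived_from)
```

Here `sources` contains the two products with no recorded inputs, `nucleotide_sequence` and `rule_set`.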
ViroLab
• Virtual laboratory for infectious diseases
• Prevention, diagnosis and treatment
• Medical science, computer science, healthcare
Objectives
• Design information model for provenance
• Design data model for monitoring system
• Adapt existing monitoring infrastructure to the provenance requirements
• Define ontology creation process
  – Ontology and data model independent
  – Manageable
  – Augmentable
  – Described semantically
• Design and implement component realizing the process
• Incorporate the component into system grid infrastructure
• Design and implement provenance querying component
Provenance model
• Experiment re-execution
• Data dependencies
• Results management
• Performance
• Resources availability
• Related with ontologies:
  – Data
  – Domain
Ontology extension
• Derivation concepts
  – XML
  – Delegates
• Aggregation rules
• Annotations
  – Classes
  – Properties
Information building
• OWL and XSD independent
• Manageable
• Events correlation
• Events aggregation
• Experiment transaction support
• Knowledge history tracking
• Association strategy
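Event correlation and aggregation, as listed above, might look roughly like the following. This is a minimal sketch under stated assumptions: the event fields (`experiment_id`, `timestamp`, `type`, `operation`) and event type names are hypothetical, and the actual Semantic Event Aggregator builds ontology individuals rather than plain dictionaries.

```python
from collections import defaultdict


def correlate(events):
    """Group monitoring events by the experiment they belong to."""
    groups = defaultdict(list)
    for event in events:
        groups[event["experiment_id"]].append(event)
    return dict(groups)


def aggregate(group):
    """Collapse one experiment's events into a single summary record."""
    ordered = sorted(group, key=lambda e: e["timestamp"])
    return {
        "started": ordered[0]["timestamp"],
        "finished": ordered[-1]["timestamp"],
        "operations": [
            e["operation"] for e in ordered if e["type"] == "operation_invoked"
        ],
    }


# Hypothetical events, arriving out of order and interleaved across experiments.
events = [
    {"experiment_id": "exp-1", "timestamp": 3, "type": "experiment_finished", "operation": None},
    {"experiment_id": "exp-1", "timestamp": 1, "type": "experiment_started", "operation": None},
    {"experiment_id": "exp-2", "timestamp": 2, "type": "operation_invoked", "operation": "subtyping"},
    {"experiment_id": "exp-1", "timestamp": 2, "type": "operation_invoked", "operation": "alignment"},
]

summary = {exp_id: aggregate(group) for exp_id, group in correlate(events).items()}
```

Correlating first and aggregating per group keeps each experiment's record independent of the order in which events arrive.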
Proof of concept: Drug resistance case study
• Alignment
• Subtyping
• Drug ranking
• Different levels of semantics
  – Data
  – Computation
QUaTRO
• Abstract query language
  – Data representation and storage transparent
  – Understandable by non-IT specialists
  – Configurable by ontologies
  – Easy to integrate with GUI
  – Extensible
Query processing
[Figure: example query tree. NewDrugRanking with executedBy = Matthew Brown, dateOfBloodSample = 2007-06-28, usedRuleSet = RuleSet (name = HIVDB, version = 4.2.8)]
//*[local-name() eq 'NewDrugRanking' and ( (child::*[ name() = 'executedBy' and . eq 'MrHyde']))]
//*[local-name() eq 'NewDrugRanking' and ( (child::*[ name() = 'dateOfBloodSample' and . eq '2007-06-28']))]
SELECT id FROM rulesets WHERE name = 'HIVDB'
SELECT id FROM rulesets WHERE version = '4.2.8'
//*[local-name() eq 'RuleSet' and ( (child::*[ name() = 'vl-data-protos:dasId' and . eq 'cyfronet_mysql:test:id:2']))]
//*[local-name() eq 'NewDrugRanking' and (child::*[ name() = 'usedRuleSet' and (@*[name()='rdf:resource' and ( ( . eq 'http://www.virolab.org/onto/drs-protos/HIVDB_4_2_7' )) ]) ])]
• Provenance ontologies
• Mapping ontologies
• File systems
• Databases
• Operators
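The translation steps shown on this slide can be sketched in Python as follows. This is an illustration under assumptions, not QUaTRO's implementation: `literal_query` and `ids_matching` are hypothetical helpers, the in-memory SQLite table stands in for the real `rulesets` database, intersecting the two SQL results is assumed, and the `cyfronet_mysql:test:id:<id>` identifier layout is copied from the example above without knowing how it is actually constructed. Values are inserted into the XPath string without escaping, which a real translator would need to handle.

```python
import sqlite3


def literal_query(class_name, prop, value):
    """Build an XPath predicate in the same shape as the queries on the slide."""
    return (
        f"//*[local-name() eq '{class_name}' and ( (child::*[ name() = "
        f"'{prop}' and . eq '{value}']))]"
    )


def ids_matching(conn, column, value):
    """Run 'SELECT id FROM rulesets WHERE <column> = <value>' and return the ids."""
    if column not in {"name", "version"}:
        # Column names cannot be bound as SQL parameters, so whitelist them.
        raise ValueError(f"unsupported column: {column}")
    rows = conn.execute(f"SELECT id FROM rulesets WHERE {column} = ?", (value,))
    return {row[0] for row in rows}


# Hypothetical stand-in for the relational source holding rule sets.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rulesets (id INTEGER PRIMARY KEY, name TEXT, version TEXT)")
conn.executemany(
    "INSERT INTO rulesets VALUES (?, ?, ?)",
    [(1, "HIVDB", "4.2.7"), (2, "HIVDB", "4.2.8")],
)

# Conditions on ontology-stored properties become XPath queries directly.
date_query = literal_query("NewDrugRanking", "dateOfBloodSample", "2007-06-28")

# Conditions on database-stored properties are resolved with SQL first;
# combining them by intersection is an assumption.
ids = ids_matching(conn, "name", "HIVDB") & ids_matching(conn, "version", "4.2.8")

# The matching rows are then located in the provenance store via their dasId.
das_queries = [
    literal_query("RuleSet", "vl-data-protos:dasId", f"cyfronet_mysql:test:id:{i}")
    for i in sorted(ids)
]
```

With the sample rows above, `ids` contains only the row for HIVDB 4.2.8, so `das_queries` holds a single `RuleSet` query on `vl-data-protos:dasId`.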
Summary
• Data model for operations and resources
• Ontologies for data, experiments and geno2drs scenario
• Monitoring infrastructure: remote logging, automatic generation of helpers
• Semantic Event Aggregator implemented and deployed as OneJAR application
• QUaTRO integrated into GridSphere portal
Future work
• QUaTRO extensions
  – Join operation
  – Provenance graph rendering
  – File system querying
• Model extensions
  – Performance recording
  – Data origin recording
• Explicit provenance recording
  – Domain ontologies generation
  – Partial results storage
  – Domain events publication
Publications
• B. Balis, M. Bubak, M. Pelczar, From Monitoring Data to Experiment Information – Monitoring of Grid Scientific Workflows. In: G. Fox, K. Chiu, R. Buyya (Eds.), Third IEEE International Conference on e-Science and Grid Computing, e-Science 2007, Bangalore, India, 10-13 December 2007, pp. 187-194, IEEE Computer Society 2007.
• B. Balis, M. Bubak, M. Pelczar, J. Wach, Provenance Tracking and Querying in ViroLab. In: Cracow Grid Workshop 2007 Workshop Proceedings, pp. 71-76, ACC CYFRONET AGH 2008.
• B. Balis, M. Bubak, M. Pelczar, J. Wach, Provenance Querying for End-Users: A Drug Resistance Case Study. In: M. Bubak, G.D.v. Albada, J. Dongarra, P.M.A. Sloot (Eds.), Proceedings ICCS 2008, Krakow, Poland, June 23-25, 2008, LNCS 5103, pp. 80-89, Springer 2008.
Detailed information
• ViroLab: http://www.virolab.org
• VLvl: http://www.virolab.cyfronet.pl
http://grid.cyfronet.pl/virolab/wiki
• QUaTRO: http://virolab.cyfronet.pl/trac/quatro
• Ontologies: http://virolab.cyfronet.pl/onto