Provenance in the Dynamic, Collaborative New Science Dr Jun Zhao Department of Zoology University of Oxford [email protected]

2011 03-provenance-workshop-edingurgh


DESCRIPTION

Linked Data + provenance requirements from #wf4ever is now online


Page 1: 2011 03-provenance-workshop-edingurgh

Provenance in the Dynamic, Collaborative New Science

Dr Jun Zhao

Department of Zoology

University of Oxford

[email protected]

Page 5: 2011 03-provenance-workshop-edingurgh

Technological infrastructure for the preservation and efficient retrieval and reuse of scientific workflows in a range of disciplines

Page 6: 2011 03-provenance-workshop-edingurgh

Packaging, preserving and publishing

Page 7: 2011 03-provenance-workshop-edingurgh

Astronomy Use Case: A Repeater's Story

● Dealing with large amounts of tabular data

● Many small scripts, to avoid creating black-box processes

● Resources are shared locally; public access only after publication

● Data must be frequently updated from external data repositories

● Data updates must be tested before being applied

● Data must be stored locally with versioning

● “... we don't like to spread [the tasks] and lose controls who is doing what ...”
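The requirements that updates be tested before they are applied, and that data be stored locally with versioning, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `VersionedStore` class, the `has_required_columns` check and the column names `id` and `magnitude` do not come from the slides or from any Wf4Ever tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Table = List[Dict[str, object]]  # one dict per row of tabular data


@dataclass
class VersionedStore:
    """Hypothetical local store that keeps every accepted snapshot, oldest first."""
    versions: List[Table] = field(default_factory=list)

    def propose(self, candidate: Table, check: Callable[[Table], bool]) -> bool:
        """Store the candidate as a new version only if the check passes."""
        if not check(candidate):
            return False  # a rejected update leaves the store unchanged
        # Copy the rows so later edits to the candidate cannot alter history.
        self.versions.append([dict(row) for row in candidate])
        return True

    def latest(self) -> Table:
        return self.versions[-1]


def has_required_columns(table: Table) -> bool:
    # Hypothetical validation rule: every row needs an id and a magnitude.
    return all({"id", "magnitude"}.issubset(row) for row in table)


store = VersionedStore()
accepted = store.propose([{"id": "obj1", "magnitude": 12.3}], has_required_columns)
rejected = store.propose([{"id": "obj2"}], has_required_columns)
```

Here the first update is stored as version 1, while the second fails the check and is never written, so earlier versions remain available for repeating an analysis.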

Page 8: 2011 03-provenance-workshop-edingurgh

Research Objects

ROs are Content Aware Objects that bundle things together

● Aggregation – Pointers or literals of internal and external content;

● Identity – Equivalence, equality;

● Metadata – A reusable object;

● Lifecycle – Stages of development. Impacts on available functionality;

● Versioning – Recording changes;

● Security – Access, authentication, ownership, trust;

● Graceful Degradation of Understanding – Opaque RO domain content.

● Mixed stewardship

● Provenance

  ● Of compound objects

  ● Of evolutions

  ● Of dynamic objects and static objects

http://www.wf4ever-project.org
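A few of the Research Object facets listed on this slide (aggregation, identity, metadata, lifecycle, versioning) can be pictured as a small data structure. This is only a sketch under stated assumptions, not the Wf4Ever model: the class name, the field names, the two lifecycle stages and the `example.org` URI are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Stage(Enum):
    # Hypothetical lifecycle stages; the slides do not name specific ones.
    DRAFT = "draft"
    PUBLISHED = "published"


@dataclass
class ResearchObject:
    identifier: str                                          # identity
    resources: List[str] = field(default_factory=list)       # aggregation: pointers (URIs)
    metadata: Dict[str, str] = field(default_factory=dict)   # descriptive metadata
    stage: Stage = Stage.DRAFT                               # lifecycle
    history: List[str] = field(default_factory=list)         # versioning: recorded changes

    def aggregate(self, uri: str, note: str) -> None:
        """Add a pointer to a resource and record the change."""
        # The lifecycle stage limits the available functionality.
        if self.stage is Stage.PUBLISHED:
            raise ValueError("published research objects are frozen")
        self.resources.append(uri)
        self.history.append(f"added {uri}: {note}")

    def publish(self) -> None:
        self.stage = Stage.PUBLISHED
        self.history.append("published")


ro = ResearchObject("ro-1", metadata={"creator": "example user"})
ro.aggregate("http://example.org/workflow/1", "workflow under test")
ro.publish()
```

Once `publish()` has been called, a further `aggregate()` raises `ValueError`, which is one simple way to model how lifecycle stage impacts available functionality.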

Page 9: 2011 03-provenance-workshop-edingurgh

Biology Use Case: A Reuser's Story

● Takes a set of genes from gene experiment results performed by others, as read in a scientific paper

● Performs 'dry' analysis to understand which genes and which biological processes were disturbed by which chemical compounds

  ● Basic Affymetrix data processing

  ● Statistical analysis to identify genes that are significantly differentially expressed under different conditions (with/without the compounds)

  ● Find those pathways that are most prominent among the filtered genes

Page 10: 2011 03-provenance-workshop-edingurgh

Biology Use Case: A Reuser's Story

● Search for existing experiments from myExperiment (http://myexperiment.org)

● Challenge: Understand the workflow

  ● Perform test runs with test data and his own data

  ● Read others' logs

  ● Read annotations to workflows

● Reuse scripts from colleagues and perform tests that his colleagues are familiar with

Page 11: 2011 03-provenance-workshop-edingurgh

How Can It be Supported?

● A reference to the source of the data and the people to acknowledge for it.

● The initial hypothesis

● The conceptual workflow or a summary of the experiment plan

● References to workflows that were tested, with comments on their application for the user's use case

● The user's own workflow, possibly with a backlog of previous versions that the user wishes to keep for reference (with notes and comments)

● The runs of the user's own workflow, the results and the recorded steps that led to the results, in some cases with comments for later reference (e.g. 'here I used parameter A, next time I may try B')

● The final hypothesis, with comments.

● A reference to the results of the workflow

● Design logs that record the user's considerations while making the workflow

● Run logs that record the user's considerations while running and interpreting the workflow
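The items above (runs, results, the steps that led to them, and comments) could be recorded as simple subject-predicate-object statements, in the spirit of Linked Data. The sketch below uses plain Python tuples rather than an RDF library, and every name in it is an assumption: the `ex:` predicates, the run and result identifiers and the helper functions are hypothetical and are not a published vocabulary.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def record_run(run_id: str, workflow: str, parameter: str,
               result: str, comment: str) -> List[Triple]:
    """Describe one workflow run as a list of statements."""
    return [
        (run_id, "ex:usedWorkflow", workflow),
        (run_id, "ex:usedParameter", parameter),
        (run_id, "ex:produced", result),
        (run_id, "ex:comment", comment),
    ]


def objects(graph: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def producers(graph: List[Triple], result: str) -> List[str]:
    """Trace a result back to the run or runs that produced it."""
    return [s for s, p, o in graph if p == "ex:produced" and o == result]


graph: List[Triple] = []
graph += record_run("ex:run1", "ex:workflow-v2", "A", "ex:result1",
                    "here I used parameter A, next time I may try B")
graph += record_run("ex:run2", "ex:workflow-v2", "B", "ex:result2",
                    "second attempt, with parameter B")
```

With the log in this form, `producers(graph, "ex:result2")` traces a result back to its run, and `objects(graph, "ex:run1", "ex:comment")` retrieves the note the user left for later reference.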

Page 12: 2011 03-provenance-workshop-edingurgh

Where is Linked Data?

Page 13: 2011 03-provenance-workshop-edingurgh

The Role of Linked Data in Wf4Ever

● Collaborative science

● Dynamic science

● Open science

Page 14: 2011 03-provenance-workshop-edingurgh

Provenance Challenge

● Identity

● Context

● Storage

● Retrieval

Page 15: 2011 03-provenance-workshop-edingurgh

Take home

● Provenance should be user-driven

● Linked Data should be a means to an end

● http://www.wf4ever-project.org

Page 16: 2011 03-provenance-workshop-edingurgh

Acknowledgement

● Marco Roos of Leiden University (NL) and Jose Enrique Ruiz of Instituto de Astrofísica de Andalucía (Spain)

● Carole Goble of University of Manchester (UK) and Jose Manuel Gomez of iSOCO (Spain)

● Hui Hua and Jenny Molly of University of Oxford (UK)