32
Wf4Ever: Annotating research objects Stian Soiland-Reyes, Sean BechHofer myGrid, University of Manchester Open Annotation Rollout, Manchester, 2013-06-24 This work is licensed under a Creative Commons Attribution 3.0 Unported License

2013 06-24 Wf4Ever: Annotating research objects (PDF)

Embed Size (px)

DESCRIPTION

Open Annotation Rollout, Manchester, 2013-06-25 See also PPTX version with Notes: http://www.slideshare.net/soilandreyes/2013-0624annotatingr-osopenannotationmeeting

Citation preview

Page 1: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

Wf4Ever: Annotating research objectsStian Soiland-Reyes, Sean BechHofer

myGrid, University of Manchester

Open Annotation Rollout, Manchester, 2013-06-24This work is licensed under aCreative Commons Attribution 3.0 Unported License

Page 2: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

Motivation: Scientific workflowsCoordinated execution of services and linked resources

Dataflow between servicesWeb services (SOAP, REST)

Command line tools

Scripts

User interactions

Components (nested workflows)

Method becomes:Documented visually

Shareable as single definition

Reusable with new inputs

Repurposable other services

Reproducible?

http://www.myexperiment.org/workflows/3355

http://www.taverna.org.uk/

http://www.biovel.eu/

Page 3: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

But workflows are complex machines

Outputs

Inputs

Configuration

Components

http://www.myexperiment.org/workflows/3355

• Will it still work after a year? 10 years?

• Expanding components, we see a workflow involves a series of specific tools and services which

• Depend on datasets, software libraries, other tools

• Are often poorly described or understood

• Over time evolve, change, break or are replaced

• User interactions are not reproducible

• But can be tracked and replayed

Page 4: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

Electronic Paper Not Enough

Hypothesis ExperimentResult

AnalysisConclusions

Investigation

Data Data

Electronic

paper

Publish

http://www.force11.org/beyondthepdf2http://figshare.com/

Open Research movement: Openly share the data of your experiments

http://datadryad.org/

Page 5: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

http://www.researchobject.org/

RESEARCH OBJECT (RO)

http://www.researchobject.org/

Research objects goal: Openly share everything about your experiments, including how those things are related

Page 6: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

What is in a research object?A Research Object bundles and relates digital resources of a scientific experiment or investigation:

Data used and results produced in experimental study

Methods employed to produce and analyse that data

Provenance and settings for the experiments

People involved in the investigation

Annotations about these resources, that are essential to the understanding and interpretation of the scientific outcomes captured by a research object

http://www.researchobject.org/

Page 7: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

Gathering everythingResearch Objects (RO) aggregate related resources, their provenance and annotations

Conveys “everything you need to know” about a study/experiment/analysis/dataset/workflow

Shareable, evolvable, contributable, citable

ROs have their own provenance and lifecycles

Page 8: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

Why Research Objects?i. To share your research materials

(RO as a social object)

ii. To facilitate reproducibility and reuse of methods

iii. To be recognized and cited(even for constituent resources)

iv. To preserve results and prevent decay (curation of workflow definition; using provenance for partial rerun)

Page 9: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

A Research objecthttp://alpha.myexperiment.org/packs/387

Page 10: 2013 06-24 Wf4Ever: Annotating research objects (PDF)
Page 11: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

Quality Assessment of a research object

Page 12: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

Quality Monitoring

Page 13: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

Annotations in research objectsTypes: “This document contains an hypothesis”

Relations: “These datasets are consumed by that tool”

Provenance: “These results came from this workflow run”

Descriptions: “Purpose of this step is to filter out invalid data”

Comments: “This method looks useful, but how do I install it?”

Examples: “This is how you could use it”

Page 14: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

Annotation guidelines – which properties?Descriptions: dct:title, dct:description, rdfs:comment, dct:publisher, dct:license, dct:subject

Provenance: dct:created, dct:creator, dct:modified, pav:providedBy, pav:authoredBy, pav:contributedBy, roevo:wasArchivedBy, pav:createdAt

Provenance relations: prov:wasDerivedFrom, prov:wasRevisionOf, wfprov:usedInput, wfprov:wasOutputFrom

Social networking: oa:Tag, mediaont:hasRating, roterms:technicalContact, cito:isDocumentedBy, cito:isCitedBy

Dependencies: dcterms:requires, roterms:requiresHardware, roterms:requiresSoftware, roterms:requiresDataset

Typing: wfdesc:Workflow, wf4ever:Script, roterms:Hypothesis, roterms:Results, dct:BibliographicResource

Page 15: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

What is provenance?

By Dr Stephen Dannlicensed under Creative Commons Attribution-ShareAlike 2.0 Generichttp://www.flickr.com/photos/stephendann/3375055368/

Attributionwho did it?

Derivationhow did it change?

Activitywhat happens to it?

Licensingcan I use it?

Attributeswhat is it?

Originwhere is it from?

Annotationswhat do others say about it?

Aggregationwhat is it part of?

Date and toolwhen was it made?using what?

Page 16: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

AttributionWho collected this sample? Who helped?

Which lab performed the sequencing?

Who did the data analysis?

Who curated the results?

Who produced the raw data this analysis is based on?

Who wrote the analysis workflow?

Why do I need this?

i. To be recognized for my work

ii. Who should I give credits to?

iii. Who should I complain to?

iv. Can I trust them?

v. Who should I make friends with?

prov:wasAttributedToprov:actedOnBehalfOfdct:creatordct:publisherpav:authoredBypav:contributedBypav:curatedBypav:createdBypav:importedBypav:providedBy...

RolesPersonOrganizationSoftwareAgent

Agent types

AliceThe lab

Data

wasAttributedTo

actedOnBehalfOf

http://practicalprovenance.wordpress.com/

Page 17: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

DerivationWhich sample was this metagenome sequenced from?

Which meta-genomes was this sequence extracted from?

Which sequence was the basis for the results?

What is the previous revision of the new results?

Why do I need this?

i. To verify consistency (did I usethe correct sequence?)

ii. To find the latest revision

iii. To backtrack where a diversionappeared after a change

iv. To credit work I depend on

v. Auditing and defence for peer review

wasDerivedFrom

wasQuotedFrom

Sequence

New results

wasDerivedFrom

Sample

Meta -genome

Old results

wasRevisionOf

wasInfluencedBy

Page 18: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

Activities

What happened? When? Who?

What was used and generated?

Why was this workflow started?

Which workflow ran? Where?

Why do I need this?

i. To see which analysis was performed

ii. To find out who did what

iii. What was the metagenome used for?

iv. To understand the whole process“make me a Methods section”

v. To track down inconsistencies

used

wasGeneratedBy

wasStartedAt

"2012-06-21"

Metagenome

Sample

wasAssociatedWith

Workflow server

wasInformedBy

wasStartedBy

Workflow run

wasGeneratedBy

Results

Sequencing

wasAssociatedWith

Alice

hadPlan

Workflow definition

hadRole

Lab technician

Results

Page 19: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

PROV model

http://www.w3.org/TR/prov-primer/

Copyright © 2013 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved.

Provenance Working Group

Page 20: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

Provenance of what?

Who made the (content of) research object? Who maintains it?

Who wrote this document? Who uploaded it?

Which CSV was this Excel file imported from?

Who wrote this description? When? How did we get it?

What is the state of this RO? (Live or Published?)

What did the research object look like before? (Revisions) – are there newer versions?

Which research objects are derived from this RO?

Page 21: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

Research object model at a glance

Research Object

ResourceResource

Resource

AnnotationAnnotation

Annotation

oa:hasTarget

ResourceResourceAnnotation graph

oa:hasBody

ore:aggregates

«ore:Aggregation»«ro:ResearchObject»

«oa:Annotation»«ro:AggregatedAnnotation»

«trig:Graph»

«ore:AggregatedResource»«ro:Resource»

Manifest

«ore:ResourceMap»«ro:Manifest»

Page 22: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

Wf4Ever architecture

Blob storeGraphstore

Resource

Uploaded to

ManifestAnnotation

graphResearch

objectAnnotationORE Proxy

External reference

Redirects to

If RDF, import as named graph

SPARQL

REST resources

http://www.wf4ever-project.org/wiki/display/docs/RO+API+6

Page 23: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

Where do RO annotations come from?

Imported from uploaded resources, e.g. embedded in workflow-specific format (creator: unknown!)

Created by users filling in Title, Description etc. on website

By automatically invoked software agents, e.g.:

A workflow transformation service extracts the workflow structure as RDF from the native workflow format

Provenance trace from a workflow run, which describes the origin of aggregated output files in the research object

Page 24: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

How we are using the OA modelMultiple oa:Annotation contained within the manifest RDF and aggregated by the RO.

Provenance (PAV, PROV) on oa:Annotation (who made the link) and body resource (who stated it)

Typically a single oa:hasTarget, either the RO or an aggregated resource.

oa:hasBody to a trig:Graph resource (read: RDF file) with the “actual” annotation as RDF:

<workflow1> dct:title "The wonderful workflow" .

Multiple oa:hasTarget for relationships, e.g. graph body:<workflow1> roterms:inputSelected <input2.txt> .

Page 25: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

What should we also be using?Motivations

myExperiment: commenting, describing, moderating, questioning, replying, tagging – made our own vocabulary as OA did not exist

Selectors on compound resourcesE.g. description on processors within a workflow in a workflow

definition. How do you find this if you only know the workflow definition file?Currently: Annotations on separate URIs for each component,

described in workflow structure graph, which is body of annotation targeting the workflow definition file

Importing/referring to annotations from other OA systems (how to discover those?)

Page 26: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

What is the benefit of OA for us?

Existing vocabulary – no need for our project to try to specify and agree on our own way of tracking annotations.

Potential interoperability with third-party annotation tools

E.g. We want to annotate a figure in a paper and relate it to a dataset in a research object – don’t want to write another tool for that!

Existing annotations (pre research object) in Taverna and myExperiment map easily to OA model

Page 27: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

History lesson (AO/OAC/OA)

When forming the Wf4Ever Research Object model, we found:

Open Annotation Collaboration (OAC)

Annotation Ontology (AO)

What was the difference?

Technically, for Wf4Ever’s purposes: They are equivalent

Political choice: AO – supported by Utopia (Manchester)

We encouraged the formation of W3C Open Annotation Community Group and a joint model

Next: Research Object model v0.2 and RO Bundle will use the OA model – since we only used 2 properties, mapping is 1:1

http://www.wf4ever-project.org/wiki/display/docs/2011-09-26+Annotation+model+considerations

Page 28: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

Saving a research object: RO bundle

Single, transferrable research object

Self-contained snapshot

Which files in ZIP, which are URIs? (Up to user/application)

Regular ZIP file, explored and unpacked with standard tools

JSON manifest is programmatically accessible without RDF understanding

Works offline and in desktop applications – no REST API access required

Basis for RO-enabled file formats, e.g. Taverna run bundle

Exchanged with myExperiment and RO tools

Page 29: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

Workflow Results Bundle

workflowrun.prov.ttl(RDF)

outputA.txt

outputC.jpg

outputB/

https://w3id.org/bundle

intermediates/

1.txt2.txt

3.txt

de/def2e58b-50e2-4949-9980-fd310166621a.txt

inputA.txtworkflow

URI references

attribution

executionenvironment

Aggregating in Research Object

ZIP folder structure (RO Bundle)

mimetype

application/vnd.wf4ever.robundle+zip

.ro/manifest.json

Page 30: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

RO Bundle

What is aggregated? File In ZIP or external URI

Who made the RO? When?

Who?

External URIs placed in folders

Embedded annotation

External annotation, e.g. blogpost

JSON-LD context RDF

RO provenance

.ro/manifest.json

Format

Note: JSON "quotes" not shown above for brevity

http://json-ld.org/

http://orcid.org/

https://w3id.org/bundle

Page 31: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

http://mayor2.dia.fi.upm.es/oeg-upm/files/dgarijo/motifAnalysisSite/

<h3 property="dc:title">Common Motifs in Scientific Workflows:<br>An Empirical Analysis</h3>

<body resource="http://www.oeg-upm.net/files/dgarijo/motifAnalysisSite/" typeOf="ore:Aggregation ro:ResearchObject">

Research Object as RDFahttp://www.oeg-upm.net/files/dgarijo/motifAnalysisSite/

<li><a property="ore:aggregates" href="t2_workflow_set_eSci2012.v.0.9_FGCS.xls"typeOf="ro:Resource">Analytics for Taverna workflows</a></li>

<li><a property="ore:aggregates" href="WfCatalogue-AdditionalWingsDomains.xlsx“typeOf="ro:Resource">Analytics for Wings workflows</a></li>

<span property="dc:creator prov:wasAttributedTo"resource="http://delicias.dia.fi.upm.es/members/DGarijo/#me"></span>

Page 32: 2013 06-24 Wf4Ever: Annotating research objects (PDF)

W3C community group for RO

http://www.w3.org/community/rosc/