From Scientific Workflow Patterns to 5-star Linked Open DataFrom Scientific Workflow Patterns to...

From Scientific Workflow Patterns to 5-star Linked Open Data

Alban Gaignard Hala Skaf-Molli Audrey Bihouée

Nantes Academic Hospital (CHU de

Nantes), CNRS, France

LINA, Nantes University,

CNRS, France

Institut du Thorax, Nantes University,

INSERM, CNRS, France

8th USENIX workshop on Theory and Practice of Provenance (TaPP’16)Washington DC

Needs for linked experiment reports

TaPP ‘16A. Gaignard - H. Skaf-Molli - A. Bihouée

Motivations: reusing (massive) RNA-seq data

TopHat: algorithm to align multiple sequence reads to a reference

genome (known genes).

1 sample

Input data 2 x 17 Gb

1-core CPU 170 hours

32-cores CPU 32 hours

Output data 12 Gb

1 sample 300 samples

Input data 2 x 17 Gb 10.2 Tb

1-core CPU 170 hours 5.9 years

32-cores CPU 32 hours 14 months

Output data 12 Gb 3.6 Tb

1 sample

Output data 12 Gb

Challenges

Algorithmic performance, storage, preservation,

reuse (limit recompute) & share.

1 sample

Output data 12 Gb

Motivations: reusing experiment results

Scientific experiment: RNA sequencing to quantify gene expression

levels under multiple biological conditions.

Motivations: reusing experiment results

Scientific experiment: RNA sequencing to quantify gene expression

levels under multiple biological conditions.

Need for scientific context : metadata

Expected result: human+machine tractable reports

Annotated “Material & Methods”

Links to some workflow artifacts (algorithms, data)

5-star Linked Open Data

W3C standards for machine and human

readable data on the web.

⭑⭑⭑⭑⭑ : time and expertise !

5-star Linked Open Data

How to ease this process ?

● Workflow engines → automation

● PROV → workflow runs as linked data

W3C standards for machine and human

readable data on the web.

⭑⭑⭑⭑⭑ : time and expertise !

PROV only

too fine-grained

no domain concepts

Provenance as a Linked Experiment Report

few + meaningful statements

Problem statement & objectives

Problem statement

Scientific workflows produce massive raw results. Their

publication into curated query-able linked data repositories

requires lot of time and expertise.

Can we exploit provenance traces to ease the publication of

scientific results as Linked Data ?

Problem statement & objectives

Problem statement

Scientific workflows produce massive raw results. Their

publication into curated query-able linked data repositories

requires lot of time and expertise.

Can we exploit provenance traces to ease the publication of

scientific results as Linked Data ?

Objectives

(1) Leverage annotated workflow patterns to generate

provenance mining rules.

(2) Refine provenance traces into linked experiment reports.

Rules generation

Approach

Input domain-specific annotations (❶,❷)

Workflow patterns ❶

Sequence patterns, with possibly intermediate steps

● P-PLAN ontology: Step, Variable, hasInputVar, hasOutputVar

● EDAM ontology: hasFunction, RNA sequence, Genome map

Input domain-specific annotations (❶,❷)

Workflow patterns ❶

Sequence patterns, with possibly intermediate steps

● P-PLAN ontology: Step, Variable, hasInputVar, hasOutputVar

● EDAM ontology: hasFunction, RNA sequence, Genome map

Experiment report template ❷

Link scientific claims, statements, material and methods

● MicroPublication ontology: Material, Method, Claims

● Experimental factor ontology: Transcriptome, Gene expression

● NCBI taxonomy: Homo Sapiens

● Open Annotation model: hasBody, hasTarget

PoeM: generating PrOvEnance Mining rules ❸

(SPARQL Property path)(SPARQL Basic graph pattern)

(SPARQL Construct query)

PoeM: sample generated rule ❸

<If> part

<Then> part

First experiments & results

Experiment context

SyMeTRIC: systems medicine project (2015-2017, call “Connect Talent”), funded by the french region Pays de la Loire.

Experiment

Material & methods

● Real-life RNA-seq workflow to study 3 mice populations

● WF implemented in Galaxy, run on 2 biological samples

● PROV traces exported from Galaxy Histories (API)

Experiment

Material & methods

● Real-life RNA-seq workflow to study 3 mice populations

● WF implemented in Galaxy, run on 2 biological samples

● PROV traces exported from Galaxy Histories (API)

Results (for 1 biological sample)

● 60h CPU (12 cores for genome alignment), 21Gb storage

● 3s to export 81 PROV triples from the Galaxy history

● 2s to apply the rule and produce 35 Micropublication triples

Conclusion & perspectives

Semi-automated approach

(1) PoeM generates semantic web rules

(2) PoeM rules applied on PROV traces to assemble

linked experiment reports (MicroPublication)

Limitations:

- Sequence workflow patterns only

- SPARQL property paths with complex WF patterns ?

- Syntactic matching between WF patterns and PROV labels

Usage scenarios:

→ Query workflow datasets with domain concepts

→ Populate RDF repositories with WF results

Conclusion & perspectives

Future works

(1) WF patterns: split-merge, “common motifs”

(2) Genericity: other domains / other reports (RO, Nanopub.)

(3) PROV heterogeneity: multi-systems PROV

(4) Evaluation: involving biologists, at larger scale

Questions ?Demo: http://poem.univ-nantes.fr

Contact: alban.gaignard@univ-nantes.fr

Acknowledgments

BiRD bioinformatics facility Connect Talent Call

TaPP’16A. Gaignard, H. Skaf-Molli, A. Bihouée 30

Larger scale experiments (PROV traces)

1232 edges

TaPP’16A. Gaignard, H. Skaf-Molli, A. Bihouée 31

Larger scale experiments (PROV traces)

49 edges

From Scientific Workflow Patterns to 5-star Linked Open DataFrom Scientific Workflow Patterns to...

Documents

Modular perfusion workflow - Thermo Fisher Scientificassets.thermofisher.com/.../modular-perfusion-workflow-brochure.pdf · 6 Workflow explanation A continuous processing workflow

HUMAN RESOURCES ADMINISTRATION GUIDEhrdocs.hamilton.ca/PeopleSoft User Guides Linked Within...0 HUMAN RESOURCES ADMINISTRATION GUIDE AUTOMATED WORKFLOW & MANAGER SELF SERVICE This

Workflow workflow tools and design of workflow

Workflow Experience Simplified - Axisvationaxisvation.com/images/Workflow Experience Simplified.pdf · Workflow engine. SharePoint 2013 Workflows run on the Azure Workflow engine,

ARCHESTRA WORKFLOW ENABLES ORGANIZATIONAL ......ARCHESTRA WORKFLOW ENABLES ORGANIZATIONAL BEST PRACTICES ArchestrA Workflow Software is a comprehensive workflow solution for industrial

Sequence Editor Interface Sequence Processing Workflow Workflow Engine Workflow Manager Interface

iLastic: Linked Data Generation Workflow and User Interface for iMinds Scholarly Data

SERUM « ABSIT REVERENTIA VERO · 2019. 11. 1. · Cyril PEDROSA Tekeningen / Kleur Nicolas gAigNARD SERUM serum_1_binnenwerk_print.indd 1 23-09-19 12:22. Eerste druk oktober 2019

N-Glycan Analysis: Better Together · Better Together. Agilent and ProZyme sample preparation workflow. 2 The structure of N-linked glycans can affect the immunogenicity, pharmacokinetics,

Prepress Workflow Automation - Crisp Digital Prepress Workflow... · Prepress Workflow Automation • Web-Based Workflow Management • Automated Pre-flighting • ROOM Workflow Integrity

Version 1.0 1 Oracle Workflow. Version 1.0 2 Agenda Overview Workflow Builder – Workflow Components – Create Workflow Process – Item Type - “Standard”

WorkFlow Designer - help.genesys.com · WorkFlow Designer Printable Help . iii . WorkFlow Engine considerations .....20

Workflow API and workflow services

Plockmatic Production Booklet maker sYstem · The Plockmatic Production Booklet Maker system is a fully integrated system featuring automated workflow and job recovery that is linked

Workflow Administrator's Guide - download.microsoft.comdownload.microsoft.com/.../WorkflowAdministration.pdf · 2 WORKFLOW ADMINISTRATOR’S GUIDE Introduction Welcome to Workflow,

Workflow Highlightswebasset.uxceclipse.com/wp-content/uploads/2016/04/... · Dynamics NAV Process Workflow Workflow Task / Decision Workflow Task / Decision Workflow Task / Decision

Re-use in data-driven sciences: from provenance to (linked ... › data › pages › rda_France_Al… · from provenance to (linked) data summaries Alban Gaignard, PhD, CNRS Paris,

By, Nasheet Ahmed Siddiqui. Agenda Workflow overview Workflow development Query for a Workflow Workflow category Workflow type Workflow elements Enable

Workflow and Approvals Guide - Document Logistix … · Workflow Overview Workflow and Approvals Guide 2 Workflow Overview Document Manager Workflow is a feature that assists in the

Standard Workflow Processeslearning.employeeconnect.com/.../Standard-Workflow... · Standard Workflow Processes A reference document designed to help individuals use the Workflow