The open source ISA metadata tracking framework: from data curation and management at the source, to the linked data universe

BOSC, Long Beach, July 13-14, 2012

Philippe Rocca-Serra (Ph. D)
ISA Team
twitter: @isatools.org
[email protected]
http://www.isa-tools.org

Friday, 13 July 2012

MAIN THEME: It is all about structuring experimental information to make it available to computers and software agents, to enable mining.

But let’s proceed gradually…

Notes in Lab Books (information for humans)

Spreadsheets and Tables (the compromise)

Facts as RDF statements (information for machines)

Observations

• Experiments are expensive and often publicly funded, yet many never see the light of day.

• Spreadsheets are the most common vehicle for tracking experimental metadata in so-called ‘omics’ (functional genomics) experiments.

• Technology-centric repositories form de facto silos.

• Conversions are required before data can be deposited in public databases.

• Submitting common information across a series of repositories is inefficient.

Case Study

Many ontologies, Many Formats, Many Requirements…

Grr… Where are the tools!?!

Credits: http://liverpoolsolfed.wordpress.com/resources/image-bank/demonstration/

ISA framework overview

Why ISA format and Tools?

– Supporting data provenance tracking
– Node/edge underlying concept
– Tabular as a compromise: a presentation layer inspired by object models (FuGE, MAGE-OM)
– A generic representation, applied to:
  • microarray-based experiments (MAGE)
  • sequencing-based experiments (SRA)
  • flow cytometry-based experiments (FuGE-Flow Cyt)
  • mass spectrometry and NMR spectroscopy experiments

investigation → study → assay(s) → data

• investigation: high-level concept to link related studies
• study: the central unit, containing information on the subject under study, its characteristics and any treatments applied. A study has associated assays.
• assay: test performed either on material taken from the subject or on the whole initial subject, which produces qualitative or quantitative measurements (data)
• data: external files in native or other formats, referenced through pointers to data file names/locations

[Figure: an example study graph and its tabular layout. Two human subjects, H1 (H. sapiens, 35 years) and H2 (H. sapiens, 33 years), yield samples H1.sample1, H1.sample2 and H2.sample1; labeling produces H1.sample1.labeled and H2.sample1.labeled; the data files are h1-s1.cel, h1-s2.cel and h2-s1.cel. Target formats shown: MAGE-Tab, Pride-xml, SRA-xml.]

ISA metadata specifications:
• workflow and process orientated
• compatible with checklist enforcement
• compatible with external vocabulary resources
• compatible by design with existing schemas

Currently finalizing conversion to RDF to explore the growing Linked Data universe (in collaboration with the W3C HCLSIG and the Toxbank Consortium).
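The investigation, study, assay and data levels described on this slide can be pictured as nested containers, with data files kept as external pointers rather than embedded content. The following is an illustrative sketch only, not the object model used by the ISA tools: the class names, the `all_data_files` helper, the identifier `INV-1` and the measurement label are hypothetical, while the subject and file names come from the example on this slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    """A test on material from the subject (or the whole subject) producing data."""
    measurement: str
    data_files: List[str] = field(default_factory=list)  # pointers to external files


@dataclass
class Study:
    """The central unit: subjects, their characteristics, treatments, and assays."""
    title: str
    subjects: List[str] = field(default_factory=list)
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """High-level concept linking related studies."""
    identifier: str
    studies: List[Study] = field(default_factory=list)


def all_data_files(investigation: Investigation) -> List[str]:
    """Walk investigation -> studies -> assays and collect every data file pointer."""
    return [f for s in investigation.studies for a in s.assays for f in a.data_files]


# Hypothetical example reusing the names from this slide (H1, H2 and their .cel files).
assay = Assay("transcription profiling", ["h1-s1.cel", "h1-s2.cel", "h2-s1.cel"])
study = Study("example study", subjects=["H1", "H2"], assays=[assay])
investigation = Investigation("INV-1", studies=[study])
print(all_data_files(investigation))
```

Running it prints the three .cel file names: the data themselves stay external and are reached by walking the hierarchy.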

ISA syntax and Table definition

• Material transformations: the inputs and outputs of protocols are material nodes (Source Name, Sample Name, Extract Name, Labeled Extract Name).

Material Node → Protocol REF → Material Node

• Material nodes carry: Characteristics[…], Factor Value[…] (independent variables), Material Type, Comment[…]
• Protocol REF carries: Parameter Value […], Date (day effect), Performer (operator effect)

• Data File Node (with Comment[…])
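Read left to right, a table row is a chain of nodes joined by protocol applications, which is the node/edge concept behind provenance tracking. A minimal sketch of that reading follows. The `row_to_graph` function is hypothetical, not part of the ISA tools; it assumes at most one Protocol REF between two consecutive nodes, uses Raw Data File as the only data-file column, and, for brevity, merges study-table and assay-table columns into one row. The protocol names other than Labeling, the date and the performer are invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

# ISA-Tab node columns; "Raw Data File" stands in for the various data-file columns.
MATERIAL_NODES = {"Source Name", "Sample Name", "Extract Name", "Labeled Extract Name"}
DATA_NODES = {"Raw Data File"}

Edge = Tuple[str, str, str, Dict[str, str]]  # (input, protocol, output, protocol qualifiers)


def row_to_graph(headers: List[str], row: List[str]) -> Tuple[List[Dict], List[Edge]]:
    """Turn one table row into material/data nodes and protocol edges.

    Qualifier columns (Characteristics[...], Factor Value[...], Parameter Value[...],
    Date, Performer, Comment[...]) attach to the node or protocol immediately before them.
    """
    nodes: List[Dict] = []
    edges: List[Edge] = []
    pending: Optional[Dict] = None           # protocol seen since the last node
    target: Optional[Dict[str, str]] = None  # where qualifier columns are recorded
    for header, value in zip(headers, row):
        if header in MATERIAL_NODES or header in DATA_NODES:
            node = {"type": header, "name": value, "attrs": {}}
            if pending is not None and nodes:
                edges.append((nodes[-1]["name"], pending["protocol"], value, pending["attrs"]))
            nodes.append(node)
            pending = None
            target = node["attrs"]
        elif header == "Protocol REF":
            pending = {"protocol": value, "attrs": {}}
            target = pending["attrs"]
        elif target is not None:
            target[header] = value
    return nodes, edges


# Hypothetical row modelled on the H1 example.
headers = ["Source Name", "Characteristics[organism]", "Characteristics[age]",
           "Protocol REF", "Date", "Performer", "Sample Name",
           "Protocol REF", "Labeled Extract Name", "Protocol REF", "Raw Data File"]
row = ["H1", "Homo sapiens", "35 years",
       "sample collection", "2012-07-13", "operator A", "H1.sample1",
       "Labeling", "H1.sample1.labeled", "hybridization", "h1-s1.cel"]

nodes, edges = row_to_graph(headers, row)
for src, protocol, dst, qualifiers in edges:
    print(f"{src} --{protocol}--> {dst} {qualifiers}")
```

Date and Performer end up on the protocol edge rather than on a material, which is what lets day and operator effects be analysed later.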

ISAconfigurator Tables
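ISAconfigurator is where the table layouts used by ISAcreator are defined, including which fields are mandatory; this is one way the "compatible with checklist enforcement" property is put to work. The idea of a configuration-driven check on one table row can be sketched as follows. The `CONFIG` dictionary and `check_row` function are hypothetical and do not reproduce the ISAconfigurator file format.

```python
from typing import Dict, List

# Hypothetical configuration: column header -> whether a value is required.
# Illustrative only; this is not the ISAconfigurator file format.
CONFIG: Dict[str, bool] = {
    "Source Name": True,
    "Characteristics[organism]": True,
    "Protocol REF": True,
    "Sample Name": True,
    "Comment[notes]": False,
}


def check_row(headers: List[str], row: List[str], config: Dict[str, bool]) -> List[str]:
    """Report required columns that are missing or left empty in one table row.

    Simplification: repeated headers (e.g. several Protocol REF columns) collapse
    to the last value.
    """
    values = dict(zip(headers, row))
    problems: List[str] = []
    for header, required in config.items():
        if not required:
            continue
        if header not in values:
            problems.append(f"missing required column: {header}")
        elif not values[header].strip():
            problems.append(f"empty required value: {header}")
    return problems


print(check_row(["Source Name", "Protocol REF", "Sample Name"],
                ["H1", "sample collection", ""], CONFIG))
```

Keeping the rules in configuration rather than in code means a new checklist needs a new configuration, not a new release of the tool.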

How do ISA tools access Ontology servers?

ISAcreator

Developed as a user-friendly way to enter standards-compliant metadata, it has lots of features...

But these are just some of them... we also have a data entry wizard and an import utility...

The ISAcreator...

Select and Annotate in ISAcreator

Extending ISAcreator: The Plugin Architecture

Plugins in ISAcreator

• Plugins can be developed for 3 different purposes:

– Search (adds an extra search space to the ontology tool)

– Custom cell editors (for the spreadsheet)

– Extra general functionality (which appears in a plugin menu)

In ISAcreator, we use the Apache Felix implementation of the OSGi framework... it’s really good.

• 2 examples of ISA plugins:

– Access to local metadata stores: Novartis plugin to the ontology widget

– Annotation of findings: Metabolite Identification plugin (Metabolights repository contribution to the ISA project)
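The three plugin purposes can be pictured as extension points that the host application consults. The toy registry below illustrates only the search case, where a plugin (in the spirit of the Novartis metastore plugin) adds results to the built-in ontology search. It is a hypothetical Python sketch: ISAcreator is a Java application built on OSGi (Apache Felix), and none of these names come from its API.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# The three plugin purposes from the slide (identifiers are hypothetical).
KINDS = ("search", "cell_editor", "menu")

SearchFn = Callable[[str], List[str]]


class PluginRegistry:
    """Toy plugin registry; ISAcreator itself relies on OSGi (Apache Felix), not this."""

    def __init__(self) -> None:
        self._plugins: Dict[str, List[Callable]] = defaultdict(list)

    def register(self, kind: str, plugin: Callable) -> None:
        if kind not in KINDS:
            raise ValueError(f"unknown plugin kind: {kind}")
        self._plugins[kind].append(plugin)

    def search(self, query: str, built_in: SearchFn) -> List[str]:
        """Built-in ontology results first, then de-duplicated hits from search plugins."""
        results = list(built_in(query))
        for plugin in self._plugins["search"]:
            for hit in plugin(query):
                if hit not in results:
                    results.append(hit)
        return results


registry = PluginRegistry()
# A stand-in for a local metadata store plugin.
registry.register("search", lambda q: [f"local-store:{q}"])
print(registry.search("liver", lambda q: [f"ontology:{q}"]))
```

The host never needs to know which stores exist; it only merges whatever the registered search plugins return.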

Plugins... Example 1: Novartis Metastore Search

Search function on the Novartis metastore: integrates metastore search results into the ontology search tool.

So, with the Novartis plugin in your plugin directory, you’ll be able to search the Novartis metastore directly within ISAcreator, and it will handle all the tasks involved in recording the term source, etc.
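Recording the term source means each ontology-annotated value travels with the ontology it came from and the term's accession. In ISA-Tab this is done with Term Source REF and Term Accession Number columns placed after the annotated column, with the source itself declared in the investigation file. A minimal sketch of that expansion follows; the `OntologyTerm` class and `annotated_cells` function are hypothetical, and the NCBITaxon term for Homo sapiens is only an example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class OntologyTerm:
    label: str      # value shown in the cell
    source: str     # Term Source REF; must be declared in the investigation file
    accession: str  # Term Accession Number, ideally a resolvable URI


def annotated_cells(header: str, term: OntologyTerm) -> List[Tuple[str, str]]:
    """Expand one ontology-annotated value into its three ISA-Tab (header, value) cells."""
    return [
        (header, term.label),
        ("Term Source REF", term.source),
        ("Term Accession Number", term.accession),
    ]


human = OntologyTerm("Homo sapiens", "NCBITaxon",
                     "http://purl.obolibrary.org/obo/NCBITaxon_9606")
for column, value in annotated_cells("Characteristics[organism]", human):
    print(f"{column}\t{value}")
```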

Plugins... Example 2: Metabolite Identification Plugin

Credits: Kenneth Haug, Metabolights

Potential Issues and known hurdles

• The problem of conflicting versions
  – especially acute when working with big consortia
  – distributed, decentralized groups of users

• Lack of version control and history

• Absence of collaborative features

– Looking for new solutions while retaining the features!

• OntoMaton: bringing Google Docs, NCBO BioPortal and ISA-Tab together!

OntoMaton: Searching

OntoMaton: Tagging

OntoMaton

• Public release: http://goo.gl/2OKFV

• Can be used in any Google Spreadsheet document

• Applications:

– Annotating data records

– Supporting ontology development (see OBI Quick Term Templates)

ISA2RDF: work in progress

• Use case on the W3C HCLS scientific discourse list
  – deciding on the granularity of representation
  – building on previous experience
  – evaluating alternative representations

• Participation in the Biohackathon 2011
  – http://blogs.openaccesscentral.com/blogs/bmcblog/entry/biohackathon_2011_number_1
  – discussing best practices:
    • PURL URIs and identifiers.org as identifiers
    • Open PHACTS guidelines (http://www.nanopub.org/guidelines/OpenPHACTS_Nanopublication_Guidlines_v1.8.1.pdf)

Preparing for Linked Open Data

✴ ISA2RDF (Toxbank collaboration): a contribution to an ecosystem of software tools supporting the ISA syntax

✴ reliance on internet-resolvable identifiers

✴ W3C bio/life science Note on Gene Expression RDF (PMID: 22449719)

✴ TODO:

✴ Specify comparator groups, analysis methods, and the resulting measurements and statistical measures
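Reliance on internet-resolvable identifiers means every node and protocol application in the ISA graph gets an IRI that can be dereferenced. A minimal sketch, assuming the (input, protocol, output) edges of a study table are already available, emits N-Triples lines for them. The `http://example.org/isa/` namespace and the `vocab` predicates are hypothetical placeholders; ISA2RDF's actual vocabulary and identifier scheme (PURLs, identifiers.org) are not reproduced here.

```python
from typing import Iterable, List, Tuple
from urllib.parse import quote

# Hypothetical namespace standing in for resolvable identifiers (PURLs, identifiers.org).
BASE = "http://example.org/isa/"


def iri(kind: str, name: str) -> str:
    """Build an N-Triples IRI reference, percent-encoding characters such as spaces."""
    return f"<{BASE}{kind}/{quote(name, safe='')}>"


def edges_to_ntriples(edges: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Emit N-Triples for (input, protocol, output) edges.

    Each protocol application becomes its own resource; the 'vocab' predicates
    are invented placeholders, not terms from OBI or any other ontology.
    """
    lines: List[str] = []
    for i, (source, protocol, target) in enumerate(edges):
        process = iri("process", f"{protocol}-{i}")
        lines.append(f"{process} {iri('vocab', 'executes')} {iri('protocol', protocol)} .")
        lines.append(f"{process} {iri('vocab', 'hasInput')} {iri('node', source)} .")
        lines.append(f"{process} {iri('vocab', 'hasOutput')} {iri('node', target)} .")
    return lines


for line in edges_to_ntriples([("H1.sample1", "Labeling", "H1.sample1.labeled")]):
    print(line)
```

Modelling each protocol application as its own resource keeps its Date, Performer and Parameter Value qualifiers attachable later without reifying the input/output links.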

ISA2RDF: work in progress

Credits: Nina Jeliazkova (Toxbank project)

ISA2OWL

• OWLAPI

• ISA parser (in-memory BII object store objects)

• Mapping ISA syntax into a target ontological space

• Decoupling the mapping from the conversion engine

• Avoiding being tied to any one semantic framework
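Decoupling the mapping from the conversion engine means the target ontology can change without touching the converter. The slide lists the OWLAPI for ISA2OWL; the Python sketch below is not that implementation and only illustrates the design principle: mappings are plain data, and one generic `convert` function applies whichever mapping it is given. All target IRIs are hypothetical (they are not BFO or OBI terms), and unmapped columns are reported separately, which is one way gaps in coverage become visible.

```python
from typing import Dict, List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Mappings kept as data, separate from the engine. Target IRIs are hypothetical.
MAPPING_A: Dict[str, str] = {
    "Source Name": "http://example.org/onto-a/Source",
    "Sample Name": "http://example.org/onto-a/Specimen",
}
MAPPING_B: Dict[str, str] = {
    "Source Name": "http://example.org/onto-b/Organism",
    "Sample Name": "http://example.org/onto-b/Sample",
}

Triple = Tuple[str, str, str]


def convert(nodes: List[Tuple[str, str]], mapping: Dict[str, str],
            base: str) -> Tuple[List[Triple], List[str]]:
    """Type each (column, value) node with the class the mapping assigns to its column.

    Columns with no mapping are returned separately instead of being silently dropped.
    """
    triples: List[Triple] = []
    unmapped: List[str] = []
    for column, value in nodes:
        target_class = mapping.get(column)
        if target_class is None:
            unmapped.append(column)
        else:
            triples.append((base + value, RDF_TYPE, target_class))
    return triples, unmapped


nodes = [("Source Name", "H1"), ("Sample Name", "H1.sample1"), ("Extract Name", "H1.extract1")]
for mapping in (MAPPING_A, MAPPING_B):
    triples, gaps = convert(nodes, mapping, "http://example.org/data/")
    print(triples[0][2], "| unmapped:", gaps)
```

Swapping MAPPING_A for MAPPING_B retypes the same nodes without any change to `convert`.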

ISA2OWL: mapping into the BFO space as a starting point

ISA2OWL: mapping issues

• Stability over time

• Keeping track of resource versions

• Gaps in coverage

• Use of local extensions

• Direct requests/contributions

ISA2OWL: development

• include graph metadata (graph provenance to aid indexing)

• extend semantic validation of ISA archives

• augment annotation by suggesting additions

• facilitate curation work

• create new mappings to other frameworks (OPML model, SIO)

Publication...

ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level

Philippe Rocca-Serra; Marco Brandizi; Eamonn Maguire; Nataliya Sklyar; Chris Taylor; Kimberly Begley; Dawn Field; Stephen Harris; Winston Hide; Oliver Hofmann; Steffen Neumann; Peter Sterk; Weida Tong; Susanna-Assunta Sansone. Bioinformatics 2010, 26: 2354-2356

Acknowledgements

Groups and individuals participating in:
MIBBI: http://mibbi.org
ISA-Tab format: http://isatab.sf.net
OBO Foundry: http://obofoundry.org
OBI: http://obi-ontology.org/page/Main_Page

ISA Infrastructure Team: Alejandra Gonzalez-Beltran (Oxford), Eamonn Maguire (Oxford), Philippe Rocca-Serra (Oxford)

Collaborators at: Cambridge University, EuNuGO, Harvard School of Public Health, FDA's NCTR, Leibniz Plant Institute, NERC's NEBC, SIDR, INIST, Metabolights, EMBL-EBI

Funders: EU Carcinogenomics Project, UK BBSRC

Groups and individuals participating in: Winston Hide (HSPH), Oliver Hofmann (HSPH), Shannan Ho Sui (HSPH), Brad Chapman (HSPH), Christoph Steinbeck (Metabolights), Kenneth Haug (Metabolights), Paula de Matos (Metabolights), Magali Roux (INIST), Florian Mazur (INIST), Alain Zasadzinki (INIST), Marie Christine Jacquemot (INIST), Nina Jeliazkova (ToxBank)

And many more who have to forgive us!

Questions:
