Presentation at BOSC2012 by P Rocca-Serra - The open source ISA metadata tracking framework: from data curation and management at the source, to the linked data universe
The open source ISA metadata tracking framework: from data curation and management at
the source, to the linked data universe
BOSC, Long Beach, July 13-14, 2012
Philippe Rocca-Serra (Ph. D)
ISA Team
twitter : @isatools.org
[email protected]
http://www.isa-tools.org
Friday, 13 July 2012
MAIN THEME: It is all about structuring experimental information to make it available to computers and software agents to enable mining.
But let’s proceed gradually…
Notes in Lab Books (information for humans)
Spreadsheets and Tables (the compromise)
Facts as RDF statements (information for machines)
Observations
• Experiments are expensive and often publicly funded, yet many never see the light of day.
• Spreadsheets are the most common vehicle for tracking experimental metadata in so-called ‘omics’ (functional genomics).
• Technology-centric repositories form de facto silos.
• Conversions are required to allow deposition to public databases.
• Submitting common information across a series of repositories is inefficient.
Case Study
Many ontologies, Many Formats, Many Requirements…
Grr…Where are the tools!?!
Credits: http://liverpoolsolfed.wordpress.com/resources/image-bank/demonstration/
ISA framework overview
Why ISA format and Tools?
– Supporting data provenance tracking
– Node/Edge underlying concept
– Tabular as a compromise: a presentation layer inspired by object models (FuGE, MAGE-OM)
– A generic representation, applied to:
• microarray based experiments (MAGE)
• sequencing based experiments (SRA)
• flow cytometry based experiments (FuGE-Flow Cyt)
• mass spectrometry and NMR spectroscopy experiments
Why ISA format and Tools?
[Diagram: investigation, study, assay(s), data. Assays hold pointers to data file names/locations; the data are external files in native or other formats.]
investigation: high-level concept to link related studies
study: the central unit, containing information on the subject under study, its characteristics and any treatments applied. A study has associated assays.
assay: test performed either on material taken from the subject or on the whole initial subject, which produces qualitative or quantitative measurements (data)
[Figure: example experiment shown as a graph and as a table. Sources H1 and H2 (H. sapiens, age in years); samples H1.sample1, H1.sample2, H2.sample1; a Labeling step producing H1.sample1.labeled and H2.sample1.labeled; data files h1-s1.cel, h1-s2.cel, h2-s1.cel.]
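The investigation / study / assay hierarchy can be sketched in a few lines of Python. This is only an illustration of the three concepts, not the ISA tools' own object model: the class names, the identifiers "I1" and "S1", and the pairing of each sample with a data file (inferred from the file names in the figure) are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    """A test on material from the subject; holds a pointer to its data file."""
    sample_name: str
    data_file: str  # file name/location only, the data stay in the external file


@dataclass
class Study:
    """Central unit: the subjects under study plus the assays run on them."""
    identifier: str
    sources: List[str] = field(default_factory=list)
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """High-level concept that links related studies."""
    identifier: str
    studies: List[Study] = field(default_factory=list)


def data_files(investigation: Investigation) -> List[str]:
    """Walk investigation -> study -> assay and collect the data file pointers."""
    return [a.data_file for s in investigation.studies for a in s.assays]


# Names taken from the figure; the sample-to-file pairing is an assumption.
study = Study(
    identifier="S1",  # hypothetical identifier
    sources=["H1", "H2"],
    assays=[
        Assay("H1.sample1", "h1-s1.cel"),
        Assay("H1.sample2", "h1-s2.cel"),
        Assay("H2.sample1", "h2-s1.cel"),
    ],
)
investigation = Investigation(identifier="I1", studies=[study])  # hypothetical identifier
files = data_files(investigation)
```

The assay stores only a file name, which mirrors the "pointers to data file names/location" idea: the metadata travel separately from the native data files.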
MAGE-Tab, Pride-xml, SRA-xml
ISA metadata specifications:
• workflow and process orientated
• compatible with checklist enforcement
• compatible with external vocabulary resources
• compatible by design with existing schemas
Currently finalizing conversion to RDF to explore the growing Linked Data universe (in collaboration with the W3C HCLSIG and the Toxbank Consortium)
ISA syntax and Table definition
• Material Transformations:
– Inputs and Outputs of Protocols are Material Nodes (Source Name, Sample Name, Extract Name, Labeled Extract Name).
[Diagram: Material Node, Protocol REF, Material Node, Data File Node]
Material Node: Characteristics[…], Factor Value[…] (independent variables), Material Type, Comment[…]
Protocol REF: Parameter Value[…], Date (day effect), Performer (operator effect)
Comment[…]
Data File Node
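One way to read such a table is to treat every node column as the start of a group and every attribute column as a qualifier of the nearest node to its left. The sketch below assumes that reading. The bracketed names (organism, age, label, note) are invented for illustration, and "Raw Data File" is assumed as the header of the Data File Node, since the slide does not name it.

```python
from typing import List, Tuple

# Node headers named on the slide.
MATERIAL_NODES = {"Source Name", "Sample Name", "Extract Name", "Labeled Extract Name"}
PROCESS_NODES = {"Protocol REF"}
# Assumption: the slide does not name the data file column.
DATA_FILE_NODES = {"Raw Data File"}


def group_columns(header: List[str]) -> List[Tuple[str, List[str]]]:
    """Split a header row into (node column, [attribute columns]) groups.

    Each node column opens a new group; each attribute column
    (Characteristics[...], Parameter Value[...], Date, ...) is attached
    to the most recent node column on its left.
    """
    nodes = MATERIAL_NODES | PROCESS_NODES | DATA_FILE_NODES
    groups: List[Tuple[str, List[str]]] = []
    for column in header:
        if column in nodes:
            groups.append((column, []))
        elif groups:
            groups[-1][1].append(column)
        else:
            raise ValueError(f"attribute column {column!r} precedes any node")
    return groups


# Hypothetical header row: source -> labeling protocol -> labeled extract -> data file.
header = [
    "Source Name", "Characteristics[organism]", "Characteristics[age]",
    "Protocol REF", "Parameter Value[label]", "Date", "Performer",
    "Labeled Extract Name", "Comment[note]",
    "Raw Data File",
]
groups = group_columns(header)
```

Reading the group keys left to right gives back the node/edge path: material in, protocol applied, material out, data file produced.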
ISAconfigurator Tables
How do ISA tools access Ontology servers?
The ISAcreator...
Developed to be a user-friendly way to enter standards-compliant metadata: it has lots of features...
But these are just some of them... we also have a data entry wizard and an import utility...
Select and Annotate in ISAcreator
Extending ISAcreator: The Plugin Architecture
Plugins in ISAcreator
• Plugins can be developed for 3 different purposes:
– Search (adds extra search space for the ontology tool)
– Custom cell editors (for the spreadsheet)
– Extra general functionality (which appears in a plugin menu)
• In ISAcreator, we use the Apache Felix implementation of the OSGi framework... it’s really good.
• 2 examples of ISA plugins:
– Access to local metadata stores: Novartis Plugin to Ontology Widget
– Annotation of findings: Metabolite Identification Plugin (Metabolights Repository contribution to the ISA project).
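The idea of a host application that accepts plugins of three kinds can be sketched as a small registry. ISAcreator itself is built on OSGi (Apache Felix) in Java; the Python below is not that API. The kind names, the registry class and the local store contents are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical labels for the three plugin purposes listed above.
PLUGIN_KINDS = ("search", "cell_editor", "menu")


class PluginRegistry:
    """Keeps the plugins the host application knows about, by kind."""

    def __init__(self) -> None:
        self._plugins: Dict[str, List[Callable]] = {k: [] for k in PLUGIN_KINDS}

    def register(self, kind: str, plugin: Callable) -> None:
        if kind not in self._plugins:
            raise ValueError(f"unknown plugin kind: {kind}")
        self._plugins[kind].append(plugin)

    def search(self, term: str) -> List[dict]:
        """Merge the hits of every registered search plugin."""
        hits: List[dict] = []
        for plugin in self._plugins["search"]:
            hits.extend(plugin(term))
        return hits


def local_store_search(term: str) -> List[dict]:
    """Stand-in for a search over a local metadata store (invented data)."""
    store = {"liver": "LOCAL:0001", "kidney": "LOCAL:0002"}
    return [
        {"label": label, "accession": accession, "source": "local-store"}
        for label, accession in store.items()
        if term.lower() in label
    ]


registry = PluginRegistry()
registry.register("search", local_store_search)
results = registry.search("liver")
```

Each hit carries its own "source" field, so the host can record where a term came from without knowing anything about the plugin that found it.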
Plugins Example 1 - Novartis Metastore Search
Search function on the Novartis Metastore: integrates search results from the metastore into the Ontology search tool.
So, with the Novartis plugin in your Plugin directory, you’ll be able to search the Novartis metastore directly within ISAcreator, and it will handle all the tasks involved with recording term source, etc.
Plugins Example 2 - Metabolite Identification plugin
Credits: Kenneth Haug, Metabolights
Potential Issues and known hurdles
• The problem of conflicting versions
– especially high when working with big consortia
– distributed, decentralized groups of users
• Lack of version control and history
• Absence of collaborative features
– Looking for new solutions while retaining the features!
• OntoMaton: Bringing Google Docs, NCBO BioPortal and ISA-TAB together!
OntoMaton: Searching
OntoMaton: Tagging
OntoMaton
• Public release: http://goo.gl/2OKFV
• Can be used in any Google Spreadsheet document
• Application:
– Annotating data records
– Supporting ontology development (see OBI Quick Term Templates)
ISA2RDF work in progress
• Use case on W3C HCLS scientific discourse list
– deciding on the granularity of representation
– building on previous experience
– evaluating alternative representations
• Participation in the Biohackathon 2011
– http://blogs.openaccesscentral.com/blogs/bmcblog/entry/biohackathon_2011_number_1
– Discussing best practices
• PURL URIs and identifiers.org as identifiers
• Openphacts guidelines (http://www.nanopub.org/guidelines/OpenPHACTS_Nanopublication_Guidlines_v1.8.1.pdf)
Preparing for Linked Open Data
✴ ISA2RDF (Toxbank collaboration): a contribution to an ecosystem of software tools supporting the ISA syntax
✴ reliance on internet-resolvable identifiers
✴ W3C bio/life science Note on Gene Expression RDF - (PMID: 22449719)
✴ TODO:
✴ Specify comparator groups + analysis methods and resulting measurements and statistical measures
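To make "internet-resolvable identifiers" concrete, here is a minimal sketch that writes a few facts about the earlier example samples as N-Triples. It is not the ISA2RDF output: the example.org namespace and both predicates are placeholders, and only the identifiers.org taxonomy URI for H. sapiens (NCBI Taxonomy 9606) follows a real pattern.

```python
from typing import List, Tuple

BASE = "http://example.org/isa/"                  # hypothetical namespace
DERIVES_FROM = BASE + "vocab/derives_from"        # hypothetical predicate
HAS_ORGANISM = BASE + "vocab/has_organism"        # hypothetical predicate
HUMAN = "http://identifiers.org/taxonomy/9606"    # NCBI Taxonomy ID of H. sapiens

Triple = Tuple[str, str, str]


def sample_triples(source: str, samples: List[str]) -> List[Triple]:
    """State the organism of a source and link each sample back to it."""
    triples = [(BASE + source, HAS_ORGANISM, HUMAN)]
    for sample in samples:
        triples.append((BASE + sample, DERIVES_FROM, BASE + source))
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    """Serialise IRI-only triples, one N-Triples statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = sample_triples("H1", ["H1.sample1", "H1.sample2"])
ntriples = to_ntriples(triples)
```

Replacing the free-text "H. Sapiens" cell with a resolvable URI is the step that turns a spreadsheet value into something a software agent can follow.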
ISA2RDF: work in progress
jeliazkova.nina [toxbank project]
ISA2OWL
• OWLAPI
• ISA Parser (in-memory BII object store objects)
• Mapping ISA syntax into target Ontological Space
• Decoupling Mapping from Conversion Engine
• avoid being tied to a single semantic framework
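Decoupling the mapping from the conversion engine can be pictured as passing the mapping in as data, so the same engine serves any target framework. The sketch below assumes that design. It does not use OWLAPI, and every IRI in it is a placeholder rather than a real ontology term.

```python
from typing import Dict, List, Tuple

# Two interchangeable mappings from ISA column names to target classes.
# All IRIs are placeholders, not terms from any real ontology.
MAPPING_A: Dict[str, str] = {
    "Source Name": "http://example.org/framework-a/MaterialEntity",
    "Protocol REF": "http://example.org/framework-a/PlannedProcess",
}
MAPPING_B: Dict[str, str] = {
    "Source Name": "http://example.org/framework-b/Object",
    "Protocol REF": "http://example.org/framework-b/Procedure",
}


def convert(row: Dict[str, str], mapping: Dict[str, str]) -> List[Tuple[str, str]]:
    """Type each cell value with the class its column maps to.

    The engine knows nothing about the target framework; columns missing
    from the mapping are skipped, which surfaces gaps in coverage.
    """
    return [(value, mapping[column]) for column, value in row.items() if column in mapping]


row = {"Source Name": "H1", "Protocol REF": "Labeling", "Date": "2012-07-13"}
typed_a = convert(row, MAPPING_A)
typed_b = convert(row, MAPPING_B)
```

Swapping MAPPING_A for MAPPING_B changes the output classes without touching `convert`, which is the point of keeping the two apart.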
ISA2OWL: mapping in the BFO space as starting point
ISA2OWL: mapping issues
• Stability over time
• Keeping track of resource versions
• Gaps in coverage
• Use of local extensions
• Direct requests/contributions
ISA2OWL: development
• include graph metadata (graph provenance to aid indexing)
• extend semantic validation of ISA archive
• augment annotation by suggesting additions
• facilitate curation work
• create new mappings to other frameworks (OPML model, SIO)
Publication...
ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level
Philippe Rocca-Serra; Marco Brandizi; Eamonn Maguire; Nataliya Sklyar; Chris Taylor; Kimberly Begley; Dawn Field; Stephen Harris; Winston Hide; Oliver Hofmann; Steffen Neumann; Peter Sterk; Weida Tong; Susanna-Assunta Sansone
Bioinformatics 2010 26: 2354-2356
Acknowledgements
Groups and individuals participating in:
MIBBI http://mibbi.org
ISA-Tab format http://isatab.sf.net
OBO Foundry http://obofoundry.org
OBI: http://obi-ontology.org/page/Main_Page
ISA Infrastructure Team:
Alejandra Gonzalez-Beltran (Oxford)
Eamonn Maguire (Oxford)
Philippe Rocca-Serra (Oxford)
Collaborators at:
Cambridge University
EuNuGO
Harvard School for Public Health
FDA's NCTR
Leibniz Plant Institute
NERC's NEBC
SIDR, INIST
Metabolights, EMBL-EBI
Funders:
EU Carcinogenomics Project
UK BBSRC
Groups and individuals participating in:
Winston Hide: HSPH
Oliver Hofmann: HSPH
Shannan Ho Sui: HSPH
Brad Chapman: HSPH
Christoph Steinbeck: Metabolights
Kenneth Haug: Metabolights
Paula de Matos: Metabolights
Magali Roux: INIST
Florian Mazur: INIST
Alain Zasadzinki: INIST
Marie Christine Jacquemot: INIST
Nina Jeliazkova: ToxBank
And many more who have to forgive us!
Questions: