SysMO-DB and ISA Katy Wolstencroft, University of Manchester, UK


Page 1:

SysMO-DB and ISA

Katy Wolstencroft, University of Manchester, UK

Page 2:

Data Exchange in SysMO

Public data sources
    Model organism databases (e.g. SGD)
    BRENDA ….

Data produced by SysMO
    SABIO-RK, iChiP, MeMo ….

Local databases & files
    Excel spreadsheets: the most common experimental data format

[Figure labels: Metadata; Proteomics; Metabolomics; Microarray; Single Cell Data]

Variable descriptions of data
Little adoption of community controlled vocabulary terms

Page 3:

Challenges

Enable data to be easily exchanged & integrated
Preserving project autonomy
Working with existing resources
    Wikis; CMS - Alfresco, eGroupWare, MediaWiki; Databases - BASE, maxD; Files and Spreadsheets
Falling in with common work practices
Exploiting existing resources in the community

Page 4:

Extracting Data

[Diagram labels: COSMIC; BaCell-SysMO; SysMOLab; MOSES; ANOTHER; Alfresco (x2); Wiki (x2); A DATASTORE]

Page 5:

JERM

JERM: “Just Enough Results Model”
Minimum information to exchange data

What type of data is it?
    Microarray, growth curve, enzyme activity…
What was measured?
    Gene expression, OD, metabolite concentration…
What do the values in the datasets mean?
    Units, time series, repeats…
Which experiment does it relate to?
How was the data created?
    SOPs and protocols
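
The JERM questions above can be read as the fields of a small record. The following is a minimal sketch under that assumption; the class name `JermRecord`, its field names, and the sample values are hypothetical illustrations and are not taken from SysMO-DB itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class JermRecord:
    """Hypothetical container for the minimum information listed on the slide."""
    data_type: str    # what type of data it is, e.g. "Microarray", "growth curve"
    measured: str     # what was measured, e.g. "Gene expression", "OD"
    units: str        # what the values in the dataset mean
    experiment: str   # which experiment the data relates to
    sops: List[str] = field(default_factory=list)  # how the data was created

    def missing_fields(self) -> List[str]:
        """Return the names of fields that were left empty."""
        names = ("data_type", "measured", "units", "experiment")
        missing = [name for name in names if not getattr(self, name).strip()]
        if not self.sops:
            missing.append("sops")
        return missing


# Illustrative values only; the experiment and SOP names are made up.
record = JermRecord(
    data_type="growth curve",
    measured="OD",
    units="",
    experiment="example growth experiment",
    sops=["example_growth_sop.doc"],
)
print(record.missing_fields())
```

A check like `missing_fields` is one way to flag a dataset that does not yet carry "just enough" description to be exchanged.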

Page 6:

The Idea

For each data type…
    Transcriptomics
    Proteomics
    Metabolomics
    Single Cell Data

Generate and apply…
    JERM template
    JERM extractor for data host
    Subset registered in SEEK
    Access / export through JERM interface / template

Define a JERM…
    Top down analysis of standards
    Bottom up analysis of practice

[Slide labels: 1, 2, 3; ISA-TAB]

Page 7:

For publishing

JERM data needs to be related to SOPs, experimental context and other data

JERM must be “MIBBI” compliant for export to public repositories, e.g. microarray data needs to be MIAME compliant

Page 8:

Minimum Information Models

CIMR: Core Information for Metabolomics Reporting
MIABE: Minimal Information About a Bioactive Entity
MIACA: Minimal Information About a Cellular Assay
MIAME: Minimum Information About a Microarray Experiment
MIAME/Env: MIAME / Environmental transcriptomic experiment
MIAME/Nutr: MIAME / Nutrigenomics
MIAME/Plant: MIAME / Plant transcriptomics
MIAME/Tox: MIAME / Toxicogenomics
MIAPA: Minimum Information About a Phylogenetic Analysis
MIAPAR: Minimum Information About a Protein Affinity Reagent
MIAPE: Minimum Information About a Proteomics Experiment
MIARE: Minimum Information About a RNAi Experiment
MIASE: Minimum Information About a Simulation Experiment
MIENS: Minimum Information about an ENvironmental Sequence
MIFlowCyt: Minimum Information for a Flow Cytometry Experiment
MIGen: Minimum Information about a Genotyping Experiment
MIGS: Minimum Information about a Genome Sequence
MIMIx: Minimum Information about a Molecular Interaction Experiment
MIMPP: Minimal Information for Mouse Phenotyping Procedures
MINI: Minimum Information about a Neuroscience Investigation
MINIMESS: Minimal Metagenome Sequence Analysis Standard
MINSEQE: Minimum Information about a high-throughput SeQuencing Experiment
MIPFE: Minimal Information for Protein Functional Evaluation
MIQAS: Minimal Information for QTLs and Association Studies
MIqPCR: Minimum Information about a quantitative Polymerase Chain Reaction experiment
MIRIAM: Minimal Information Required In the Annotation of biochemical Models
MISFISHIE: Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments
STRENDA: Standards for Reporting Enzymology Data
TBC: Tox Biology Checklist

BioPAX: Biological Pathways Exchange http://www.biopax.org/
FuGE: Functional Genomics Experiment
MGED: Microarray Experimental Conditions

http://www.mibbi.org/index.php/MIBBI_portal
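
One way to use this list when preparing an export is a simple lookup from data type to checklist. The sketch below is an assumption-laden illustration: only the microarray-to-MIAME pairing is stated in the slides; the other pairings are inferred from the checklist names above, and the function name `checklist_for` is hypothetical.

```python
from typing import Optional

# Data type -> minimum information checklist.
# "microarray" -> "MIAME" is stated in the slides; the remaining pairings
# are inferred from the checklist names and are assumptions.
CHECKLISTS = {
    "microarray": "MIAME",
    "proteomics": "MIAPE",
    "metabolomics": "CIMR",
    "enzyme activity": "STRENDA",
}


def checklist_for(data_type: str) -> Optional[str]:
    """Return the checklist acronym for a data type, or None if unknown."""
    return CHECKLISTS.get(data_type.strip().lower())


print(checklist_for("Microarray"))
print(checklist_for("Single Cell Data"))
```

Returning `None` for an unmapped data type keeps the gap visible instead of guessing a checklist.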

Page 9:

Investigation Title       Invasive vs. non-invasive strains of yeast
Experimental Design       individual_genetic_characteristics_design   growth_condition_design
Experimental Factor Name  EF_Genotype                                 EF_GrowthCond
Experimental Factor Type  genotype                                    growth_condition
Person Last Name          Falstaff                                    Shakespeare
Person First Name         John                                        Bill
Person Roles              submitter;investigator                      investigator
Experiment Description    An experiment was performed to...
Protocol Name             Yeast Growth                                RNA extraction
Protocol Type             grow                                        nucleic_acid_extraction
Protocol Description      S. cerevisiae cultures were grown on...     Total cellular RNA was extracted...
Protocol Parameters       carbon source;temperature
SDRF File                 my_sdrf_file.txt
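
A row-oriented sheet like the one on this slide (field name first, then one or more values) is straightforward to read once saved as tab-delimited text. The following is a minimal sketch assuming tab-separated rows; the function name `parse_idf` is hypothetical, and the sample reuses a few rows from the slide.

```python
import csv
import io
from typing import Dict, List


def parse_idf(text: str) -> Dict[str, List[str]]:
    """Map each field name (first column) to the list of values in its row."""
    rows: Dict[str, List[str]] = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        rows[row[0].strip()] = [cell.strip() for cell in row[1:]]
    return rows


sample = (
    "Investigation Title\tInvasive vs. non-invasive strains of yeast\n"
    "Person Last Name\tFalstaff\tShakespeare\n"
    "Person First Name\tJohn\tBill\n"
    "Protocol Name\tYeast Growth\tRNA extraction\n"
    "SDRF File\tmy_sdrf_file.txt\n"
)

idf = parse_idf(sample)
# Values in the same column position belong together (first person, second person).
people = list(zip(idf["Person First Name"], idf["Person Last Name"]))
print(people)
print(idf["SDRF File"])
```

Keeping values in column order matters here because the position ties a person's names and roles, or a protocol's name and type, together.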

Page 10:

ISA-TAB

Relating data and its experimental context
    Investigation, Study, Assay

TAB = tabular
    A format suitable for spreadsheets

Page 11:

“assists in the reporting and local management of experimental metadata (i.e. sample characteristics, technologies used, type of measurements) from studies employing one or a combination of technologies

facilitates submission to international public repositories of genomics, transcriptomics and proteomics studies”

Originally developed for multiple ‘omics data

Page 12:

Current situation @ EBI

Existing production systems

ArrayExpress
    Transcriptomics data files + required experimental descriptors
    MGED standards
    Mage TAB, MIAMExpress, Mage-ML

Pride
    Proteomics data files + required experimental descriptors
    HUPO-PSI standards
    ProteomeHarvest, PSI-XML(s)

[Diagram layers: STORAGE; SUBMISSION; RETRIEVAL]

NO common representation of complex studies
Independent databases, different metadata representation, format, diverse terminologies etc.

Page 13:

ISA Provides....

A common framework for describing how your data relates to its experimental context

A common framework for relating different types of data

Page 14:

ISA Provides

Cross walking between the Omics data stores
Relating microarrays and proteomics etc if they are part of the same study
Providing a single mechanism for submission to multiple data silos

Page 15:

ISA Defined

Investigation: high level description of the area and the main aims of a project

Study: a particular biological hypothesis or analysis

Assay: specific, individual experiments required to be undertaken together in order to address the study hypotheses

Page 16:

ISA in SysMO

Investigation: main aims of SysMO projects
    Analysis of Central Carbon Metabolism of Sulfolobus solfataricus under varying temperatures

Study: a collection of experiments designed to answer a particular biological question
    Comparison of S. solfataricus grown at 70 and 80 degrees

Assay: individual experiments in the study
    Comparison of transcriptome at 70 and 80 °C (cDNA microarray)
    Comparison of proteome at 70 and 80 °C (Protein expression profiling)
    Enzyme activity tests for S. solfataricus (Assay types)
    Intracellular metabolomics of S. solfataricus at 70 and 80 °C (Metabolomics)
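
The Investigation, Study, Assay nesting on this slide can be sketched as three small record types. This is an illustrative sketch only: the class and attribute names are hypothetical and do not reflect the SEEK data model, and the titles are taken from the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    title: str
    assay_type: str  # e.g. "cDNA microarray", "Metabolomics"


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)

    def assay_types(self) -> List[str]:
        """List the assay types of every assay in every study, in order."""
        return [a.assay_type for s in self.studies for a in s.assays]


study = Study(
    title="Comparison of S. solfataricus grown at 70 and 80 degrees",
    assays=[
        Assay("Comparison of transcriptome at 70 and 80 C", "cDNA microarray"),
        Assay("Comparison of proteome at 70 and 80 C", "Protein expression profiling"),
        Assay("Intracellular metabolomics at 70 and 80 C", "Metabolomics"),
    ],
)
investigation = Investigation(
    title="Analysis of Central Carbon Metabolism of Sulfolobus solfataricus "
          "under varying temperatures",
    studies=[study],
)
print(investigation.assay_types())
```

Because each assay hangs off a study, data from different technologies can be related simply by walking up to the shared study.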

Page 17:

ISA in SysMO

Assays linked to data files
Data files linked together
Assays and data files linked to protocols and SOPs

ISA data is available to all in consortium
Data files and SOPs may be shared or kept private
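
The split described here (ISA descriptions visible to the whole consortium, while data files and SOPs are shared or private) can be sketched as follows. All names, including `Asset`, `AssayRecord`, and the file names, are hypothetical, and the single `shared` flag is an assumed simplification of whatever sharing controls the real system offers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Asset:
    name: str
    kind: str             # "data" or "sop"
    shared: bool = False  # False means private to the owning project


@dataclass
class AssayRecord:
    title: str  # the ISA description itself is always visible
    assets: List[Asset] = field(default_factory=list)

    def visible_assets(self, is_owner: bool) -> List[str]:
        """Owners see every linked asset; other members see only shared ones."""
        return [a.name for a in self.assets if a.shared or is_owner]


assay = AssayRecord(
    title="Comparison of transcriptome at 70 and 80 C",
    assets=[
        Asset("transcriptome_70_80.xls", "data", shared=True),   # made-up file name
        Asset("rna_extraction_sop.doc", "sop", shared=False),    # made-up file name
    ],
)
print(assay.title)
print(assay.visible_assets(is_owner=False))
print(assay.visible_assets(is_owner=True))
```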

Page 18:

Advantages

A common structure across consortium
Can be bundled together with data files to produce a common export format
Allows automated submission to public omics stores
    ArrayExpress, Pride etc

Requires SysMO consortium members to only record metadata once
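
Bundling metadata with data files could look roughly like the sketch below, which writes both into one zip archive in memory. The archive layout, the file names, and the function `bundle_export` are assumptions made for illustration; this is not the SysMO-DB export and not a prescribed ISA-TAB archive structure.

```python
import io
import zipfile
from typing import Dict


def bundle_export(metadata_text: str, data_files: Dict[str, bytes]) -> bytes:
    """Pack one metadata file plus the given data files into a zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("isa_metadata.txt", metadata_text)  # hypothetical name
        for name, content in data_files.items():
            archive.writestr("data/" + name, content)
    return buffer.getvalue()


payload = bundle_export(
    "Assay\tComparison of transcriptome at 70 and 80 C\n",
    {"transcriptome_70_80.csv": b"gene,ratio\n"},  # made-up data file
)

# Read the archive back to confirm what it contains.
with zipfile.ZipFile(io.BytesIO(payload)) as archive:
    names = archive.namelist()
print(names)
```

Recording the metadata once and packing it next to the data is what lets the same bundle feed more than one downstream repository.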

Page 19:

SEEK + JERM

[Diagram labels: Experimental Data; Metadata; People; Projects; Investigation; Study; Assay; Experimental conditions; Factors studied; Models; SOPs; Workflows]

Based on ISA-TAB
Homogenised terminology and values in the datasets themselves

Page 20:

Acknowledgements

SysMO-DB Team
SysMO-PALS
myGrid, EML and JWS Online teams
OMII-UK, Uni Southampton
EMBL-EBI, MCISB