bosc-2010
Towards a federated microarray gene expression repository using MOLGENIS and MAGE-TAB
Alexandros Kanterakis, Tomasz Adamusiak, Juha Muilu, Helen Parkinson, Despoina Antonakaki, Morris A. Swertz
About BBMRI-NL
› Biobank research infrastructure
› Exploit the wealth of information in microarray and GWAS
› Data currently fragmented between individual biobanks (>6500) samples
Objectives (1/2)
› Establish: web-based national repository for microarray gene expression data
› Populate: with well-annotated microarray experiments
› Share: the software as ‘microarray database in-a-box’ such that all BBMRI biobanks can reuse it locally
Extendable
Diverging local needs
Requirements
Interfaces
User Interface
Programmatic Interfaces
Data federation
Analysis Protocols
Combine gene expression data from multi-platform microarray experiments with GWAS studies in order to create novel eQTL datasets for complex diseases
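As a rough illustration of what combining the two data types involves, the sketch below joins expression values and SNP genotypes on a shared sample identifier and computes a Pearson correlation for one gene and SNP pair. The sample IDs, values, and function names are hypothetical, and a plain correlation stands in for whatever association test an actual eQTL protocol would use.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def eqtl_pairs(expression, genotypes):
    """Keep only samples present in both datasets; return aligned lists."""
    shared = sorted(set(expression) & set(genotypes))
    return [genotypes[s] for s in shared], [expression[s] for s in shared]


# Hypothetical data: expression of one gene, allele dosage (0/1/2) of one SNP.
expression = {"S1": 5.1, "S2": 6.0, "S3": 7.2, "S4": 6.1, "S5": 4.9}
genotypes = {"S1": 0, "S2": 1, "S3": 2, "S4": 1, "S6": 2}

dosage, expr = eqtl_pairs(expression, genotypes)
r = pearson(dosage, expr)
print(f"{len(dosage)} shared samples, r = {r:.3f}")
```

Samples S5 and S6 drop out because they appear in only one dataset, which is the practical reason consistent sample annotation across repositories matters.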
Objectives (2/2)
MAGE-TAB (1/2)
› MAGE-TAB: simple, human-readable, tab-delimited. Comprises 4 parts:
1. Investigation Description Format (IDF). General information, contact details, bibliographic references,...
2. Array Design Format (ADF). What sequence is located at each position on an array and what the annotation of this sequence is.
3. Raw and processed data files. ASCII or binary files.
2006
MAGE-TAB (2/2)
4. Sample and Data Relationship Format (SDRF). Relationships between samples, arrays, extracts, hybridizations and other objects used in the investigation.
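Because all four parts are plain tab-delimited text, a reader for the two metadata parts can be sketched in a few lines. The example below is a minimal illustration, not the MOLGENIS parser: it treats the IDF as row-oriented (a tag followed by its values) and the SDRF as a table with a header row. The file contents are invented, and the function names are hypothetical.

```python
import csv
import io


def parse_idf(text):
    """IDF is row-oriented: the first cell is the tag, the rest are its values."""
    idf = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        idf[row[0].strip()] = [cell.strip() for cell in row[1:] if cell.strip()]
    return idf


def parse_sdrf(text):
    """SDRF is a table: one header row, then one row per sample-to-data path.

    Note: real SDRF files may repeat column names; this sketch keeps only the
    last value for a repeated name.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    return [dict(zip(header, row)) for row in reader if row]


# Invented example content, for illustration only.
IDF_TEXT = (
    "Investigation Title\tExample study\n"
    "Person Last Name\tDoe\tRoe\n"
    "SDRF File\texample.sdrf.txt\n"
)
SDRF_TEXT = (
    "Source Name\tCharacteristics[Organism]\tHybridization Name\tArray Data File\n"
    "sample1\tHomo sapiens\thyb1\tdata1.CEL\n"
    "sample2\tHomo sapiens\thyb2\tdata2.CEL\n"
)

idf = parse_idf(IDF_TEXT)
sdrf_rows = parse_sdrf(SDRF_TEXT)
print(idf["Investigation Title"], len(sdrf_rows))
```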
MAGE-TAB Object Model
› From MAGE-TAB specifications we created a data model* in XML format..
› .. and parsers for MAGE-TAB files.
http://www.mged.org/mage-tab/MAGE-TABv1.0.pdf
*data model is the set of definitions of classes, elements and properties of the data
http://magetab-om.sourceforge.net/magetab_idf.xml
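To make the footnote's notion of a data model concrete, the sketch below expresses a tiny fragment of such a model as Python dataclasses. The actual model is defined in XML; the class and field names here are hypothetical and are not taken from the linked file.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    name: str
    organism: str


@dataclass
class Hybridization:
    name: str
    sample: Sample
    array_design: str
    data_files: List[str] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    contacts: List[str] = field(default_factory=list)
    hybridizations: List[Hybridization] = field(default_factory=list)

    def samples(self):
        """Distinct samples across all hybridizations, in first-seen order."""
        seen = {}
        for hyb in self.hybridizations:
            seen.setdefault(hyb.sample.name, hyb.sample)
        return list(seen.values())


# Hypothetical instance: one sample hybridized twice.
s1 = Sample(name="sample1", organism="Homo sapiens")
inv = Investigation(
    title="Example study",
    contacts=["Doe"],
    hybridizations=[
        Hybridization("hyb1", s1, "array-design-1", ["data1.CEL"]),
        Hybridization("hyb2", s1, "array-design-1", ["data2.CEL"]),
    ],
)
print([s.name for s in inv.samples()])
```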
Visualization of MAGE-TAB OM
[Figure: visualization of the MAGE-TAB object model, with panels ADF, IDF, SDRF, and data]
MOLGENIS MAGE-TAB
› From MAGE-TAB Object Model we created a web environment for managing Microarray Experiments:
850 lines of maintainable code
60K lines of automatically generated code
MOLGENIS MAGE-TAB
Testing..
For testing and validation purposes we populated the database with data from ArrayExpress:
• 7,665 experiments from Gene Expression Omnibus, curated by ArrayExpress
• 3,940 non-GEO experiments from ArrayExpress
• 320,000 samples, 550 species, 2,400 human conditions
Discussion
Features:
› APIs: R, Java
› Web services: SOAP, REST
› Semantic interfaces: RDF, SPARQL
› MAGE-TAB parsers, validators and visualization
Future work:
› Populate with local data
› Plug-in analysis tools
› Data and tool sharing among local installs
› Privacy-sensitive biobanking community
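The kind of check a MAGE-TAB validator might perform can be sketched as follows. This is an assumption-laden illustration, not the validator shipped with the platform: the function name and the two rules (required columns present, every row as wide as the header) are hypothetical choices.

```python
def validate_sdrf(header, rows, required=("Source Name",)):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for column in required:
        if column not in header:
            problems.append(f"missing required column: {column}")
    for number, row in enumerate(rows, start=2):  # line 1 is the header
        if len(row) != len(header):
            problems.append(
                f"row {number}: expected {len(header)} cells, found {len(row)}"
            )
    return problems


# Invented table with one truncated row.
header = ["Source Name", "Hybridization Name", "Array Data File"]
rows = [
    ["sample1", "hyb1", "data1.CEL"],
    ["sample2", "hyb2"],
]
problems = validate_sdrf(header, rows)
print(problems)
```

Returning a list of messages instead of raising on the first error lets a curator see every problem in a submission at once.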
Thank you
Acknowledgements:
› Morris Swertz
› Joeri van der Velde
› Lude Franke
› Danny Arends
Posters:
› E15: Generating a data platform for microarray gene expression experiments using MOLGENIS and MAGE-TAB
› E19: MOLGENIS: rapid generation of flexible software platforms for any genotype and phenotype experiment