Towards a federated microarray gene expression repository using MOLGENIS and MAGE-TAB

Alexandros Kanterakis, Tomasz Adamusiak, Juha Muilu, Helen Parkinson, Despoina Antonakaki, Morris A. Swertz

Kanterakis bosc2010 molgenis


Page 1: Kanterakis bosc2010 molgenis

Towards a federated microarray gene expression repository using MOLGENIS and MAGE-TAB

Alexandros Kanterakis, Tomasz Adamusiak, Juha Muilu, Helen Parkinson, Despoina Antonakaki, Morris A. Swertz

Page 2: Kanterakis bosc2010 molgenis

About BBMRI-NL

› Biobank research infrastructure

› Exploit the wealth of information in microarray and GWAS data

› Data currently fragmented between individual biobanks (>6500 samples)

Page 3: Kanterakis bosc2010 molgenis

Objectives (1/2)

› Establish: web-based national repository for microarray gene expression data

› Populate: with well-annotated microarray experiments

› Share: the software as ‘microarray database in-a-box’ such that all BBMRI biobanks can reuse it locally

Extendable

Diverging local needs

Requirements

Interfaces

User Interface

Programmatic Interfaces

Data federation

Analysis Protocols

Page 4: Kanterakis bosc2010 molgenis

Objectives (2/2)

Combine gene expression data from multi-platform microarray experiments with GWAS studies in order to create novel eQTL datasets for complex diseases

Page 5: Kanterakis bosc2010 molgenis

MAGE-TAB (1/2)

› MAGE-TAB: simple, human-readable, tab-delimited. It consists of 4 parts:

1. Investigation Description Format (IDF). General information, contact details, bibliographic references, ...

2. Array Design Format (ADF). What sequence is located at each position on an array and what the annotation of this sequence is.

3. Raw and processed data files. ASCII or binary files.

2006

Page 6: Kanterakis bosc2010 molgenis

MAGE-TAB (2/2)

4. Sample and Data Relationship Format (SDRF). Relationships between samples, arrays, extracts, hybridizations and other objects used in the investigation.
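Because all four parts are plain tab-delimited text, the IDF and SDRF can be read with very little code. The sketch below is only an illustration and is not the parser described in this talk: the file contents are made up, and the function names `parse_idf` and `parse_sdrf` are hypothetical. It assumes the IDF is row-oriented (a field name followed by its values) and the SDRF is column-oriented (a header row, then one row per path from a source to a data file).

```python
import csv
import io

# Hypothetical, made-up file contents for illustration only.
IDF_TEXT = (
    "Investigation Title\tExample expression study\n"
    "Person Last Name\tDoe\tRoe\n"
    "SDRF File\texample.sdrf.txt\n"
)

SDRF_TEXT = (
    "Source Name\tExtract Name\tHybridization Name\tArray Data File\n"
    "donor1\textract1\thyb1\tdonor1.CEL\n"
    "donor2\textract2\thyb2\tdonor2.CEL\n"
)


def parse_idf(text):
    """Read a row-oriented IDF: first cell is the field name, the rest are values."""
    fields = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        fields[row[0].strip()] = [cell for cell in row[1:] if cell.strip()]
    return fields


def parse_sdrf(text):
    """Read a column-oriented SDRF: one dict per row, keyed by column header.

    Note: if a header is repeated (real SDRF files may repeat some columns),
    DictReader keeps only the last value for that header.
    """
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


idf = parse_idf(IDF_TEXT)
samples = parse_sdrf(SDRF_TEXT)

print(idf["Person Last Name"])
for row in samples:
    # Each SDRF row links a biological source to the data file derived from it.
    print(row["Source Name"], "->", row["Array Data File"])
```

A real parser would also need to handle repeated columns, comments and validation against the specification, which this sketch leaves out.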

Page 7: Kanterakis bosc2010 molgenis

MAGE-TAB Object Model

› From the MAGE-TAB specification we created a data model* in XML format ...

› ... and parsers for MAGE-TAB files.

http://www.mged.org/mage-tab/MAGE-TABv1.0.pdf

*A data model is the set of definitions of the classes, elements and properties of the data.

http://magetab-om.sourceforge.net/magetab_idf.xml
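To make the idea of a data model in XML concrete, the sketch below reads a small model and lists each entity with its fields. The XML fragment is made up for illustration and is not taken from magetab_idf.xml; the element and attribute names (`molgenis`, `entity`, `field`, `name`, `type`) and the helper `list_entities` are assumptions about the general shape of such a model.

```python
import xml.etree.ElementTree as ET

# Hypothetical model fragment; names and types are illustrative assumptions.
MODEL_XML = """
<molgenis name="example">
  <entity name="Investigation">
    <field name="title" type="string"/>
    <field name="releaseDate" type="date"/>
  </entity>
  <entity name="Person">
    <field name="lastName" type="string"/>
    <field name="investigation" type="xref"/>
  </entity>
</molgenis>
"""


def list_entities(xml_text):
    """Return {entity name: [(field name, field type), ...]} for a model document."""
    root = ET.fromstring(xml_text)
    model = {}
    for entity in root.iter("entity"):
        model[entity.get("name")] = [
            (field.get("name"), field.get("type"))
            for field in entity.findall("field")
        ]
    return model


model = list_entities(MODEL_XML)
for entity_name, fields in model.items():
    print(entity_name, fields)
```

A declarative description like this is what a code generator can turn into database tables, user interface forms and programmatic interfaces.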

Page 8: Kanterakis bosc2010 molgenis

Visualization of MAGE-TAB OM

[Figure: diagram of the MAGE-TAB object model, with parts labelled ADF, IDF, SDRF and data]

Page 9: Kanterakis bosc2010 molgenis

MOLGENIS MAGE-TAB

› From the MAGE-TAB Object Model we created a web environment for managing microarray experiments:

850 lines of maintainable code

60K lines of automatically generated code

Page 10: Kanterakis bosc2010 molgenis

MOLGENIS MAGE-TAB

Page 11: Kanterakis bosc2010 molgenis

Testing..

For testing and validation purposes we populated the database with data from ArrayExpress:

• 7665 experiments from Gene Expression Omnibus, curated by ArrayExpress

• 3940 non-GEO experiments from ArrayExpress

• 320,000 samples, 550 species, 2,400 human conditions

Page 12: Kanterakis bosc2010 molgenis

Discussion

Features:

› APIs: R, Java

› Web services: SOAP, REST

› Semantic Interfaces: RDF, SPARQL

› MAGE-TAB parsers, validators and visualization

Future work:

› Populate with local data

› Plug-in analysis tools

› Data and tool sharing among local installs

› Privacy sensitive biobanking community

Page 13: Kanterakis bosc2010 molgenis

Thank you

Acknowledgements:

› Morris Swertz

› Joeri van der Velde

› Lude Franke

› Danny Arends

Email: [email protected]

Posters:

› E15: Generating a data platform for microarray gene expression experiments using MOLGENIS and MAGE-TAB

› E19: MOLGENIS: rapid generation of flexible software platforms for any genotype and phenotype experiment