EBI
Overview for Metadata Capture Symposium
November 10, Utrecht
Morris Swertz and the MOLGENIS team*
BBMRI-NL, EU-GEN2PHEN, EU-CASIMIR, EU-EURATRANS, LifeLines, EU-SYSGENET,
EU-PANACEA, NBIC and other consortia
*Genomics Coordination Center Groningen
MOLGENIS mission
Grow a product family that supports all *omics experiments…
sharing models and software notwithstanding large variation
Genomics Coordination Center – UMC/University Groningen
[Figure: eQTL study workflow. Data sets: genome, cohorts, individuals, markers, probes, microarrays, genotypes, expressions, normalized expressions, eQTL profiles, network. Steps: select, genotype, hybridize, preprocess, map, correlate. Item counts per data set range from 100 to millions.]
Our biologist challenges
Wanted:
infrastructures for
More biologist challenges:
NGS * 750
Dutch Control Cohort * 80K
Local biobanks * 200
[Figure: genotyping and sequencing workflow. Data sets: genome, panels, individuals, probes, microarrays, SNPs, sequence, HapMapNL. Steps: select, genotype, hybridize, preprocess, call, QC, impute, phased. Item counts per data set range from 100 to millions.]
Etc.
Connect to annotation services
Plugin rich analysis tools
Connect to statistics
UML documentation of your model
Edit & trace your data
Import/export to Excel
find.investigation()
102 downloaded
obs<-find.observedvalue(
43,920 downloaded
#some calculation
add.inferredvalue(res)
36 added
Wanted: ‘dynamic’ data and processing infrastructure
MOLGENIS method
Our challenges
[Figure: a biologist and a bioinformatician around a shared infrastructure with GUI, STORE, Logic, ANALYSE, ANNOTATE and Exchange services components, etc. Centre: QTL study workflow. Data sets: genome, strains, individuals, markers, probes, microarrays, genotypes, expressions, normalized expressions, QTL profiles, network. Steps: inbreed, genotype, hybridize, preprocess, map, correlate. Item counts per data set range from 10 to millions.]
Situation before MOLGENIS
[Figure: researcher needs (Animal Observatory, NextGenSeq data, Mutation data, Model organisms data); the (bio)informatician has to work very hard before the researcher can use each system.]
http://www.molgenis.org
Swertz et al (2010) BMC Bioinformatics, accepted
Swertz & Jansen (2007) Nature Reviews Genetics 8, 235-243
Swertz et al (2004) Bioinformatics 20(13), 2075-83
Using MOLGENIS
[Figure: Model, then Generator, then use the generated software (NextGenSeq, Mutation database, Model organisms, Animal Observatory); repeat often.]
End products on top of MOLGENIS
Mutations
Biobank
Sequencing
Proteo/Metabolomics
Animal LIMS
GWAS / QTL studies
And more …
MOLGENIS contributors
GCC/MOLGENIS contributors
Morris Swertz
Erik Roos
Joeri van der Velde
Robert Wagner
Joris Lops
Danny Arends
Despoina Antonakaki
Alex Kanterakis
Jessica Lundberg
Andre de Vries
George Byelas
Freerk van Dijk
And many others
External contributors
Tomasz Adamuziak
Juha Muilu
Gudmundur Thorisson
Sirisha Gollapudi
Helen Parkinson
Pedro Lopes
And many others
Emphasis on collaboration
BBMRI-NL biobanking (Hs)
EU-GEN2PHEN consortium (Hs)
EU-PANACEA consortium (Ce)
EU-EURATRANS consortium (Rn)
NL Brassica Nutr. consortium (At)
EU-CASIMIR consortium (Mm)
NBIC/BioAssist consortium (bioinfo)
And others
MOLGENIS family features
All MOLGENIS systems differ in their model but share the common toolchain
Data exchange and loading
http://www.xgap.org
Swertz, van der Velde et al (2010) Genome Biology 9;11(3): R27.
Data loading
User interfacing
› 102 biobank studies
› 2,042 features
› 42,939 individuals
› 287 panels
› 196 protocols
› 140 ontology terms
› +170 dbGaP investigations
E.g. Pheno-OM
› 102 investigations
› 2,042 features
› 42,939 individuals
› 287 panels
› 196 protocols
› 140 ontology terms
› +170 dbGaP investigations
Demo: http://wwwdev.ebi.ac.uk/microarray-srv/pheno/
Source: https://svn.gene.le.ac.uk/gen2phen/pheno-model
Data exploration
QTL studies
Programmatic interfaces: R, REST, SOAP, …
#connect to my XGAP database
source("http://aserver/xgap/api/R")
#upload my 'metanetwork' investigation
add.investigation(name="metanetwork")
#use 'metanetwork' investigation
use.investigation(name="metanetwork")
#upload subjects and traits
add.marker(name=rownames(markers), chr=markers$chr, cm=markers$cM)
add.metabolite(name=rownames(metabolites))
add.subject(name=colnames(genotypes))
#upload genotype and phenotype data matrices
add.datamatrix(geno, name="geno", rowtype="marker", coltype="subject", valuetype="text")
add.datamatrix(mpheno, name="mexpr", rowtype="metabolite", coltype="subject", valuetype="double")

#connect to XGAP database
source("http://aserver/xgap/api/R")
#use 'metanetwork' investigation
use.investigation(name="metanetwork")
#list available data sets
find.datasets()
#download genotype and phenotype datasets
geno <- find.datamatrix(name="geno")
mpheno <- find.datamatrix(name="mexpr")
markers <- find.markers()
#calculate & plot (Fu 2007, Nature Protocols)
mqtls <- qtlMapTwoPart(geno, mpheno, spike=4)
qtlPlot(markers, mqtls, 4)
#upload qtl result matrix
add.datamatrix(mqtls, name="qtlprofiles", rowtype="metabolite", coltype="marker", valuetype="double")
[Figure: XGAP database holding subjects, markers, genotypes, phenotype and QTLs. The phenotype panel is a mass spectrum labelled "arab 220903" (sample Koornneef0007), intensity in % against m/z from 100 to 1000.]
Scientist A uploads raw data
Scientist B uploads analysis results
http://www.xgap.org
Swertz et al (2010) Genome Biology 9;11(3): R27.
Big QTL, GWAS, NGS, Proteomics computing
Data analysis using cloud/cluster
See poster Q01:
User friendly cluster computing for R/QTL analysis on XGAP
Current work
• Merge all data models
• Next session
Current work: more pipelines
• Galaxy tool defs?
• Taverna flows?
Genome of the Netherlands
http://www.bbmriwiki.nl
Panacea
GEN2PHEN
LifeLines
Deformed ears?
query expansion
HPO: Abnormally shaped ears, Auricular malformation, Deformed auricles
MP: Malformed auricles, Malformed ears, Malformed external ears
etc
Current work: semantic tools
D2RQ, Lucene, OntoCAT, RDF/OWL
Local ontologies
(OWL or OBO)
CWA
BioPortal
OLS
OntoCAT – Ontology common API tasks
http://www.ontocat.org and http://precedings.nature.com/documents/4666
Abnormally shaped ears ☺
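The query expansion idea on this slide can be sketched in a few lines of R. This is only an illustration under stated assumptions, not how OntoCAT or MOLGENIS implement it: the synonym table is hard-coded, the term IDs are placeholders, the grouping of labels under HPO and MP is read off the slide, and matching is a simple word overlap between the query and any label of a term. The names `synonyms`, `tokenize` and `expand_query` are hypothetical.

```r
# Hypothetical synonym table; the term IDs are placeholders, not real accessions.
# Labels are taken from the slide; their grouping under HPO/MP is an assumption.
synonyms <- list(
  "HPO:placeholder" = c("Abnormally shaped ears", "Auricular malformation",
                        "Deformed auricles"),
  "MP:placeholder"  = c("Malformed auricles", "Malformed ears",
                        "Malformed external ears")
)

# Split text into lower-case words, dropping punctuation and empty strings.
tokenize <- function(text) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z]+"))
  tokens[nzchar(tokens)]
}

# Expand a query with all labels of every term that shares a word with it.
expand_query <- function(query, synonyms) {
  query_tokens <- tokenize(query)
  matched <- Filter(
    function(labels) any(query_tokens %in% tokenize(labels)),
    synonyms
  )
  unique(c(query, unlist(matched, use.names = FALSE)))
}

expanded <- expand_query("Deformed ears?", synonyms)
print(expanded)
```

A real setup would fetch labels and synonyms from an ontology service (local OWL/OBO files, BioPortal or OLS) rather than from a hard-coded list, and would probably use a text index such as Lucene for the matching step.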
MOLGENIS summary
1. Flexible models
• From biobank to local researcher to community
• eXtensible Genotype And Phenotype model
2. Flexible software
• Get much more because of open source sharing
• Agile development: short cycles with ‘client’
3. Enabling modules
• From cluster backends and workflows to RDF/OWL
• Spinouts like OntoCAT
Web
• MOLGENIS: http://www.molgenis.org
• XGAP: http://www.xgap.org
• OntoCAT: http://www.ontocat.org
• BBMRI-NL wiki: http://www.bbmriwiki.nl
Read
• Swertz et al (2010) BMC Bioinformatics, due December.
• Swertz et al (2010) Genome Biology 9;11(3): R27.
• Smedley et al (2008) Briefings in Bioinformatics 9(6): 532-544
• Swertz & Jansen (2007) Nature Reviews Genetics 8, 235-243
• Swertz et al (2004) Bioinformatics 20(13), 2075-83
Thank you!