dbGG – database for genetical genomics update
Morris Swertz ([email protected]), Groningen Bioinformatics Center
CASIMIR meeting, Braunschweig, July 2, 2008


Page 1: dbgg – database for genetical genomics update

Groningen Bioinformatics Center

Morris Swertz ([email protected])
CASIMIR meeting, Braunschweig

July 2, 2008

dbGG – database for genetical genomics update

Page 2: dbgg – database for genetical genomics update

Objective

› Share genotype/phenotype data and tools

Page 3: dbgg – database for genetical genomics update

Complicated experiments

[Figure: genetical genomics workflow diagram. Biomaterials and results: genome, strains, individuals, markers, probes, microarrays, genotypes, expressions, norm. exprs., QTL profiles, network. Lab and analysis processes: inbreed, genotype, hybridize, preprocess, map, correlate. Legend: main work flow, data dependency, biomaterial/result (material), lab/analysis process (process), scale of information, associated data files. Scale annotations on the diagram range from 10 into the millions.]

Page 4: dbgg – database for genetical genomics update

Barriers to sharing data

[Figure: the genetical genomics workflow diagram (inbreed, genotype, hybridize, preprocess, map, correlate), with parts attributed to Collaborator 1, Collaborator 2 and Collaborator 3.]

Incomplete data!

Incompatible data!

Page 5: dbgg – database for genetical genomics update

Barriers to sharing data

[Figure: three copies of the genetical genomics workflow diagram, labelled Investigation 1, Investigation 2 and Investigation 3.]

Incomplete and/or incompatible data!

Page 6: dbgg – database for genetical genomics update

Barriers to sharing software tools

[Figure: three copies of the genetical genomics workflow diagram (inbreed, genotype, hybridize, preprocess, map, correlate).]

Page 7: dbgg – database for genetical genomics update

Barriers to sharing software tools

[Figure: three copies of the genetical genomics workflow diagram (inbreed, genotype, hybridize, preprocess, map, correlate).]

Page 8: dbgg – database for genetical genomics update

Barriers to sharing software tools

[Figure: three separate results, each labelled "10,000 QTL profiles".]

Hard to find and reuse tools

Page 9: dbgg – database for genetical genomics update

[Figure: the genetical genomics workflow diagram for microarray data (inbreed, genotype, hybridize, preprocess, map, correlate).]

Use a standard tool?

Page 10: dbgg – database for genetical genomics update

More biotechnologies, more protocols

[Figure: a second workflow diagram, for metabolite data. Biomaterials and results: genome, strains, individuals, SNP arrays, genotypes, mass peaks, aligned peaks, QTL profiles, network. Processes: inbreed, genotype, LC/MS, preprocess, map, correlate. Legend: main work flow, data dependency, biomaterial/result (material), lab/analysis process (process), scale of information, associated data files. An example mass spectrum (labelled "arab 220903", m/z axis from 100 to 1000, intensity in %) illustrates the associated data files.]

Use a standard tool? Yes, if it could be easily adapted! (But such tools can't.)

Page 11: dbgg – database for genetical genomics update

Objectives

› Share genotype/phenotype data and tools:

1. Interoperable software
• Simple flat file exchange format
• Database server
• R/web-service interfaces
• A procedure to extend the software

2. Build on extensible data model
• Data
• Annotations
• Investigations
• Integration references

› Next steps

Page 12: dbgg – database for genetical genomics update

The software

› Share genotype/phenotype data and tools:

1. Interoperable software
• Simple flat file exchange format
• Database server
• R/web-service interfaces
• A procedure to extend the software

2. Build on extensible data model
• Data
• Annotations
• Investigations
• Integration references

› Next steps

Page 13: dbgg – database for genetical genomics update

Software: flat file exchange format

› Raw and processed data in matrix form

              BxD1      BxD2      BxD3      BxD4      BxD5      BxD6      BxD7
1415670_at    0,293493  0,687197  0,137687  0,5992    0,691055  0,644053  0,938754
1415671_at    0,124305  0,261548  0,771756  0,022287  0,374063  0,711998  0,526277
1415672_at    0,592037  0,334535  0,173969  0,516279  0,21625   0,970534  0,192734
1415673_at    0,555223  0,992222  0,17998   0,79899   0,505028  0,776323  0,736155
1415674_a_at  0,585366  0,61328   0,448061  0,977578  0,746478  0,937131  0,782904
1415675_at    0,938431  0,272201  0,477756  0,374765  0,840321  0,187776  0,54069
1415676_a_at  0,700227  0,971044  0,486389  0,236767  0,717116  0,714643  0,443447
1415677_at    0,716683  0,380579  0,592676  0,224927  0,304563  0,285177  0,687426
1415678_at    0,086303  0,069413  0,601634  0,289336  0,197956  0,820493  0,072161
1415679_at    0,669657  0,578992  0,373976  0,581597  0,561598  0,051069  0,070144
1415680_at    0,277747  0,716174  0,73642   0,428784  0,614857  0,763586  0,704263
1415681_at    0,208313  0,279458  0,063052  0,077388  0,577486  0,087832  0,063826
1415682_at    0,94562   0,077064  0,735568  0,081915  0,109705  0,278815  0,35094
1415683_at    0,308529  0,008908  0,793956  0,304491  0,613119  0,055048  0,698222

E.g. microarray data. In this excerpt, rows = Affy probes and cols = individuals (BxD lines).
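A minimal sketch of reading such a matrix file, assuming tab-separated columns, an empty top-left header cell, and decimal commas as in the excerpt. The function name read_matrix and the sample text are hypothetical; this is not the dbGG import tool itself.

```python
import csv
import io

# Hypothetical excerpt in the matrix layout shown above.
# Tab separation is an assumption; the slide does not state the delimiter.
FLAT_FILE = (
    "\tBxD1\tBxD2\tBxD3\n"
    "1415670_at\t0,293493\t0,687197\t0,137687\n"
    "1415671_at\t0,124305\t0,261548\t0,771756\n"
)


def read_matrix(text):
    """Return (column_names, {row_name: [float, ...]}) from a tab-separated matrix."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    columns = header[1:]  # the first header cell is the empty corner
    rows = {}
    for record in reader:
        if not record:
            continue  # skip blank lines
        name, cells = record[0], record[1:]
        if len(cells) != len(columns):
            raise ValueError(
                f"row {name!r} has {len(cells)} values, expected {len(columns)}"
            )
        # The excerpt uses decimal commas; convert them before parsing.
        rows[name] = [float(cell.replace(",", ".")) for cell in cells]
    return columns, rows


columns, rows = read_matrix(FLAT_FILE)
print(columns)
print(rows["1415670_at"])
```

Rejecting rows whose length differs from the header is one simple way to catch the "incomplete data" problem at import time rather than during analysis.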

Page 14: dbgg – database for genetical genomics update

Software: flat file exchange format

› Annotation info in tabular form

name          gene      chr  bplocal    bpglobal
1415670_at    Copg      6    87875328   924571670
1415671_at    Atp6v0d1  8    108413750  1234908613
1415672_at    Golga7    8    24706942   1151201805
1415673_at    Psph      5    130080298  820157654
1415674_a_at  Trappc4   9    44155401   1298908898
1415675_at    Dpm2      2    32395013   227443878
1415676_a_at  Psmb5     14   53568499   1912894020
1415677_at    Dhrs1     14   54693657   1914019178
1415678_at    Ppm1a     12   73712802   1701465838
1415679_at    Psenen    7    30270655   1017168695
1415680_at    Anapc1    2    128304204  323353069

E.g. probe annotation data. Rows = probes, cols = attributes of each probe.

Page 15: dbgg – database for genetical genomics update

Software: exchange an experiment

Described on http://gbic.biol.rug.nl/dbgg

[Figure: annotations and raw and processed data move in and out of the dbGG database through the dbGG import tool and the dbGG export tool.]

Page 16: dbgg – database for genetical genomics update

Software: web user interface

http://gbicserver1.biol.rug.nl:8080/dbgg/molgenis.do

Page 17: dbgg – database for genetical genomics update

Software: interface to R

    source("http://localhost:8080/molgenis4gg/R")

    #download data
    use.experiment(name="metanetwork")  #set default
    traits <- get.metabolitedata(name="mytraits")
    genotypes <- get.markerdata(name="mygenotypes")

    #calculate mQTLs
    library("MetaNetwork")
    qtls <- qtlMapTwoPart(genotypes=genotypes, traits=traits, spike=4)

    #upload results for others to use
    add.mqtldata(qtls, name="myqtls")

Example trait data:

            RIL1  RIL3  RIL4  …
LCavg.1537  NA    942   2402  …
LCavg.1594  NA    4     10    …
LCavg.1610  NA    55    62    …
…           …     …     …     …

[Figure: MetaNetwork protocol. Each step is listed with its function in parentheses and its result object in braces.]

Input: markers, traits, genotypes; mass/charge peaks
A. Map QTLs (qtlMapTwoPart): QTL profiles {qtlProfiles}
B. Simulation/FDR (qtlThreshold/qtlFDR): QTL threshold {qtlThres}
C. QTL summary (qtlSummary): QTL summary {qtlSumm}, significant QTLs
D. Zero-order correlation: correlation matrix {corrZeroOrder}
E. 2nd-order correlation (qtlCorrSecondOrder): correlation matrix {corrSecondOrder}
F. Permutation (qtlCorrThreshold): correlation threshold {corrThres}
G. Create network (createCytoFiles): network files [network.sif, network.eda]
H. Peak multiplicity (findPeakMultiplicity): peak multiplicity {peakMultiplicity}
Inspect the resulting network.

MetaNetwork protocol: Fu, Swertz, Keurentjes, Jansen, Nature Protocols, 2007.
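To make the idea of a "QTL profile" concrete, here is a deliberately crude sketch: for each marker, compare the mean trait value of the two genotype classes. This is not the two-part model behind qtlMapTwoPart, and the data, marker names and function names are all hypothetical.

```python
from statistics import mean

# Hypothetical toy data: 0/1 genotype calls per marker (one call per line)
# and one trait value per line, in the same order.
genotypes = {
    "markerA": [0, 0, 1, 1, 0, 1],
    "markerB": [0, 1, 0, 1, 0, 1],
}
trait = [1.0, 1.2, 3.1, 2.9, 0.8, 3.0]


def allele_effect(calls, values):
    """Difference in mean trait value between genotype class 1 and class 0."""
    group0 = [v for c, v in zip(calls, values) if c == 0]
    group1 = [v for c, v in zip(calls, values) if c == 1]
    if not group0 or not group1:
        return 0.0  # marker does not segregate in this sample
    return mean(group1) - mean(group0)


def scan(marker_calls, values):
    """Return {marker: effect} for every marker: a very crude QTL profile."""
    return {m: allele_effect(calls, values) for m, calls in marker_calls.items()}


profile = scan(genotypes, trait)
best = max(profile, key=lambda m: abs(profile[m]))
print(profile)
print(best)
```

A real analysis would replace the mean difference with a proper test statistic and use permutation or simulation to set a significance threshold, which is what steps B and F of the protocol are for.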

Page 18: dbgg – database for genetical genomics update


Software: interface to Taverna

add dbGG interface

Page 19: dbgg – database for genetical genomics update

Software: interface to Taverna

Use data in dbGG

Page 20: dbgg – database for genetical genomics update

This enables automatic processing (see also CASIMIR "use case 1")

Smedley, Swertz, Wolstencroft et al., submitted.

[Figure label: dbGG]

Page 21: dbgg – database for genetical genomics update

Use BioMART and MOLGENIS to access data and Taverna to automate the workflows

[Figure: several web services (ws) provide SNPs (strain, SNP, alleles), pathways and gene symbols, alongside your dbGG.]

Smedley, Swertz, Wolstencroft et al., submitted.

Page 22: dbgg – database for genetical genomics update

Software: extension procedure (using MOLGENIS)

Reusable assets and generator/interpreter

Little language (domain specific language):

    <!-- entity organization -->
    <entity name="Experiment" label="Experiment">
      <field name="ExperimentID" key="1" readonly="true" label="ExperimentID(autonum)"/>
      <field name="Medium" type="xref" xref_field="Medium.name"/>
      <field name="Protocol" label="Experiment Protocol"/>
      <field name="Temperature" type="int" …

dbGG v1: for microarrays
dbGG v2: for mass spectrometry

Page 23: dbgg – database for genetical genomics update

Software: extension procedure

    <entity name="Metabolite" extends="Trait">
      <field name="Formula" nillable="true" description="The chemical formula of a metabolite." />
      <field name="Mass" type="decimal" nillable="true" description="The mass of this metabolite" />
      <field name="Structure" type="text" nillable="true" description="The chemical structure of this metabolite." />
    </entity>
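As a rough illustration of what `extends` implies, the sketch below parses a small model file and resolves the fields an entity inherits from its parent. The Metabolite fields come from the slide; the `Trait` parent definition, the `<molgenis>` wrapper element and the helper names are assumptions made so the example parses, and this is not how the MOLGENIS generator itself is implemented.

```python
import xml.etree.ElementTree as ET

# Metabolite is taken from the slide. The Trait entity and the <molgenis>
# root element are assumptions added so that the example is self-contained.
MODEL = """
<molgenis>
  <entity name="Trait">
    <field name="Name"/>
  </entity>
  <entity name="Metabolite" extends="Trait">
    <field name="Formula" nillable="true"/>
    <field name="Mass" type="decimal" nillable="true"/>
    <field name="Structure" type="text" nillable="true"/>
  </entity>
</molgenis>
"""


def load_entities(xml_text):
    """Index all <entity> elements by their name attribute."""
    root = ET.fromstring(xml_text)
    return {entity.get("name"): entity for entity in root.iter("entity")}


def all_fields(entities, name):
    """Field names of an entity, with inherited fields listed first."""
    entity = entities[name]
    parent = entity.get("extends")
    inherited = all_fields(entities, parent) if parent else []
    own = [field.get("name") for field in entity.findall("field")]
    return inherited + own


entities = load_entities(MODEL)
print(all_fields(entities, "Metabolite"))
```

Under this reading, adding a new data type means writing one small entity definition and letting the generator derive the rest.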

Page 24: dbgg – database for genetical genomics update


Website: demos and downloads

http://gbic.biol.rug.nl/dbgg

Page 25: dbgg – database for genetical genomics update

Outline

› To share genotype/phenotype data and tools:

1. Interoperable software
• Flat file exchange format
• Database server
• R/web-service interfaces
• A procedure to extend the software

2. Build on extensible data model
• Data
• Annotations
• Investigations
• Integration references

› Next steps

Page 26: dbgg – database for genetical genomics update

Data

› Simple and close to current practice:

Genotype data

            BxD1  BxD2  BxD3  BxD4  BxD5  BxD6  BxD7
rs13475697  1     1     0     1     0     1     0
rs13475698  1     0     0     0     0     0     1
rs13475699  0     0     0     1     0     1     1
rs13475700  1     1     1     1     0     1     0
rs13475701  1     0     1     0     0     1     1
rs2228909   1     1     0     1     0     0     0
rs2228910   0     0     1     1     0     0     0
rs3022775   0     0     0     1     1     0     1
rs3024102   1     0     1     0     0     0     0
rs3024103   1     0     0     1     0     0     0
rs3024104   0     1     0     0     0     0     0
rs3024105   0     0     1     0     0     0     1
rs30462182  1     0     0     0     0     0     0
rs30522279  0     1     0     0     1     0     0

Traits (rows): MARKERS
Subjects (columns): STRAINS
Cells: DATA ELEMENTS

Page 27: dbgg – database for genetical genomics update

Data

› Simple and close to current practice:

Genotype data
Expression data

[Table: the probe by BxD expression matrix shown on Page 13.]

Traits (rows): PROBES
Subjects (columns): INDIVIDUALS
Cells: DATA ELEMENTS

Page 28: dbgg – database for genetical genomics update

Data

› Simple and close to current practice:

• Genotype data
• Expression data
• Classic phenotype data
• Metabolite abundance data
• Protein abundance data
• And so on…

Dimensions: TRAIT × SUBJECT

Page 29: dbgg – database for genetical genomics update

Data with any Dimension Type

DATA ELEMENT: one value for a TRAIT and a SUBJECT

TRAIT
• Probe
• Marker
• Mass Peak
• …

SUBJECT
• Individual
• Strain
• Sample
• …

Page 30: dbgg – database for genetical genomics update

Data

› Simple and close to current practice:

What about QTL data?

              rs13475697  rs13475698  rs13475699  rs13475700  rs13475701  rs2228909   rs2228910
1415670_at    0,981848    0,293227    0,034092    0,360978    0,298958    0,466545    0,370369
1415671_at    0,464346    0,817348    0,990231    0,204923    0,353808    0,668164    0,449354
1415672_at    0,243834    0,900083    0,69971     0,217804    0,471408    0,701617    0,026609
1415673_at    0,712543    0,001536    0,209082    0,196611    0,191452    0,91619     0,535659
1415674_a_at  0,159777    0,101577    0,678902    0,233476    0,251812    0,349968    0,567171
1415675_at    0,777691    0,371057    0,670919    0,410665    0,742277    0,142381    0,540945
1415676_a_at  0,320175    0,358505    0,207274    0,952688    0,615915    0,07167     0,225823
1415677_at    0,840063    0,281845    0,773908    0,396397    0,482995    0,56668     0,19946
1415678_at    0,880974    0,471662    0,906012    0,711181    0,622078    0,575441    0,868816
1415679_at    0,164846    0,957785    0,794479    0,207902    0,091649    0,727786    0,796058
1415680_at    0,56679     0,823206    0,321578    0,513087    0,593739    0,272818    0,620817
1415681_at    0,215698    0,384919    0,691254    0,550108    0,603988    0,110792    0,380126
1415682_at    0,45273     0,36089     0,733234    0,911573    0,549316    0,086473    0,639625
1415683_at    0,526019    0,740045    0,955297    0,797566    0,149079    0,370645    0,57789

Traits (rows): PROBES
Traits (columns): MARKERS
Cells: DATA

Page 31: dbgg – database for genetical genomics update

Data

› Simple and close to current practice:

What about QTL data?
Probe association data?
Interaction network data?

[Table: the probe by marker QTL matrix shown on Page 30.]

Traits (rows): PROBES
Traits (columns): MARKERS
Cells: DATA

Dimension combinations shown: TRAIT × TRAIT, SUBJECT × SUBJECT

Page 32: dbgg – database for genetical genomics update

Data with any Dimension Type

› Minimal data model

[Figure: a DATA ELEMENT refers to two DIMENSION ELEMENTs, one for its row and one for its column. TRAIT and SUBJECT are both kinds of DIMENSION ELEMENT.]
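One way to read the minimal data model is sketched below: every value is a data element that points at a row element and a column element, and either can be a trait or a subject. The class and function names are hypothetical and only mirror the labels on the slide; they are not the dbGG schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionElement:
    name: str


@dataclass(frozen=True)
class Trait(DimensionElement):
    pass


@dataclass(frozen=True)
class Subject(DimensionElement):
    pass


@dataclass(frozen=True)
class DataElement:
    row: DimensionElement
    column: DimensionElement
    value: float


def as_matrix(elements):
    """Group data elements into {row name: {column name: value}}."""
    matrix = {}
    for element in elements:
        matrix.setdefault(element.row.name, {})[element.column.name] = element.value
    return matrix


probe = Trait("1415670_at")
marker = Trait("rs13475697")
strain = Subject("BxD1")

expression = [DataElement(probe, strain, 0.293493)]  # trait x subject
qtl = [DataElement(probe, marker, 0.981848)]         # trait x trait

print(as_matrix(expression))
print(as_matrix(qtl))
```

Because rows and columns share one base type, QTL data (trait by trait) needs no separate table design from expression data (trait by subject).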

Page 33: dbgg – database for genetical genomics update

The data model

› To share genotype/phenotype data and tools:

1. Extensible data model
• Data
• Annotations
• Investigations
• Integration references

Page 34: dbgg – database for genetical genomics update

Annotations

› Simple and close to current practice

Probe annotations

[Table: the probe annotation table (name, gene, chr, bplocal, bpglobal) shown on Page 14.]

PROBE is a variant of TRAIT, having:
- Name
- Gene
- Chromosome
- Locus

Page 35: dbgg – database for genetical genomics update

Annotation extends Trait or Subject

[Figure: DATA ELEMENT with a row and a column DIMENSION ELEMENT; TRAIT and SUBJECT are DIMENSION ELEMENTs.]

TRAIT variants:
PROBE: Name, Gene, Chromosome, Locus
MARKER: Name, Allele, Chromosome, Locus
MASSPEAK: Name, MZ, RetentionTime
And so on…

SUBJECT variants:
STRAIN: Name, Type (CSS, RIL, ...), Parent Strains
INDIVIDUAL: Name, Strain, Mother, Father, Sex
SAMPLE: Name, Individual, Tissue
And so on…
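The "annotation extends Trait or Subject" idea maps naturally onto class inheritance. The sketch below is only an illustration under that assumption: the attribute names follow the slide, while the Python classes and the annotation_columns helper are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DimensionElement:
    name: str


@dataclass
class Trait(DimensionElement):
    pass


@dataclass
class Subject(DimensionElement):
    pass


@dataclass
class Probe(Trait):
    gene: Optional[str] = None
    chromosome: Optional[str] = None
    locus: Optional[int] = None


@dataclass
class Individual(Subject):
    strain: Optional[str] = None
    mother: Optional[str] = None
    father: Optional[str] = None
    sex: Optional[str] = None


def annotation_columns(cls):
    """Column names an annotation table for this type would carry."""
    return [f.name for f in fields(cls)]


probe = Probe("1415670_at", gene="Copg", chromosome="6", locus=87875328)
print(annotation_columns(Probe))
print(annotation_columns(Individual))
print(isinstance(probe, Trait), isinstance(probe, DimensionElement))
```

The inherited `name` field is what lets a data matrix refer to any probe, marker or individual by the same kind of key.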

Page 36: dbgg – database for genetical genomics update

Annotation simple in practice

[Figure: three DATA ELEMENT matrices with their dimension types: MARKER × STRAIN, PROBE × MARKER and MARKER × INDIVIDUAL. Captions: genotype data, QTL data, expression data.]

Extensions are automatic "under the hood": PROBE isa TRAIT isa DIMENSION ELEMENT

Page 37: dbgg – database for genetical genomics update

Data and annotations

[Table: DATA ELEMENTS, the probe by BxD expression matrix shown on Page 13.]

[Table: PROBES, the probe annotation table (name, gene, chr, bplocal, bpglobal) shown on Page 14.]

Page 38: dbgg – database for genetical genomics update

The data model

› To share genotype/phenotype data and tools:

1. Extensible data model
• Data
• Annotations
• Investigations
• Integration references

Page 39: dbgg – database for genetical genomics update

Investigation workflow in the lab

[Figure: three DATA matrices (MARKER × STRAIN, PROBE × MARKER, MARKER × INDIVIDUAL; captions: genotype data, QTL data, expression data) connected by steps marked "?".]

Page 40: dbgg – database for genetical genomics update

Investigation building on FuGE

[Figure: the investigation workflow annotated with FuGE concepts (DATA, protocol application, Protocol, Software, Equipment). Data: genotype data, expression data, QTL data. Protocol applications: SNP Array, Affy Array, QTL Mapping. Protocols: Illumina Protocol, Affy M430 Protocol, Mapping Protocol. Software: BeadStudio, Bioconductor Norm., R Software. Equipment: Illumina, Affy M430 platform.]

FuGE: Jones et al., Nature Biotech 25, 1127-1133

Page 41: dbgg – database for genetical genomics update

Summary of data model

[Figure: data model overview. Entities: INVESTIGATION, DATA, PROTOCOL APPLICATION, PROTOCOL, Equipment, Software, DATA ELEMENT, DIMENSION ELEMENT, TRAIT, SUBJECT. A DATA ELEMENT refers to a DIMENSION ELEMENT for its column and one for its row. STRAIN, PROBE, INDIVIDUAL and MARKER are shown as extensions.]

Page 42: dbgg – database for genetical genomics update

The data model

› To share genotype/phenotype data and tools:

1. Extensible data model
• Data
• Annotations
• Investigations
• Integration references

Page 43: dbgg – database for genetical genomics update

References for integration

› Ontology references and database references

Incompatible naming:
INVESTIGATION 1: GENE, Name = Mip1alpha
INVESTIGATION 2: GENE, Name = Mip1a

Compatible identifiers:
DATABASE REFERENCE: Id = ENSMU0S98, Db = ENSEMBL
DATABASE REFERENCE: Id = ENSMUS098, Db = ENSEMBL
DATABASE REFERENCE: Id = ENSMUS98, Db = ENSEMBL
DATABASE REFERENCE: Id = 1419561_AT, Db = AFFY 430
Hyperlink…

Map mouse on human ontologies:
ONTOLOGYENTRY: Id = 0005615, Term = ABC, Ontology = GO
ONTOLOGYENTRY: Id = MP:0005385, Term = cardiovascular, Ontology = MP

FuGE: Jones et al., Nature Biotech 25, 1127-1133
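A small sketch of why shared database references help: two investigations that name the same gene differently can still be matched on a common (database, identifier) pair. The records, the placeholder identifiers and the function names below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical records. The gene names follow the slide; the identifier
# values are placeholders, not real database accessions.
genes = [
    {"investigation": "Investigation 1", "name": "Mip1alpha",
     "dbrefs": [("ENSEMBL", "ID-0001")]},
    {"investigation": "Investigation 2", "name": "Mip1a",
     "dbrefs": [("ENSEMBL", "ID-0001")]},
    {"investigation": "Investigation 2", "name": "Copg",
     "dbrefs": [("ENSEMBL", "ID-0002")]},
]


def group_by_reference(records):
    """Map each (database, id) pair to the (investigation, name) pairs using it."""
    groups = defaultdict(list)
    for record in records:
        for ref in record["dbrefs"]:
            groups[ref].append((record["investigation"], record["name"]))
    return dict(groups)


def shared(records):
    """Keep only references used by more than one investigation."""
    return {
        ref: members
        for ref, members in group_by_reference(records).items()
        if len({investigation for investigation, _ in members}) > 1
    }


matches = shared(genes)
print(matches)
```

The same grouping works for ontology entries: replace the (database, id) pair with an (ontology, term id) pair.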

Page 44: dbgg – database for genetical genomics update

Summary of data model

[Figure: data model overview. Entities: INVESTIGATION, DATA, PROTOCOL APPLICATION, PROTOCOL, Equipment, Software, DATA ELEMENT, DIMENSION ELEMENT, TRAIT, SUBJECT, DATABASE REFERENCE, ONTOLOGYENTRY, Hyperlink… A DATA ELEMENT refers to a DIMENSION ELEMENT for its column and one for its row. STRAIN, PROBE, INDIVIDUAL and MARKER are shown as extensions.]

Extensible to more experiments…

Page 45: dbgg – database for genetical genomics update


What is on the todo

Page 46: dbgg – database for genetical genomics update

Todo

› Publication: submitted

› Building a catalog of tools on top of dbGG
• Experiments: in Braunschweig and Groningen
  • Illumina, Affy, Metabolites
• Tool "plug-ins"
  • QTL graphs, import of annotations etc.

› Exploit interoperability
• E.g. integrate mouse & human with ontologies
• Load annotations from other dbGG/BioMARTs
• Build on and extend R/Taverna interaction

Page 47: dbgg – database for genetical genomics update

Summary and questions

› Share genotype/phenotype data and tools:

1. Interoperable software
• Simple flat file exchange format
• Database server
• R/web-service interfaces
• A procedure to extend the software

2. Build on extensible data model
• Data
• Annotations
• Investigations
• Integration references

› Next steps

Page 48: dbgg – database for genetical genomics update

Thank you! [email protected]

Morris A. Swertz
Bruno M. Tesson
Richard A. Scheltema
Gonzalo Vera
Rudi Alberts
Damian Smedley
Katy Wolstencroft
Andrew R. Jones
Klaus Schughart
John M. Hancock
Helen E. Parkinson
Engbert O. de Brock
Carole Goble
Paul Schofield
Ritsert C. Jansen
the GEN2PHEN consortium
the CASIMIR consortium

Page 49: dbgg – database for genetical genomics update

Appendix: Procedure to (re)generate a MOLGENIS

Page 50: dbgg – database for genetical genomics update


MOLGENIS for data

Page 51: dbgg – database for genetical genomics update

Describe in little language

[Figure: class diagram of the model.
Experiment: ID : autoid, Name : varchar
Assay: ID : autoid, Name : varchar
Data: ID : autoid, Value : object
Trait: ID : autoid, Name : varchar
Subject: ID : autoid, Name : varchar
Association labels, each with multiplicity 1: Column, Row, Assay, Experiment. Example instances: individuals, expressions, probes.]

Page 52: dbgg – database for genetical genomics update

Describe in little language

[Figure: class diagram with Experiment, Assay, Data, Trait and Subject, each with ID : autoid and either Name : varchar or Value : object. Association labels, each with multiplicity 1: Column, Row, Assay, Experiment.]

Page 53: dbgg – database for genetical genomics update

Describe in little language

[Figure: class diagram with Experiment, Assay, Data, Trait and Subject, each with ID : autoid and either Name : varchar or Value : object. Association labels, each with multiplicity 1: Column, Row, Assay, Experiment.]

Page 54: dbgg – database for genetical genomics update


Case GG: Generate and evaluate

http://gbic.biol.rug.nl/supplementary/2007/molgenis_showcase

Page 55: dbgg – database for genetical genomics update


Describe in little language

http://gbic.biol.rug.nl/supplementary/2007/molgenis_showcase