Phenotype database interoperability and integration, Damian Smedley (EBI). Mouse models for human disease, The Royal Society, London, May 19-21st, 2010

The Royal Society, London, May 19-21st, 2010: Mouse models for human disease

Phenotype database interoperability and integration

Damian Smedley, EBI

Why do we need data integration and interoperability?

Centralised vs distributed solutions

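[Figure: a portal connected to mouse resources under three architectures: centralised warehouse v1, centralised warehouse v2 and a distributed solution. Diagram labels: central database; nightly data syncs; web services. Data categories: Genomics, IKMC projects, Phenotype/Expression, Strains. Resources shown: MGI, Ensembl, KOMP, EUCOMM, NorCOMM, Eurexpress/GXD etc., JaxMice, IMSR, EMMA, Europhenome, TIGM.]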

Centralised solutions

Advantages
– Better query performance for large datasets
– Easier to analyse raw data in one location

Disadvantages
– Regular data deposition is non-trivial
– Designing a single schema to store different types of data is not simple
– Persuading people to “give up” their data, databases and websites is hard
– The warehouse will still need to be made interoperable with other data sources
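To make the trade-offs above concrete, a centralised warehouse can be pictured as one database refreshed by a periodic sync from each source. What follows is a minimal sketch under stated assumptions, not the design of any resource named in this talk: it uses Python's built-in sqlite3 module, and the table layout, source names and MGI IDs are hypothetical. The generic attribute/value table is one way round the single-schema problem, at the cost of losing typed columns.

```python
import sqlite3

# Hypothetical generic schema: one row per (source, gene, attribute), so data
# of different types can share one table at the cost of typed columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS annotation (
    source    TEXT NOT NULL,
    mgi_id    TEXT NOT NULL,
    attribute TEXT NOT NULL,
    value     TEXT,
    PRIMARY KEY (source, mgi_id, attribute)
)
"""


def nightly_sync(conn, source, records):
    """Replace one source's snapshot in the central warehouse.

    records: list of dicts, each with an 'mgi_id' key plus arbitrary attributes
    (at most one value per attribute per gene per source).
    """
    conn.execute(SCHEMA)
    rows = [
        (source, rec["mgi_id"], key, str(value))
        for rec in records
        for key, value in rec.items()
        if key != "mgi_id"
    ]
    with conn:  # one transaction: other readers never see a half-loaded source
        # Drop the previous snapshot so records deleted upstream disappear too
        conn.execute("DELETE FROM annotation WHERE source = ?", (source,))
        conn.executemany(
            "INSERT INTO annotation (source, mgi_id, attribute, value) VALUES (?, ?, ?, ?)",
            rows,
        )


def gene_view(conn, mgi_id):
    """All warehouse rows for one gene, across every synced source."""
    return conn.execute(
        "SELECT source, attribute, value FROM annotation "
        "WHERE mgi_id = ? ORDER BY source, attribute",
        (mgi_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
nightly_sync(conn, "strains", [{"mgi_id": "MGI:EXAMPLE1", "repository": "EMMA"}])
nightly_sync(conn, "phenotype", [{"mgi_id": "MGI:EXAMPLE1", "test": "body weight"}])
print(gene_view(conn, "MGI:EXAMPLE1"))
# [('phenotype', 'test', 'body weight'), ('strains', 'repository', 'EMMA')]
```

Replacing a source's whole snapshot on each sync is the simplest way to propagate upstream deletions; an incremental sync would move less data but needs change tracking at every source.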

Distributed solutions

Advantages
– Domain expertise at the production site is exploited
– Different types of data are easily integrated, as long as they share something in common such as a gene identifier
– No need for a nightly data flow to keep data up to date
– No need for redundant data in each database
– Easier to persuade people to collaborate in a distributed scenario

Disadvantages
– Technical knowledge is required to deploy the web services
– Potential query performance problems for large datasets (summary-level data may need to be provided)
– Potential problems performing analysis over all datasets
– Services may go down
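In the distributed case the portal queries each resource live and joins the answers on a shared gene identifier. The sketch below assumes each service is wrapped as a plain Python function taking an MGI ID; the two services shown are hypothetical stand-ins for real web-service clients. It addresses the last disadvantage listed above by querying services concurrently under one deadline and reporting which sources failed to answer, instead of failing the whole query.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def federated_gene_query(services, mgi_id, timeout_s=5.0):
    """Query every service for one gene and merge the answers by source name.

    services: dict mapping a source name to a callable fetch(mgi_id).
    Sources that raise an error or miss the deadline are listed as unavailable.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(services)))
    try:
        futures = {pool.submit(fetch, mgi_id): name for name, fetch in services.items()}
        # One shared deadline for all services, rather than timeout_s per service
        done, not_done = wait(futures, timeout=timeout_s)
        results = {}
        unavailable = [futures[f] for f in not_done]
        for fut in done:
            if fut.exception() is None:
                results[futures[fut]] = fut.result()
            else:
                unavailable.append(futures[fut])
    finally:
        # Return without blocking on a hung service; its thread finishes in the background
        pool.shutdown(wait=False)
    return {"mgi_id": mgi_id, "results": results, "unavailable": sorted(unavailable)}


def expression_service(mgi_id):
    return {"assays": 2}  # hypothetical stand-in for a live expression service


def phenotype_service(mgi_id):
    raise ConnectionError("service unavailable")  # simulates a resource that is down


print(federated_gene_query(
    {"expression": expression_service, "phenotype": phenotype_service}, "MGI:EXAMPLE1"))
# {'mgi_id': 'MGI:EXAMPLE1', 'results': {'expression': {'assays': 2}}, 'unavailable': ['phenotype']}
```

Returning partial results with an explicit list of missing sources lets a portal show what it has and flag the rest, which is usually preferable to an error page when one of many independent services is offline.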

1000 Genomes - centralisation

International Cancer Genome Consortium

Canada: Pancreas
Australia: Pancreas
China: Stomach
Japan: Liver (virus-related)
France: Liver (alcohol-related)
UK: Breast (HER2+ve)
Breast (several subtypes)
Spain: CLL
India: Oral cavity

ICGC - distributed

Joint Ensembl and EurExpress query

IKMC portal: knockoutmouse.org

[Figure: resources linked from the portal: GXD, Eurexpress, NorCOMM, EUCOMM, KOMP, TIGM, EMMA, KOMP Repository, CMMR, IMSR, Ensembl, CREATE, Europhenome.]

IKMC interoperability strategy

[Figure: BioMart query interface(s) (EBI, UK) linked via MGI ID to each resource:
IKMC, ES cells + lines (Sanger, UK)
EMMA (UK), KOMP (USA), CMMR (Canada)
Phenotype: EuroPhenome etc. (Harwell, UK)
MGI (JAX, USA)
EURExpress (Edinburgh, UK)
Ensembl (Sanger, UK)
GXD (JAX, USA)
CREATE]
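This layout works because every resource reports its data against the same key, the MGI ID, so a query layer can assemble a gene-centric view from many sources. A minimal sketch of that join in plain Python, assuming each resource has already returned a list of rows carrying an mgi_id field; the resource names are taken from the slide, but the row fields and IDs are hypothetical placeholders rather than real BioMart attributes.

```python
from collections import defaultdict


def merge_by_mgi_id(resource_rows):
    """Group rows from several resources into one record per MGI ID.

    resource_rows: dict mapping a resource name to a list of row dicts, each
    expected to contain an 'mgi_id' key. Rows without an MGI ID cannot be
    linked to a gene and are only counted.
    """
    genes = defaultdict(lambda: defaultdict(list))
    unlinked = 0
    for resource, rows in resource_rows.items():
        for row in rows:
            mgi_id = row.get("mgi_id")
            if not mgi_id:
                unlinked += 1
                continue
            # Keep every row: one gene may have several alleles, lines or assays
            genes[mgi_id][resource].append({k: v for k, v in row.items() if k != "mgi_id"})
    merged = {gene: dict(per_source) for gene, per_source in genes.items()}
    return merged, unlinked


merged, unlinked = merge_by_mgi_id({
    "IKMC": [{"mgi_id": "MGI:EXAMPLE1", "status": "ES cells available"}],
    "EMMA": [{"mgi_id": "MGI:EXAMPLE1", "strain": "EXAMPLE-STRAIN"}],
    "Ensembl": [{"mgi_id": "MGI:EXAMPLE2", "chromosome": "1"}, {"chromosome": "2"}],
})
print(merged["MGI:EXAMPLE1"])
# {'IKMC': [{'status': 'ES cells available'}], 'EMMA': [{'strain': 'EXAMPLE-STRAIN'}]}
print(unlinked)  # 1
```

Counting unlinked rows rather than silently dropping them makes it visible when a resource is not yet mapping its records to MGI IDs, which is the precondition the whole strategy depends on.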

www.knockoutmouse.org/martsearch

Europhenome: raw and summary data
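Providing summary-level data alongside raw data is one of the mitigations suggested for query performance in a distributed setup. A minimal sketch of deriving summary rows from raw per-animal measurements, assuming a hypothetical (MGI ID, parameter, genotype, value) record layout and made-up values; it does not reproduce Europhenome's actual schema or statistics.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarise(raw):
    """Collapse raw per-animal measurements into summary rows.

    raw: iterable of (mgi_id, parameter, genotype, value) tuples (hypothetical layout).
    Returns {(mgi_id, parameter, genotype): {"n", "mean", "sd"}}; sd is None when n < 2.
    """
    groups = defaultdict(list)
    for mgi_id, parameter, genotype, value in raw:
        groups[(mgi_id, parameter, genotype)].append(value)
    return {
        key: {
            "n": len(values),
            "mean": mean(values),
            # Sample standard deviation needs at least two measurements
            "sd": stdev(values) if len(values) > 1 else None,
        }
        for key, values in groups.items()
    }


raw = [  # made-up values for illustration only
    ("MGI:EXAMPLE1", "body weight (g)", "hom", 20.0),
    ("MGI:EXAMPLE1", "body weight (g)", "hom", 22.0),
    ("MGI:EXAMPLE1", "body weight (g)", "wt", 25.0),
]
summary = summarise(raw)
print(summary[("MGI:EXAMPLE1", "body weight (g)", "hom")])  # n = 2, mean = 21.0, sd ≈ 1.414
```

A summary table like this is small enough to serve quickly over a web service, while the raw rows stay at the producing site for anyone who needs the full distribution.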

Possible strategy for phenotype data

[Figure: the IKMC interoperability layout extended for phenotype data. BioMart query interface(s) (EBI, UK) link via MGI ID to: IKMC, ES cells + lines (Sanger, UK); EMMA (UK), KOMP (USA), CMMR (Canada); MGI (JAX, USA); EURExpress (Edinburgh, UK); Ensembl (Sanger, UK); GXD (JAX, USA); CREATE. Components added for high-throughput phenotyping: high-throughput phenotyping centres, a central database, presentation of raw results, and analysis to assign phenotypes to genes, also linked by MGI ID.]
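The step "analysis to assign phenotypes to genes" needs some rule for deciding when a mutant line differs from its controls. The talk does not specify the method, so the sketch below is a deliberately naive, hypothetical rule: flag a parameter when the mutant mean lies more than a chosen number of baseline standard deviations from the baseline mean. It only makes the raw-to-annotation step concrete; a production analysis would typically use a formal statistical test and account for factors such as sex, batch and multiple testing.

```python
from statistics import mean, stdev


def call_phenotype(mutant, baseline, threshold=2.0):
    """Naive hypothetical call: flag a hit if the mutant mean lies more than
    `threshold` baseline standard deviations from the baseline mean."""
    if len(mutant) < 1 or len(baseline) < 2:
        raise ValueError("need >= 1 mutant and >= 2 baseline measurements")
    sd = stdev(baseline)
    if sd == 0:
        raise ValueError("baseline shows no variation; z-score undefined")
    z = (mean(mutant) - mean(baseline)) / sd
    return {"z": z, "hit": abs(z) > threshold}


baseline = [24.0, 25.0, 26.0]  # made-up wild-type values
print(call_phenotype([20.0, 21.0], baseline))  # {'z': -4.5, 'hit': True}
print(call_phenotype([25.0, 25.5], baseline))  # {'z': 0.25, 'hit': False}
```

Keeping the call as a separate function over raw values is the point of centralising raw results: the rule can be revised and rerun over every gene without asking each centre to resubmit.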

Linking from IKMC portal

[Figure: portal links labelled "Phenotyping" and "Phenotype searches".]

Linking from IKMC portal

Mouse models for human disease

Acknowledgements

The whole CASIMIR consortium, and in particular:
• MouseFinder tool: Paul Schofield, Michael Gruenberger, Chao-Kung Chen, George Gkoutos, Ann-Marie Mallon, John Hancock
• MartSearch: Vivek Iyer, Darren Oakley, Bill Skarnes
• BioMart: Arek Kasprzyk, Syed Haider, Edoardo Marcora