Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Data Interpretation Nara Sobreira, MD, PhD Johns Hopkins University McKusick-Nathans Institute of Genetic Medicine

Page 1: Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Sequencing Data Interpretation - Nara Sobreira

Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Data Interpretation

Nara Sobreira, MD, PhD Johns Hopkins University

McKusick-Nathans Institute of Genetic Medicine

Page 2: Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Sequencing Data Interpretation - Nara Sobreira

http://genematcher.org

Page 3: Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Sequencing Data Interpretation - Nara Sobreira

GeneMatcher overview

Intended to find other patients/animal models for a novel candidate disease gene
Only deidentified data and genes, so no IRB required
Automated matching
Submitters choose to follow up at their discretion
Now also matching on phenotypic features (since October 1st, 2015)

Page 4: Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Sequencing Data Interpretation - Nara Sobreira

GeneMatcher Matching options

Page 5: Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Sequencing Data Interpretation - Nara Sobreira

As of May 1st, 2016:
4,459 genes
1,675 submitters
55 countries
5,267 matches on 1,216 genes

Growth in number of genes and matches in GeneMatcher

[Figure: Gene Count and Match Count plotted monthly from Dec. 1st, 2013 to May 1st, 2016; y-axis from 0 to 6000]

Page 6: Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Sequencing Data Interpretation - Nara Sobreira
Page 7: Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Sequencing Data Interpretation - Nara Sobreira

Matchmaker Exchange Matching options

As of May 1st, 2016:
100 matches with PhenomeCentral
87 matches with DECIPHER

Page 8: Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Sequencing Data Interpretation - Nara Sobreira

Hum Mut 34:561, 2013

Hum Mut 36:425, 2015

http://phenodbresearch.net OR http://phenodb.org

Page 9: Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Sequencing Data Interpretation - Nara Sobreira

BHCMG PhenoDB numbers

Holds data on 4,426 submissions
Including 53 cohorts ranging from 5 to 295
More than 6,225 samples have been sequenced by BHCMG
Holds phenotype data from more than 10,284 individuals
BHCMG has identified more than 222 novel genes
More than 231 known genes and 136 phenotypic expansions

Page 10: Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Sequencing Data Interpretation - Nara Sobreira

From Robinson PN, Kohler S, Bauer S, Seelow D, Horn D, Mundlos S. The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease. Am J Hum Genet. Nov 2008;83(5):610-615.

Page 11: Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Sequencing Data Interpretation - Nara Sobreira

From Robinson PN, Kohler S, Bauer S, Seelow D, Horn D, Mundlos S. The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease. Am J Hum Genet. Nov 2008;83(5):610-615.

Page 12: Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Sequencing Data Interpretation - Nara Sobreira

Phenotype Matching Algorithms: General Approach

Page 13: Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Sequencing Data Interpretation - Nara Sobreira

Algorithm Validation

Testing set: 44 published cases with known Mendelian phenotypes and detailed phenotypic descriptions

Question: Can the algorithms match query cases of a known syndrome to other cases with the same diagnosis in the testing set?

Page 14: Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Sequencing Data Interpretation - Nara Sobreira

Pairs-Based Testing Approach

Defined the test set
Picked a phenotype to be tested and removed all cases of this phenotype from the testing set
Picked one case with the tested phenotype as the query case and another to be put back into the testing set
Applied the matching algorithm
Is the put-back case in the top 1 or top 5 most similar cases?
Repeated x 1000
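The pairs-based procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the study: the function name `pairs_based_test`, the `(diagnosis, terms)` case layout, and the choice of plain Jaccard similarity as the default are all hypothetical.

```python
import random


def jaccard(a, b):
    """Jaccard similarity between two sets of phenotype terms."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairs_based_test(cases, phenotype, similarity=jaccard,
                     top_k=5, trials=1000, seed=0):
    """Fraction of trials in which the put-back case ranks in the top_k.

    cases: list of (diagnosis, set_of_terms) tuples (assumed layout).
    """
    rng = random.Random(seed)
    with_phenotype = [c for c in cases if c[0] == phenotype]
    # Remove every case of the tested phenotype from the testing set.
    background = [c for c in cases if c[0] != phenotype]
    if len(with_phenotype) < 2:
        raise ValueError("need at least two cases of the tested phenotype")

    hits = 0
    for _ in range(trials):
        # One case is the query, a second one is put back into the set.
        query, planted = rng.sample(with_phenotype, 2)
        pool = background + [planted]
        # Rank the pool by similarity to the query, most similar first.
        # The sort is stable, so ties are resolved against the planted case.
        ranked = sorted(pool, key=lambda c: similarity(query[1], c[1]),
                        reverse=True)
        if any(c is planted for c in ranked[:top_k]):
            hits += 1
    return hits / trials
```

Calling `pairs_based_test(cases, "Floating-Harbor Syndrome", top_k=1)` would then give a "top 1" score for that phenotype; any other similarity function with the same two-argument shape can be passed in place of `jaccard`.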

Page 15: Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Sequencing Data Interpretation - Nara Sobreira

Percent of Cases For Which the Best Phenotypic Match From the Database Has the Same Syndrome

Syndrome                                 SimUI  Jaccard  Distance  Wang  Resnik-  Resnik-  SimGIC-  SimGIC-  PhenoDigm
                                                                         PhenoDB  OMIM     PhenoDB  OMIM
Congenital Disorder of Deglycosylation   1      1        0.87      1     1        1        1        1        1
Floating-Harbor Syndrome                 1      1        1         1     1        1        1        1        1
Poretti-Boltshauser Syndrome             1      1        1         1     1        1        1        1        1
Cerebrocostomandibular Syndrome          0.98   0.63     0.57      0.53  0.25     0.25     0.86     0.84     0.46
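To make two of the compared measures concrete, here is a minimal sketch of SimUI (unweighted overlap of ancestor-expanded term sets) and SimGIC (the same overlap weighted by information content). The toy ontology, the term names, and the helper names are hypothetical and are not PhenoDB or HPO identifiers. Estimating information content from annotation frequency in a reference corpus is the usual convention; how the "-PhenoDB" and "-OMIM" variants in the table were computed is not stated on the slide.

```python
import math

# Hypothetical toy ontology: term -> list of parent terms.
PARENTS = {
    "root": [],
    "skeletal": ["root"],
    "neuro": ["root"],
    "rib_gap": ["skeletal"],
    "micrognathia": ["skeletal"],
    "seizures": ["neuro"],
}


def with_ancestors(terms):
    """Expand a set of terms so that it also contains every ancestor."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(PARENTS[term])
    return seen


def information_content(corpus):
    """IC(t) = -log(fraction of corpus cases annotated with t or a descendant)."""
    counts = {}
    for terms in corpus:
        for term in with_ancestors(terms):
            counts[term] = counts.get(term, 0) + 1
    return {t: -math.log(c / len(corpus)) for t, c in counts.items()}


def sim_ui(a, b):
    """Unweighted Jaccard index over ancestor-expanded term sets."""
    ea, eb = with_ancestors(a), with_ancestors(b)
    union = ea | eb
    return len(ea & eb) / len(union) if union else 0.0


def sim_gic(a, b, ic):
    """Jaccard index in which each term is weighted by its IC."""
    ea, eb = with_ancestors(a), with_ancestors(b)
    union = sum(ic.get(t, 0.0) for t in ea | eb)
    shared = sum(ic.get(t, 0.0) for t in ea & eb)
    return shared / union if union else 0.0
```

Because common, high-level terms carry little information content, SimGIC gives less credit than SimUI when two cases share only broad ancestors.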

Page 16: Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Sequencing Data Interpretation - Nara Sobreira

BHCMG PhenoDB database use

Buske et al. Hum Mutat, 2015 Oct.
Removed all cases with fewer than 5 phenotypic features
Removed all phenotypes for which only one case was present in the database
N=1,152 cases across 32 phenotypes
Ran “Top 1” and “Top 5” Pairs-Based Test
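The two filtering steps can be illustrated with a short sketch. The function name, the `(phenotype, features)` layout, and the decision to count cases per phenotype after the feature filter (rather than before) are assumptions; the slide does not specify the order.

```python
from collections import Counter


def filter_cases(cases, min_features=5, min_cases_per_phenotype=2):
    """Keep well-annotated cases whose phenotype occurs more than once.

    cases: list of (phenotype, set_of_features) tuples (assumed layout).
    """
    # Step 1: drop cases with fewer than min_features phenotypic features.
    deep = [c for c in cases if len(c[1]) >= min_features]
    # Step 2: drop phenotypes represented by a single remaining case.
    counts = Counter(phenotype for phenotype, _ in deep)
    return [c for c in deep if counts[c[0]] >= min_cases_per_phenotype]
```

A pairs-based test needs at least two cases per phenotype (one query, one put-back case), which is why singleton phenotypes have to be excluded.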

Page 17: Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Sequencing Data Interpretation - Nara Sobreira

Fraction of Cases for Which the Matching Case is in Top 5 Most Similar Cases

Page 18: Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Sequencing Data Interpretation - Nara Sobreira

BHCMG PhenoDB database use

“Real-World” Algorithm Testing
n=4,114
Wide range of phenotypic annotation depth
Many cases without an assigned OMIM syndrome ID

Page 19: Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Sequencing Data Interpretation - Nara Sobreira

How Well Does a Randomly Selected Query Case Match to Other Cases of Same Clinical Syndrome?

Syndrome                               Top 5   Top 25   Top 1st %ile   Top 5th %ile
Gomez-Lopez-Hernandez Syndrome (N=6)   2/5     2/5      2/5            4/5
Hemifacial Microsomia (N=13)           1/12    2/12     2/12           8/12
Lateral Meningocele Syndrome (N=6)     0/5     0/5      0/5            0/5

Page 20: Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Sequencing Data Interpretation - Nara Sobreira

What Factors Impact Successful Phenotypic Matching?

Syndrome                               Phenotypic Features per Case   Top 5   Top 25   Top 1st %ile   Top 5th %ile
Gomez-Lopez-Hernandez Syndrome (N=6)   7                              2/5     2/5      2/5            4/5
Hemifacial Microsomia (N=13)           8                              1/12    2/12     2/12           8/12
Lateral Meningocele Syndrome (N=6)     1                              0/5     0/5      0/5            0/5

Page 21: Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Sequencing Data Interpretation - Nara Sobreira

Threshold Testing

As a user of a phenotype matching algorithm, how far “down the list” would you need to go to find relevant matches?

Removed cases with fewer than 5 features
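One way to put a number on "how far down the list" is to record the rank of the first same-syndrome case for each query and compare it with a set of cutoffs. The sketch below reuses the cutoffs from the earlier tables (Top 5, Top 25, top 1st and 5th percentile). The function names are hypothetical, and reading "top Nth %ile" as the top N percent of the ranked database, rounded up, is an assumption.

```python
def first_relevant_rank(ranked_diagnoses, query_diagnosis):
    """1-based rank of the first case with the query's diagnosis, else None."""
    for rank, diagnosis in enumerate(ranked_diagnoses, start=1):
        if diagnosis == query_diagnosis:
            return rank
    return None


def cutoffs(database_size):
    """List depths for each threshold; percentiles are rounded up."""
    return {
        "top 5": 5,
        "top 25": 25,
        "top 1st %ile": (database_size + 99) // 100,  # ceil(1% of database)
        "top 5th %ile": (database_size + 19) // 20,   # ceil(5% of database)
    }


def within_cutoffs(rank, database_size):
    """For each threshold, report whether the first relevant match falls inside it."""
    return {
        name: rank is not None and rank <= depth
        for name, depth in cutoffs(database_size).items()
    }
```

For a database of 1,000 cases, for example, the two percentile cutoffs correspond to reading the first 10 and the first 50 entries of the ranked list.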

Page 22: Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Sequencing Data Interpretation - Nara Sobreira

Threshold Testing

Page 23: Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Sequencing Data Interpretation - Nara Sobreira

Preliminary Conclusions and Next Steps

Algorithms perform best for patients/syndromes with rare and highly specific phenotypic annotations
Depth of phenotypic annotation is key
Inherent limitations to reducing a patient with a Mendelian disorder to a list of phenotypic terms
Phenotypic matching in combination with genomic data (e.g. a VCF file) may offer opportunities for gene discovery

Page 24: Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Sequencing Data Interpretation - Nara Sobreira

Thanks for your attention!

Acknowledgements

Joel Krier and François Schiettecatte for the phenotype-matching project
Ada Hamosh, François Schiettecatte, Corinne Boehm, Julie Hoover-Fong, Reid Sutton, Jim Lupski, David Valle and others for PhenoDB
Ada Hamosh and François Schiettecatte for GeneMatcher
The CMGs and especially the Baylor-Hopkins CMG team