Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Data Interpretation
Nara Sobreira, MD, PhD Johns Hopkins University
McKusick-Nathans Institute of Genetic Medicine
http://genematcher.org
GeneMatcher overview
Intended to find other patients/animal models for a novel candidate disease gene
Only deidentified data and genes, so no IRB required
Automated matching
Submitters choose to follow up at their discretion
Now also matching on phenotypic features (since October 1st, 2015)
GeneMatcher Matching options
As of May 1st 2016:
4,459 genes
1,675 submitters
55 countries
5,267 matches on 1,216 genes
Growth in number of genes and matches in GeneMatcher
[Figure: monthly Gene Count and Match Count from Dec. 1st, 2013 to May 1st, 2016; vertical axis from 0 to 6000]
Matchmaker Exchange Matching options
As of May 1st 2016:
100 matches with PhenomeCentral
87 matches with DECIPHER
Hum Mut 34:561, 2013
Hum Mut 36:425, 2015
http://phenodbresearch.net OR http://phenodb.org
BHCMG PhenoDB numbers
Holds data on 4,426 submissions, including 53 cohorts ranging from 5 to 295
More than 6,225 samples have been sequenced by BHCMG
Holds phenotype data from more than 10,284 individuals
BHCMG has identified more than 222 novel genes
More than 231 known genes and 136 phenotypic expansions
From Robinson PN, Kohler S, Bauer S, Seelow D, Horn D, Mundlos S. The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease. Am J Hum Genet. Nov 2008;83(5):610-615.
Phenotype Matching Algorithms: General Approach
Testing set - 44 published cases with known Mendelian phenotypes and detailed phenotypic descriptions
Question: Can the algorithms match query cases of a known syndrome to other cases with same diagnosis in the testing set?
Algorithm Validation
Defined the test set
Picked the phenotype to be tested and removed all cases of this phenotype from the testing set
Picked one case with the tested phenotype as the query case and a second case to be put back into the testing set
Applied the matching algorithm
Is the put-back case among the top 1 or top 5 most similar cases?
Repeated 1,000 times
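The validation steps above can be sketched as a small simulation. This is a minimal illustration, not the speakers' implementation: the function names, the `(diagnosis, terms)` case layout, and the use of plain Jaccard similarity as the matching algorithm are all assumptions made for the example.

```python
import random
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity between two sets of phenotype terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairs_based_test(cases, similarity, top_k=5, repeats=1000, seed=0):
    """Fraction of repeats in which the put-back case ranks in the top_k.

    cases: list of (diagnosis, frozenset_of_terms) tuples (hypothetical layout).
    """
    rng = random.Random(seed)
    by_dx = defaultdict(list)
    for idx, (dx, _) in enumerate(cases):
        by_dx[dx].append(idx)
    # A phenotype needs at least two cases: one query, one to put back.
    eligible = [dx for dx, idxs in by_dx.items() if len(idxs) >= 2]
    hits = 0
    for _ in range(repeats):
        dx = rng.choice(eligible)
        query_idx, target_idx = rng.sample(by_dx[dx], 2)
        # Remove every case of the tested phenotype, then put one back.
        # The put-back case goes last, so ties are resolved against it.
        database = [i for i, (d, _) in enumerate(cases) if d != dx]
        database.append(target_idx)
        query_terms = cases[query_idx][1]
        ranked = sorted(
            database,
            key=lambda i: similarity(query_terms, cases[i][1]),
            reverse=True,
        )
        if target_idx in ranked[:top_k]:
            hits += 1
    return hits / repeats
```

Calling `pairs_based_test(cases, jaccard, top_k=1)` and then again with `top_k=5` corresponds to the "top 1" and "top 5" questions; any other similarity function with the same two-argument signature can be swapped in.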
Pairs-Based Testing Approach
Percent of Cases For Which the Best Phenotypic Match From the Database Has the Same Syndrome
Syndrome | SimUI | Jaccard | Distance | Wang | Resnik-PhenoDB | Resnik-OMIM | SimGIC-PhenoDB | SimGIC-OMIM | PhenoDigm
Congenital Disorder of Deglycosylation | 1 | 1 | 0.87 | 1 | 1 | 1 | 1 | 1 | 1
Floating-Harbor Syndrome | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
Poretti-Boltshauser Syndrome | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
Cerebrocostomandibular Syndrome | 0.98 | 0.63 | 0.57 | 0.53 | 0.25 | 0.25 | 0.86 | 0.84 | 0.46
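To give a sense of how two of the simpler measures in this comparison differ, here is a rough sketch of Jaccard similarity on the raw term lists versus SimUI, which applies the same ratio after each term list has been extended with all ontology ancestors. The toy ontology, the term names, and the function names are hypothetical; the slides do not say how each measure was implemented.

```python
def ancestor_closure(terms, parents):
    """Return the given terms plus all of their ancestors in the ontology."""
    closed = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents.get(term, ()))
    return closed


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sim_ui(terms_a, terms_b, parents):
    """Jaccard ratio computed on the ancestor-closed term sets."""
    return jaccard(
        ancestor_closure(terms_a, parents),
        ancestor_closure(terms_b, parents),
    )


# Hypothetical toy ontology: child term -> list of parent terms.
TOY_PARENTS = {
    "seizure": ["neurological"],
    "ataxia": ["neurological"],
    "neurological": ["root"],
    "short_stature": ["growth"],
    "growth": ["root"],
}
```

With this toy ontology, two cases annotated only with `seizure` and `ataxia` share no terms directly, so plain Jaccard scores them 0, while SimUI gives them credit for the shared ancestors `neurological` and `root`.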
BHCMG PhenoDB database use
Buske et al. Hum Mutat, 2015 Oct.
Removed all cases with fewer than 5 phenotypic features
Removed all phenotypes for which only one case was present in database
N=1,152 cases across 32 phenotypes
Ran “Top 1” and “Top 5” Pairs-Based Test
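The two filtering steps could look roughly like the following. This is only a sketch: the `(phenotype, terms)` case layout, the function name, and the order of the two filters (feature count first, then phenotypes with a single remaining case) are assumptions, since the slide does not state them.

```python
from collections import Counter


def filter_cases(cases, min_features=5, min_cases_per_phenotype=2):
    """Drop sparsely annotated cases, then phenotypes left with one case.

    cases: list of (phenotype, set_of_terms) tuples (hypothetical layout).
    """
    # Step 1: keep only cases with at least min_features phenotypic features.
    kept = [(dx, terms) for dx, terms in cases if len(terms) >= min_features]
    # Step 2: keep only phenotypes that still have enough cases to form a pair.
    counts = Counter(dx for dx, _ in kept)
    return [
        (dx, terms) for dx, terms in kept
        if counts[dx] >= min_cases_per_phenotype
    ]
```

Applying the feature filter first means a phenotype can drop out because some of its cases were too sparsely annotated, which matters for the pairs-based test because it needs at least two cases per phenotype.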
Fraction of Cases for Which the Matching Case is in Top 5 Most Similar Cases
BHCMG PhenoDB database use
“Real-World” Algorithm Testing
n=4,114
Wide range of phenotypic annotation depth
Many cases without an assigned OMIM syndrome ID
How Well Does a Randomly Selected Query Case Match to Other Cases of Same Clinical Syndrome?
Syndrome | Top 5 | Top 25 | Top 1st %ile | Top 5th %ile
Gomez-Lopez-Hernandez Syndrome (N=6) | 2/5 | 2/5 | 2/5 | 4/5
Hemifacial Microsomia (N=13) | 1/12 | 2/12 | 2/12 | 8/12
Lateral Meningocele Syndrome (N=6) | 0/5 | 0/5 | 0/5 | 0/5
What Factors Impact Successful Phenotypic Matching?
Syndrome | Phenotypic Features per Case | Top 5 | Top 25 | Top 1st %ile | Top 5th %ile
Gomez-Lopez-Hernandez Syndrome (N=6) | 7 | 2/5 | 2/5 | 2/5 | 4/5
Hemifacial Microsomia (N=13) | 8 | 1/12 | 2/12 | 2/12 | 8/12
Lateral Meningocele Syndrome (N=6) | 1 | 0/5 | 0/5 | 0/5 | 0/5
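The four cutoffs used in these tables mix absolute ranks (top 5, top 25) with ranks relative to database size (top 1st and 5th percentile). A small sketch of how one ranked result could be scored against all four follows; the function name and the rounding-up definition of a percentile cutoff are assumptions, not taken from the slides.

```python
import math


def within_cutoffs(rank, database_size, top_ks=(5, 25), percentiles=(1, 5)):
    """Check a 1-based rank against absolute and percentile cutoffs.

    rank: position of the best same-syndrome case in the ranked match list.
    database_size: number of cases the query was ranked against.
    """
    result = {f"top_{k}": rank <= k for k in top_ks}
    for p in percentiles:
        # Assumed definition: the top p percent of the list, rounded up.
        cutoff = math.ceil(database_size * p / 100)
        result[f"top_{p}_pct"] = rank <= cutoff
    return result
```

For a database of 4,114 cases, the 1st percentile under this definition is the top 42 matches and the 5th percentile is the top 206, so a match at rank 30 misses both absolute cutoffs but falls inside both percentile cutoffs.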
As a user of a phenotype matching algorithm, how far “down the list” would you need to go to find relevant matches?
Removed cases with fewer than 5 features
Threshold Testing
Preliminary Conclusions and Next Steps
Algorithms perform best for patients/syndromes with rare and highly specific phenotypic annotations
Depth of phenotypic annotation is key
Inherent limitations to reducing a patient with a Mendelian disorder to a list of phenotypic terms
Phenotypic matching in combination with genomic data (e.g. a VCF file) may offer opportunities for gene discovery
Thanks for your attention!
Acknowledgements
Joel Krier and François Schiettecatte for the phenotype-matching project
Ada Hamosh, François Schiettecatte, Corinne Boehm, Julie Hoover-Fong, Reid Sutton, Jim Lupski, David Valle and others for PhenoDB
Ada Hamosh and François Schiettecatte for GeneMatcher
The CMGs and especially the Baylor-Hopkins CMG team