Genomic Variant Data Collection, Sharing and Curation

Heidi L. Rehm, PhD, FACMG


How Many Databases Do We Need?

[Figure: NVA Form: Google, PubMed and LSDB Searches]

• Ideally one but that is not likely:

• Clinical lab vs. shared

• Commercial vs. non-profit

• General vs. locus-specific

• National vs. international

• Somatic vs. germline

• Sequence vs. copy number

• High-throughput use vs. manual use

• Clinical grade vs. research

Likely need to support both separately with interfaces

Likely can integrate

We have chosen to support NCBI’s ClinVar database as the central site for data collection

[Figure: NCBI databases and access levels. Labels: dbGaP; ClinVar; dbSNP (<50 bp); dbVar (>50 bp); curated variant calls, phenotypes and interpretation; variant calls, genotype, phenotype and sequence data; public access; controlled access; opt-out or consented; CMA, WES, WGS; non-identifiable datasets; large variant datasets]

[Figure: levels of curation. Levels: Uncurated; Single-Source Curation; Multi-Source Curation; Expert Curation; Guideline. Side labels: intra-laboratory; inter-laboratory; evidence-based review; practice guidelines]

By marking variants by level of curation, can mix research/clinical, curated/uncurated
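One way to picture marking variants by level of curation is an ordered scale that every record carries, so research and clinical records can share one store and users filter by level. The following is a minimal Python sketch, not ClinVar's data model. The ordering of the levels, the `VariantRecord` fields and the `at_least` helper are all assumptions made for illustration; the slide only names the levels.

```python
from dataclasses import dataclass
from enum import IntEnum


class CurationLevel(IntEnum):
    # The ordering is an assumption; the slide only names the levels.
    UNCURATED = 0
    SINGLE_SOURCE = 1
    MULTI_SOURCE = 2
    EXPERT = 3
    GUIDELINE = 4


@dataclass(frozen=True)
class VariantRecord:
    variant: str          # illustrative variant label
    source: str           # "clinical" or "research" (hypothetical field)
    level: CurationLevel


def at_least(records, minimum):
    """Return the records curated at or above the requested level."""
    return [r for r in records if r.level >= minimum]


# Illustrative records only; none of these come from the slides.
records = [
    VariantRecord("variant-1", "research", CurationLevel.UNCURATED),
    VariantRecord("variant-2", "clinical", CurationLevel.SINGLE_SOURCE),
    VariantRecord("variant-3", "clinical", CurationLevel.EXPERT),
]

# A user who needs multi-source curation or better sees only variant-3.
print([r.variant for r in at_least(records, CurationLevel.MULTI_SOURCE)])
```

Because the level travels with each record, nothing has to be excluded at submission time; the filter is applied at the point of use.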

A Database Ecosystem

[Figure nodes: ClinVar at NCBI; Labs A through I; commercial networks; HVP country nodes; domain-specific databases, LOVD-based (e.g. InSiGHT) and non-LOVD (e.g. PharmGKB); domain-specific curation (e.g. ISCA-JIRA); commercial curation; commercial databases (e.g. HGMD); commercial software tools; LOVD; PharmGKB; InSiGHT; OMIM; patient registries]

Sequencing Laboratories Which Have Agreed to Share Data

Ackerman Lab, Mayo; Alfred I Dupont Hospital for Children; All Children's Hospital St. Petersburg; Ambry Laboratories; ARUP; Athena Diagnostics; Baylor Medical Genetic Laboratories; Boston Children's Hospital; Boston University; Children's Hospital of Philadelphia; Children's Mercy Hospital, Kansas City; Cincinnati Children's Hospital; City of Hope Molecular Diagnostic Lab; CureCMD; Denver Genetic Laboratories; Detroit Medical Center; Emory University; Fullerton Genetics Laboratory; GeneDx; Cleveland Clinic; Greenwood Genetics; Harvard-Partners Lab for Molec. Medicine; Henry Ford Hospital; Huntington Medical Research Institutes; Illumina Clinical Services Lab; Indiana University/Purdue University; InSiGHT; LabCorp / Integrated Genetics / Correlagen; Masonic Medical Research Laboratory; Mayo Clinic; Mt. Sinai School of Medicine; Nationwide Children's Hospital; Nemours Biomolecular Core, Jefferson Medical; Oregon Health Sciences University; Providence Sacred Heart Medical Center; Quest Diagnostics; SickKids Molecular Genetic Laboratory; Transgenomics; University of Chicago; University of Michigan; University of Nebraska Medical Center; University of Oklahoma; University of Penn; University of Sydney; University of Washington; Women and Children's Hospital; Wayne State University School of Medicine; Yale University

Variant Assessment and Classification

[Figure, courtesy of Birgit Funke. Labels: lab result; variant annotation; variant classification; clinical data; custom knowledge; clinical report]

Evidence shown for variant annotation:

• Published + in-house data

• Segregation studies

• Population frequency

• Amino acid conservation

• Predictions: PolyPhen, SIFT, etc.

• Splicing predictions

Classification categories: Benign, Likely Benign, VUS, Likely Pathogenic, Pathogenic

• Family Testing

• Additional Info

What data do we capture?

Automated Annotations

Evidence-Based Assessment

• Initial analysis derives structured annotations for high-throughput filtering approaches (e.g. variant prioritization using population frequencies and in silico tools) and automated classifications (e.g. benign high-frequency variants)

• Manually curated evidence should be added as structured data to enhance automated analysis (e.g. segregations, results of functional studies, biallelic observations, phenotype associations, inheritance)

• Need to also curate at the gene level (phenotypes, modes of inheritance, types of pathogenic mutations, domains of importance, functional pathways)

• Conclusions from manual data review are tracked as text-based descriptions to enable logic to be followed, reviewed and enhanced over time
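The split between automated annotation and manual review can be sketched as structured fields plus simple rules over them. This is a hedged Python illustration, not the laboratory's pipeline. The field names, both frequency cutoffs and the scoring in `review_priority` are hypothetical; the slides state no thresholds.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical cutoffs for illustration only; the slides do not state any.
BENIGN_FREQUENCY_CUTOFF = 0.05
RARE_FREQUENCY_CUTOFF = 0.001


@dataclass
class VariantAnnotation:
    name: str
    population_frequency: Optional[float] = None  # None means no population data
    in_silico_damaging: Optional[bool] = None     # summary of prediction tools
    curated_evidence: List[str] = field(default_factory=list)  # manual, structured


def automated_classification(a: VariantAnnotation) -> Optional[str]:
    """Return an automated call, or None when manual review is needed."""
    freq = a.population_frequency
    if freq is not None and freq >= BENIGN_FREQUENCY_CUTOFF:
        return "Benign"
    return None


def review_priority(a: VariantAnnotation) -> int:
    """Higher score means review sooner: rare or unseen, and predicted damaging."""
    score = 0
    freq = a.population_frequency
    if freq is None or freq < RARE_FREQUENCY_CUTOFF:
        score += 1
    if a.in_silico_damaging:
        score += 1
    return score


# Illustrative inputs only.
variants = [
    VariantAnnotation("common-variant", population_frequency=0.12),
    VariantAnnotation("rare-variant", population_frequency=0.0002,
                      in_silico_damaging=True,
                      curated_evidence=["segregation in one family"]),
    VariantAnnotation("novel-variant"),
]

for v in variants:
    print(v.name, automated_classification(v), review_priority(v))
```

Keeping curated evidence in a structured field, rather than only in free text, is what lets later automated passes take it into account.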

Documenting Arguments

The Ala26Val variant has been reported in 10 HCM probands of Asian descent and was absent from 832 race-matched control chromosomes (Konno 2005, Liu 2005, Song 2005, Wang 2009). However, one of the probands had another pathogenic HCM variant on the same copy of the gene which segregated with all 8 affected family members (Wang 2009). Although segregation in 3 family members was observed in one other family, an additional 5 individuals had the variant without disease including three over age 70 (Liu 2005). Our laboratory has observed this variant in one HCM proband and one DCM proband, neither with a family history of disease, out of over 3500 cases tested (1/215 Asian probands). Across all published and internal studies, this leads to a cumulative allele frequency of 1% (7/652) in Asian HCM probands or 0.1% (8/7848) across all probands. This variant has been observed at a frequency of 0.3% (7/2177) in the 1000 Genomes project with a sub-population frequency of 1.5% (6/388) in the Chinese population. Computational analyses (biochemical amino acid properties, conservation, AlignGVGD, PolyPhen2, and SIFT) suggest that the Ala26Val variant is less likely to impact the protein, particularly given the lack of conservation of the alanine residue in mammals (horse has an aspartic acid) and minimal biochemical change of the alanine to valine substitution.

Documenting Logic

In summary, although additional data is necessary to conclusively determine the clinical significance of this variant, based upon the higher frequency in a race-matched control population (1.5% vs. 1%), the absence of statistically significant segregation data, the lack of a predictive effect from computational algorithms, observations in both HCM and DCM which have different mutational mechanisms, and presence on the background of another pathogenic mutation, this variant is more likely benign.
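The frequencies quoted in the Ala26Val assessment can be recomputed from the counts given in the text, which is the kind of check that structured data makes easy. A small Python sketch follows. The counts are taken from the text as quoted; the labels and the `as_percent` helper are illustrative names.

```python
# Counts as quoted in the assessment text: (observations, total).
counts = {
    "Asian HCM probands, published + internal": (7, 652),
    "All probands": (8, 7848),
    "1000 Genomes, overall": (7, 2177),
    "1000 Genomes, Chinese sub-population": (6, 388),
}


def as_percent(observed, total):
    """Frequency as a percentage, rounded to two decimals."""
    return round(100 * observed / total, 2)


for label, (observed, total) in counts.items():
    print(f"{label}: {observed}/{total} = {as_percent(observed, total):.2f}%")

# The comparison the summary relies on: the control sub-population
# frequency is higher than the frequency among Asian HCM probands.
control_exceeds_cases = (6 / 388) > (7 / 652)
print("control frequency exceeds case frequency:", control_exceeds_cases)
```

To two decimals the values are 1.07%, 0.10%, 0.32% and 1.55%, consistent with the rounded figures of 1%, 0.1%, 0.3% and 1.5% in the text.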

Phenotypic Data Collection

• Challenges: Limited clinical data is collected during routine clinical testing

• Opportunities:

• Support physician data entry

• Extract data from EHR

• Interface with patients

Patient Registry

• Patient Portal: add phenotype data; manage contact preferences

• Research Portal: recruit patients for research

• Physician Portal: submit phenotypes

• Laboratory Portal: submit consented cases

Examples of patient registries integrating genotype and phenotype data: DuchenneConnect, CureCMD, Simons VIP

The Role of the Curator

• Maintain gene and disease-level data

• Solicit variant and case data from laboratories

• Map lab terminologies to a standard

• Resolve unmappable variants

• Identify inconsistencies in the data (pathogenic variants at high frequencies)

• Identify conflicting variant interpretations and resolve where possible

• Queue controversial variants for discussion with expert consensus group

• Liaise with researchers for variant studies
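Two of the curator tasks listed above, finding conflicting interpretations and finding pathogenic calls on high-frequency variants, lend themselves to simple automated checks that queue variants for human review. The Python sketch below is one possible shape for such checks. The submission tuples, the coarse `GROUP` mapping that defines a conflict, and the frequency cutoff are all assumptions for illustration.

```python
from collections import defaultdict

# Each submission: (variant, submitting lab, classification). Illustrative data only.
submissions = [
    ("variant-1", "Lab A", "Pathogenic"),
    ("variant-1", "Lab B", "Likely Pathogenic"),
    ("variant-2", "Lab A", "Pathogenic"),
    ("variant-2", "Lab C", "Benign"),
    ("variant-3", "Lab B", "VUS"),
]

# Coarse grouping used to decide what counts as a conflict (an assumption).
GROUP = {
    "Benign": "benign",
    "Likely Benign": "benign",
    "VUS": "uncertain",
    "Likely Pathogenic": "pathogenic",
    "Pathogenic": "pathogenic",
}


def conflicting_variants(subs):
    """Variants whose submissions fall into more than one coarse group."""
    groups = defaultdict(set)
    for variant, _lab, classification in subs:
        groups[variant].add(GROUP[classification])
    return sorted(v for v, seen in groups.items() if len(seen) > 1)


def high_frequency_pathogenic(subs, frequencies, cutoff):
    """Variants called pathogenic by any lab despite a frequency at or above cutoff."""
    flagged = {
        variant
        for variant, _lab, classification in subs
        if GROUP[classification] == "pathogenic"
        and frequencies.get(variant, 0.0) >= cutoff
    }
    return sorted(flagged)


# Hypothetical population frequencies and cutoff.
frequencies = {"variant-1": 0.0001, "variant-2": 0.02}

print(conflicting_variants(submissions))
print(high_frequency_pathogenic(submissions, frequencies, cutoff=0.01))
```

Checks like these only surface candidates; resolving them, or queueing them for the expert consensus group, remains the curator's job.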

Diseases: Noonan Syndrome, Cardio-Facio-Cutaneous Syndrome, LEOPARD Syndrome, Costello Syndrome

Clinical synopsis: Facial dysmorphology, short stature, cardiac defects, motor delay, bleeding diathesis. Autosomal dominant or de novo inheritance

Clinical utility of testing: Disease management; family planning

Minimum gene set: PTPN11, SOS1, RAF1, KRAS, SHOC2, BRAF, MAP2K1, MAP2K2, HRAS

Existing LSDB: SOS1 LOVD database (39 variants)

Project director: Sherri Bale (GeneDx)

Expert curators: Sherri Bale (GeneDx), Bruce Gelb (Mt. Sinai School of Med), Marco Tartaglia (Italian network for RASopathies), Amy Roberts (Harvard)

Curation staff members: Brad Williams (GeneDx), Lisa Vincent (GeneDx)

Patient advocacy groups: Noonan Syndrome Support Group (Wanda Robinson - see letter of support)

Contributing labs: Baylor, Athena, Greenwood, Boston University, Emory, Children’s Hospital of Boston, U of Oklahoma, Harvard-Partners, ARUP, GeneDx, Nationwide Children’s, Nemours/Alfred I Dupont, Mt Sinai School of Medicine

Cases to contribute: >7400 (includes 10 of the above labs)

Variants to contribute: >7700 variant observations (includes 10 of the above labs)

Phenotyping approaches: Clinical lab data submission (retrospective as well as improved collection with standardized form use)

Acknowledgements

U41 Grant

David Ledbetter

Christa Martin

Joyce Mitchell

Robert Nussbaum

Erin Riggs

Erin Kaminsky

Andy Faucett

Sherri Bale

Madhuri Hegde

Patrick Willems

Elaine Lyon

Soma Das

Matt Ferber

Sandy Aronson

David Miller

Mike Murray

Donna Maglott

Deanna Church

Organizations

NCBI

HVP

HGVS/LOVD

OMIM

GeneReviews

NHGRI

ACMG

CAP

AMP

ASHG

Genetic Alliance

UNIQUE

Patient CrossRoads

Databases

PharmGKB/PGRN

InSiGHT

MSeqDB

Labs that have agreed to support this project

And many others…

ICCG Annual Meeting May 9-10, 2013

Bethesda Marriott, Pooks Hill

Bethesda, MD

www.iscaconsortium.org