Page 1: Variant Calling Workshop: Bioinformatics Tools

Variant Calling Practical Workshop

Bioinformatics Tools

Dr Richard D. Bagnall

[email protected]  

Page 2: Variant Calling Workshop: Bioinformatics Tools

Genetic testing Report (VCGS)

mRNA transcript

NM_000257.2(MYH7):c.1208G>A

Protein

(NP_000248.2(MYH7):p.Arg403Gln)

Page 3: Variant Calling Workshop: Bioinformatics Tools

Chromosome

(NC_000014.8:g.23898487C>T) (chr14:g.23898487C>T)

mRNA transcript

(NM_000257.2:c.1208G>A)

Protein

(NP_000248.2:p.Arg403Gln) (MYH7 Arg403Gln) MYBPC3 R542Q

Ensembl Mutalyzer

Seattle Seq Annotation

Conversion between variant annotation levels

Mutalyzer: https://mutalyzer.nl/position-converter

Seattle Seq Annotation: http://snp.gs.washington.edu/SeattleSeqAnnotation138/

Ensembl: http://www.ensembl.org/index.html
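The three annotation levels above share a common shape: a reference sequence, a level prefix (g., c. or p.), a position and a change. As a minimal sketch, assuming only simple single-base or single-residue substitutions like the ones on this slide, the strings can be split into their parts with regular expressions. The function names `parse_variant` and `short_protein_name` are hypothetical helpers for illustration, not part of Mutalyzer, Seattle Seq Annotation or Ensembl.

```python
import re

# Matches e.g. "NM_000257.2(MYH7):c.1208G>A" or "chr14:g.23898487C>T".
DNA_PATTERN = re.compile(
    r"^(?P<ref_seq>[^:()]+?)(?:\((?P<gene>[^)]+)\))?:"
    r"(?P<level>[gc])\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)
# Matches e.g. "NP_000248.2:p.Arg403Gln".
PROTEIN_PATTERN = re.compile(
    r"^(?P<ref_seq>[^:()]+?)(?:\((?P<gene>[^)]+)\))?:"
    r"p\.(?P<ref>[A-Z][a-z]{2})(?P<pos>\d+)(?P<alt>[A-Z][a-z]{2})$"
)

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def parse_variant(text):
    """Split a simple substitution into its parts; raise ValueError otherwise."""
    match = DNA_PATTERN.match(text)
    level = None
    if match is None:
        match = PROTEIN_PATTERN.match(text)
        level = "p"
    if match is None:
        raise ValueError(f"not a simple substitution: {text!r}")
    parts = match.groupdict()
    parts["level"] = level or parts["level"]
    parts["pos"] = int(parts["pos"])
    return parts


def short_protein_name(gene, parts):
    """Build the informal one-letter form, e.g. 'MYH7 R403Q'."""
    ref = THREE_TO_ONE[parts["ref"]]
    alt = THREE_TO_ONE[parts["alt"]]
    return f"{gene} {ref}{parts['pos']}{alt}"


protein = parse_variant("NP_000248.2:p.Arg403Gln")
print(short_protein_name("MYH7", protein))
```

The one-letter form printed here is the same informal style as the `MYBPC3 R542Q` example on the slide.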

Page 4: Variant Calling Workshop: Bioinformatics Tools

Mutalyzer conversion between mRNA and DNA

Step 1: select genome

Step 2: enter variant (DNA or mRNA)

Step 3: convert
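The worked example in this deck reads g.23898487C>T at the chromosome level but c.1208G>A at the transcript level. The bases differ because MYH7 is transcribed from the reverse strand (the PROVEAN output lists STRAND -1), so the coding alleles are the complements of the genomic ones. The sketch below illustrates only that allele flip and the codon arithmetic linking c.1208 to residue 403. It does not convert coordinates, which needs the exon structure that Mutalyzer looks up. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def coding_alleles(ref, alt, strand):
    """Return (ref, alt) as read on the transcript for strand +1 or -1."""
    if strand == -1:
        return COMPLEMENT[ref], COMPLEMENT[alt]
    return ref, alt


def codon_number(c_pos):
    """Residue number containing coding position c_pos (1-based)."""
    return (c_pos - 1) // 3 + 1


def position_in_codon(c_pos):
    """1, 2 or 3: which base of its codon c_pos falls on."""
    return (c_pos - 1) % 3 + 1


print(coding_alleles("C", "T", strand=-1))
print(codon_number(1208), position_in_codon(1208))
```

For c.1208 this gives codon 403, second base, which agrees with p.Arg403Gln and with the codon change C[G/A]G reported in the PROVEAN results.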

Page 5: Variant Calling Workshop: Bioinformatics Tools

Variant annotation

| Annotation class | Characteristic | Source | Feature name |
|---|---|---|---|
| Inheritance pattern | Number of meioses, obligate carriers, de novo | Family pedigree | N/A |
| Mutation consequence | Missense, premature termination, gross deletion, splice site | Seattle Seq Annotation | functionGVS |
| Mutation location | Alters initiation codon, last nucleotide of exon, functional domain | Seattle Seq Annotation | proteinPosition, distanceToSplice |
| Conservation | Nucleotide conservation, protein conservation | Seattle Seq Annotation | scorePhastCons, consScoreGERP |
| Prediction of pathogenicity | In silico model prediction | Seattle Seq Annotation, Provean | polyPhen, scoreCADD, granthamScore, SIFT, Provean |
| Frequency in general population | Frequency in general population | Seattle Seq Annotation, ExAC, dbSNP | genomesExAC, rsID |
| Occurrence in additional cases | Reported in another patient with same disease | ClinVar, PubMed, in-house patients, Google | N/A |

Page 6: Variant Calling Workshop: Bioinformatics Tools

Inheritance pattern

2 x genotype +, phenotype +

1 x genotype +, phenotype -

Pathogenic/likely pathogenic variant may be useful for cascade genetic testing

There is scope for additional clinical testing in this family

Page 7: Variant Calling Workshop: Bioinformatics Tools

Seattle Seq Annotation tool

Step 1: select additional options

Step 2: enter variant information

Step 3: submit

Page 8: Variant Calling Workshop: Bioinformatics Tools

Seattle Seq Annotation tool results

| Annotation | Result | Annotation class |
|---|---|---|
| chromosome | 14 | Descriptive |
| position | 23898487 | Descriptive |
| referenceBase | C | Descriptive |
| sampleAlleles | C/T | Descriptive |
| accession | NM_000257.2 | Descriptive |
| cDNAPosition | 1624 | Descriptive |
| geneList | MYH7 | Descriptive |
| functionGVS | missense | Mutation consequence |
| aminoAcids | ARG,GLN | Mutation consequence |
| proteinPosition | 403/1936 | Mutation location |
| distanceToSplice | 50 | Mutation location |
| scorePhastCons | 0.998 | Conservation |
| consScoreGERP | 4.04 | Conservation |
| scoreCADD | 23.6 | Prediction of pathogenicity |
| polyPhen | 1 | Prediction of pathogenicity |
| granthamScore | 43 | Prediction of pathogenicity |
| inDBSNPOrNot | dbSNP_133 | Frequency in general population |
| rsID | rs121913624 | Frequency in general population |
| genomesESP | C=13006 | Frequency in general population |
| genomesExAC | unknown | Frequency in general population |

Page 9: Variant Calling Workshop: Bioinformatics Tools

Provean/SIFT Annotation tool

Step 1: enter variant information (comma separated, DNA)

Step 2: submit

Page 10: Variant Calling Workshop: Bioinformatics Tools

Provean/SIFT Annotation results

| Section | Field | Value |
|---|---|---|
| Variation | ROW_NO. | 1 |
| | INPUT | 14,23898487,C,T |
| Protein sequence change | PROTEIN_ID | ENSP00000347507 |
| | LENGTH | 1935 |
| | STRAND | -1 |
| | CODON_CHANGE | CCT C[G/A]G GTG |
| | POS | 403 |
| | RESIDUE_REF | R |
| | RESIDUE_ALT | Q |
| | TYPE | Single AA Change |
| PROVEAN prediction | SCORE | -3.47 |
| | PREDICTION (cutoff=-2.5) | Deleterious |
| | #SEQ | 542 |
| | #CLUSTER | 30 |
| SIFT prediction | SCORE | 0 |
| | PREDICTION (cutoff=0.05) | Damaging |
| | MEDIAN_INFO | 3.68 |
| | #SEQ | 389 |
| Annotation | dbSNP_ID | rs121913624 |
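Two parts of this output can be checked by hand. The codon change C[G/A]G should translate to the reported residues R and Q under the standard genetic code, and each prediction label should follow from its score and cutoff. The sketch below does both. It assumes that a score at or below the cutoff is called Deleterious (PROVEAN) or Damaging (SIFT); how the tool treats a score exactly equal to the cutoff is not shown on the slide. The function names are hypothetical.

```python
# Standard genetic code, codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def residue_change(ref_codon, position_in_codon, alt_base):
    """Translate a codon before and after substituting one base (1-based)."""
    index = position_in_codon - 1
    alt_codon = ref_codon[:index] + alt_base + ref_codon[index + 1:]
    return CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]


def provean_call(score, cutoff=-2.5):
    # Assumption: scores at or below the cutoff are called Deleterious.
    return "Deleterious" if score <= cutoff else "Neutral"


def sift_call(score, cutoff=0.05):
    # Assumption: scores at or below the cutoff are called Damaging.
    return "Damaging" if score <= cutoff else "Tolerated"


print(residue_change("CGG", 2, "A"))
print(provean_call(-3.47), sift_call(0))
```

With the values from the table, the codon check gives Arg to Gln and both calls agree with the reported predictions.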

Page 11: Variant Calling Workshop: Bioinformatics Tools

Exome Aggregation Consortium Database (ExAC)

Step 1: enter gene

Step 2: scroll down, search for variant

Step 3: if present, click to view population frequency

Page 12: Variant Calling Workshop: Bioinformatics Tools

ExAC results

Check the ethnicity-specific allele frequency
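An allele frequency is the allele count divided by the number of alleles genotyped in that population, so a variant that looks rare overall can still be common in one group. The sketch below compares per-population frequencies against the 0.1% (1/1000) rarity threshold used in the classification criteria. The population names and counts are invented for illustration and are not ExAC data for any real variant.

```python
MAF_THRESHOLD = 0.001  # 0.1%, the rarity threshold in the classification criteria


def allele_frequency(allele_count, allele_number):
    """Allele count divided by the number of alleles genotyped."""
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    return allele_count / allele_number


def is_rare(allele_count, allele_number, threshold=MAF_THRESHOLD):
    return allele_frequency(allele_count, allele_number) < threshold


# Hypothetical (allele_count, allele_number) pairs, NOT real database values.
populations = {
    "Population A": (0, 10000),
    "Population B": (2, 16000),
    "Population C": (30, 12000),
}

for name, (count, number) in populations.items():
    freq = allele_frequency(count, number)
    label = "rare" if is_rare(count, number) else "too common"
    print(f"{name}: {freq:.6f} ({label})")
```

As the criteria note, the appropriate threshold depends on disease penetrance and inheritance, so 0.001 is a default rather than a fixed rule.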

Page 13: Variant Calling Workshop: Bioinformatics Tools

ClinVar

Step 1: enter variant, in any of these forms:

NC_000014.8:g.23898487C>T
NM_000257.2:c.1208G>A
NP_000248.2:p.Arg403Gln
MYH7 Arg403Gln

Page 14: Variant Calling Workshop: Bioinformatics Tools

ClinVar result

Step 1: Supporting observations

Step 2: Read summary descriptions

Other sources: PubMed, Google Scholar, gene-specific mutation databases (e.g. LOVD)

Page 15: Variant Calling Workshop: Bioinformatics Tools

Variant Classification

| Classification | Category | Criteria | Exceptions | Check |
|---|---|---|---|---|
| Pathogenic | A (1 needed) | Confirmed de novo (in the setting of a new disease in the family) | Confirmed de novo alteration in a novel gene with possible disease implications | ☐ |
| | | | Likely de novo alteration (i.e. paternity not confirmed with known disease alteration) | ☐ |
| | | Alterations resulting in premature truncation (e.g. reading frame shift, nonsense) | Truncation in close proximity to 3' terminus | ☐ |
| | | | LOF has not been established as a mechanism of pathogenicity (e.g. MYH7) | ☐ |
| | | Other ACMG defined mutation (e.g. initiation codon or gross deletion) | In frame gross deletion of a single exon not in a known protein functional domain | ☐ |
| | | Strong segregation with disease (LOD ≥ 3 = >10 meioses) | | ☐ |
| | | Functionally validated splice mutation | In frame skipping of a single exon not in a known protein functional domain | ☐ |
| | B (4 needed) | Significant disease association in appropriately sized case-control studies | | ☐ |
| | | Reported in another patient satisfying established diagnostic criteria for classic disease without a clear mutation | Can count as 2 if in multiple cases and absent from controls | ☐ |
| | | Last nucleotide of exon | | ☐ |
| | | Good segregation with disease (LOD 1.5-3 = 5-9 meioses) | | ☐ |
| | | Deficient protein function in appropriate functional assay(s) | | ☐ |
| | | Well characterised mutation at same position | Can count as 2 if alteration impacts hotspot with 2 or more pathogenic mutations at same position | |
| | | Other strong data supporting pathogenic classification | | ☐ |
| Likely pathogenic | (1 needed) | Alterations at the canonical donor/acceptor sites (+/- 1,2) without additional data in support of pathogenicity | | ☐ |
| | C (4 needed) | Rarity in general population databases (minor allele frequency <0.1% or 1/1000) | Dependent on disease penetrance and inheritance (e.g. HCM: 1/1000 alleles) | ☐ |
| | | In silico models in agreement AND/OR completely conserved in appropriate places | Polyphen + SIFT for most genes | ☐ |
| | | Moderate segregation with disease (at least 3 informative meioses) | | ☐ |
| | | Other data supporting pathogenic classification, e.g. alterations in a well-defined functional domain | | ☐ |
| | | 3 of B | | ☐ |
| | | 2 of B and at least 1 of C | | ☐ |
| | | 1 of B and at least 3 of C | | ☐ |
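The segregation criteria pair LOD scores with numbers of meioses (LOD ≥ 3 with about 10, LOD 1.5-3 with 5-9). One way to see where those pairings come from is the simplest linkage case: fully informative, phase-known meioses with no recombinants, scored at a recombination fraction of zero, where each meiosis contributes log10(2) to the LOD score. The sketch below assumes that idealised case. Real pedigrees with reduced penetrance or uninformative meioses give lower scores, and the function names are hypothetical.

```python
import math

LOD_PER_MEIOSIS = math.log10(2)  # about 0.301 per informative meiosis


def max_lod(informative_meioses):
    """Best-case LOD score: no recombinants, recombination fraction 0."""
    return informative_meioses * LOD_PER_MEIOSIS


def meioses_needed(target_lod):
    """Smallest number of informative meioses that can reach target_lod."""
    return math.ceil(target_lod / LOD_PER_MEIOSIS)


for n in (3, 5, 9, 10):
    print(n, round(max_lod(n), 2))
print(meioses_needed(1.5), meioses_needed(3))
```

Under this assumption 5 meioses give a LOD of about 1.5 and 10 give about 3.0, matching the bands in the table.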

Page 16: Variant Calling Workshop: Bioinformatics Tools

Variant Classification

| Classification | Category | Criteria | Exceptions | Check |
|---|---|---|---|---|
| VUS | | Insufficient or conflicting evidence | | ☐ |
| | | Gross duplications without strong evidence for pathogenic or benign | | ☐ |
| Likely benign | D (1 needed) | Subpopulation frequency in support of benign classification | | ☐ |
| | | Intact protein function observed in appropriate functional assay(s) | | ☐ |
| | | Intronic alteration with no splicing impact by RT-PCR analysis or other splicing assay | | ☐ |
| | | Other strong data supporting benign classification | | ☐ |
| | | Co-occurrence with mutation in same gene | Genes without a defined, severe, biallelic phenotype | ☐ |
| | E (2 needed) | Co-occurrence with a mutation in another gene that explains phenotype | | ☐ |
| | | In silico models in agreement (benign) | | ☐ |
| | | Does not segregate with disease in family studies (genes with incomplete penetrance) | | ☐ |
| | | No disease association in small case-control study | | ☐ |
| | | Other data supporting benign classification | | ☐ |
| Benign | F (1 needed) | General population frequency is too high (based on disease prevalence and penetrance) | | ☐ |
| | | Does not co-segregate with affected individuals | | ☐ |
| | | Internal frequency is too high to be a pathogenic mutation based on disease prevalence and penetrance | | ☐ |
| | | Seen in trans with a mutation or in homozygous state in individual without severe disease for that gene | Genes without a defined, severe, biallelic phenotype | ☐ |
| | | No disease association in appropriately sized case-control studies | | ☐ |

1 of D and at least 2 of E ☐

2 or more of D ☐

>3 of E without conflicting data ☐

>4 of E without conflicting data ☐
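The counting rules in the two classification tables can be written as a small decision function. This is a sketch of one reading of the tables, not a validated implementation of the guideline. It assumes that each category's "N needed" figure is sufficient on its own, that the caller has already applied any "can count as 2" weighting, and that evidence on both the pathogenic and the benign side counts as conflicting and falls back to VUS. The D/E combination rules listed at the end of the second table are not modelled, because the slide does not say which class they lead to. The function name and parameters are hypothetical.

```python
def classify(a=0, b=0, c=0, d=0, e=0, f=0, canonical_splice=False):
    """Return a classification from the number of criteria met per category."""
    pathogenic = a >= 1 or b >= 4
    likely_pathogenic = (
        canonical_splice
        or c >= 4
        or b >= 3
        or (b >= 2 and c >= 1)
        or (b >= 1 and c >= 3)
    )
    benign = f >= 1
    likely_benign = d >= 1 or e >= 2

    # Assumption: evidence pointing both ways is treated as conflicting.
    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "VUS"
    if pathogenic:
        return "Pathogenic"
    if likely_pathogenic:
        return "Likely pathogenic"
    if benign:
        return "Benign"
    if likely_benign:
        return "Likely benign"
    return "VUS"  # insufficient evidence


print(classify(b=2, c=1))
print(classify(e=2))
print(classify(b=3, d=1))
```

Keeping the counts as inputs leaves the judgement calls, such as whether a criterion is met or an exception applies, with the person filling in the checklist.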

Page 17: Variant Calling Workshop: Bioinformatics Tools

Summary

•  Converted between DNA, mRNA and protein nomenclature

•  Used on-line websites to annotate variants for functional impact on the protein

•  Searched a variant database to determine the allele frequency in the general population

•  Searched for reports of the variant in unrelated patients with the same disease

•  Used guidelines to record the pathogenicity classification