HORIZON DIAGNOSTICS. Molecular QC: Interpreting your Bioinformatics Pipeline. 25th June 2015. Dr. Danielle Folkard and Dr. Alessandro Riccombeni

HORIZON DIAGNOSTICS

Molecular QC: Interpreting your Bioinformatics Pipeline 25th June 2015

Dr. Danielle Folkard and Dr. Alessandro Riccombeni

2

What is the impact of assay failure in your laboratory and how do you monitor for it?

Research Use Only

3

External Quality Assessment

[Figure: EGFR Genotyping Errors, External Quality Assessment 2014. Bar chart of the percentage of incorrect results (y-axis, 0 to 40) for each EGFR sample tested (x-axis): T790M & L858R, E746_A750del, G719S and wild type.]

European Molecular Quality Network (EMQN)

4

Clinical Application of Next Generation Sequencing

Using just one sample, one workflow can test for mutation status across multiple genes

5

The Sources of Variability in the Next Generation Sequencing Workflow

7

Influence of Analytical Pipelines

Reference: Genome in a Bottle Consortium

8

Introduction

Four decades, three generations (timeline, 1976 to 2016)

• 1976: Maxam-Gilbert
• 1977: Sanger, φX174 genome
• 1983: PCR
• 1987: First automated sequencer (ABI 370)
• 1990: HGP starts (3B $)
• Pyrosequencing (year not shown)
• 1998: Solexa
• 2000: 454 LS Corp.
• 2003: First human genome sequence
• 2005: Roche 454
• 2006: GA, SOLiD
• 2008: RNA-Seq, Helicos
• 2009: PacBio, ON, CG
• 2011: Ion Torrent, MiSeq
• 2012: Helicos ends
• 2014: HiSeqX
• 2016: 454 ends

Legend: First generation; Second generation (NGS); Third generation

9

Next-Generation Sequencing for Clinical Bioinformatics

• NGS revolutionised our access to genomic information

• 2nd generation technology allows WGS for less than 1000 GBP

• However, a number of challenges exist
  • Data creation
  • Data analysis/processing
  • Data (clinical) interpretation

10

FFPE and NGS

• Why FFPE?

• FFPE has been used for a number of NGS applications
  • Tuononen, Spencer (targeted resequencing)
  • Fanelli, Gu (ChIP-seq, RRBS)
  • Weng, Meng (RNA-Seq, miRNA)

• What happens with fixatives?
  • Need to counteract protein-DNA interactions
  • Additional effects from tissue preparation, paraffin embedding, cross-linking, chemical modification of tissues

• Lower DNA yield, DNA degradation, smaller fragments

• How does FFPE affect NGS pipelines?

Spencer et al. 2013

11

FFPE and NGS

• Schweiger 2009, comparison of FFPE and Fresh Frozen (FF) tissues for Illumina sequencing:
  • Fixation time does not significantly affect the quality of sequencing data from FFPE
  • Lower mappability
  • Higher mutation rate
  • Lower fraction of known SNVs

• Van Allen 2014: good correlation between FFPE and FF samples

Van Allen 2014

12

FFPE and NGS

• Hedegaard 2014: 3 months of storage resulted in less efficient DNA extraction
  • High fragmentation: loss of material
  • Decrease in library complexity
  • Large increase in PCR duplicates: 60-85% for FFPE vs. 30% for FF

• C > U deamination is a common cause of artifacts
  • U-tolerant polymerase didn’t help
  • Pattern: T <> C, A <> G transitions

• The fraction of mapped reads decreases with storage time
  • Increase in partial mappings
  • Increase in gapped mappings

Hedegaard et al. 2014

Wong et al. 2014
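As a rough illustration of how a pipeline might screen for the deamination pattern on this slide, the sketch below flags C>T and G>A substitutions seen at low allele fraction as candidate FFPE artifacts. The `Variant` record, the `flag_deamination_candidates` helper, the toy calls and the 10% allele-fraction cut-off are all assumptions made for this example; none of them comes from the cited studies.

```python
from dataclasses import dataclass

# Substitutions expected from cytosine deamination (C>U is read as C>T),
# plus the same event reported on the opposite strand (G>A).
DEAMINATION_CHANGES = {("C", "T"), ("G", "A")}


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    alt_reads: int
    depth: int

    @property
    def allele_fraction(self) -> float:
        return self.alt_reads / self.depth if self.depth else 0.0


def flag_deamination_candidates(variants, max_af=0.10):
    """Return SNVs that match the deamination pattern below max_af.

    max_af is a hypothetical threshold; each laboratory would need to
    choose its own value.
    """
    return [
        v for v in variants
        if (v.ref, v.alt) in DEAMINATION_CHANGES and v.allele_fraction < max_af
    ]


# Made-up calls for illustration only.
calls = [
    Variant("chr7", 100, "C", "T", 4, 200),    # 2% C>T: flagged
    Variant("chr7", 200, "G", "A", 90, 200),   # 45% G>A: too frequent, kept
    Variant("chr12", 300, "A", "C", 3, 150),   # transversion: not the pattern
]
flagged = flag_deamination_candidates(calls)
print([(v.chrom, v.pos) for v in flagged])
```

A flag of this kind only marks candidates for review; a genuine low-frequency somatic C>T change would be flagged as well.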

13

FFPE and NGS

[Figure: read alignments illustrating new mismatches (artifact variants), partial mappings and lost mappings]

Artifacts include:
• SNVs
• Larger indels
• CNV

14

FFPE: Conclusions

• FFPE artifacts increase with storage time

• Artifacts reduce the statistical power of your variant calling analysis

• Molecular reference standards help filter out bad mappings and spurious variants

• Molecular Reference Standards can be added as samples to a joint variant calling pipeline

15

Upcoming Webinar

Title:

Understanding and Controlling for Sample and Platform Biases in NGS Assays

Date:

Wednesday 22nd July 2015

Time:

4:00pm BST, 11:00am EST

Register now: www.horizondx.com/upcomingwebinar

16

Genome in a Bottle

Infrastructure for performance assessment of NGS

No widely accepted set of metrics to characterize the fidelity of variant calls from NGS

GIAB is developing standards to provide well characterized human genomes as Reference Materials

Tools and standardized methods to use these RMs

17

Horizon Diagnostics: Ashkenazim Trio FFPE Reference Standards

• GM24385 – Ashkenazim PGP Son
  • Coriell: NA24385
  • NIST: HG002
  • PGP: huAA53E0

• GM24149 – Ashkenazim PGP Father
  • Coriell: NA24149
  • NIST: HG003
  • PGP: hu6E4515

• GM24143 – Ashkenazim PGP Mother
  • Coriell: NA24143
  • NIST: HG004
  • PGP: hu8E87A9

18

Horizon Diagnostics: Ashkenazim Trio FFPE Reference Standards

• GM24385 – Ashkenazim PGP Son

• GM24149 – Ashkenazim PGP Father

• GM24143 – Ashkenazim PGP Mother

• Complete Genomics:
  • Small variants (SNPs & Indels)
  • Copy Number Variants
  • Structural Variants
  • Mobile Element Insertions

• National Institute of Standards and Technology (NIST):
  • Illumina HiSeq, BWA + GATK 1.6: SNPs, Indels, large SVs, CNVs
  • Illumina Mate Pair 6kb Insert: mappings
  • PacBio: raw data
  • Ion Torrent Exome: variants + mappings
  • BioNano: raw data + assemblies
  • Moleculo: mappings for Son and Father

• SNPs and Indels shared by 2+ technologies:
  • Complete Genomics, proprietary pipeline
  • Illumina HiSeq, BWA + GATK 1.6
  • TMLT Ion Proton, TMAP + TVC

19

NIST preliminary analysis of Ashkenazim Trio

Sample   Run   SNP & Indels   SV   CNV   Genomic VCF
Son      1     x              x    x     x
Son      2     x              -    x     x
Father   1     x              x    x     x
Father   3     x              x    x     x
Mother   1     x              x    x     x
Mother   2     x              x    x     x

• GM24385 – Ashkenazim PGP Son

• GM24149 – Ashkenazim PGP Father

• GM24143 – Ashkenazim PGP Mother

• National Institute of Standards and Technology (NIST):
  • Illumina HiSeq, BWA + GATK 1.6: SNPs, Indels, large SVs, CNVs

20

NIST preliminary analysis of Ashkenazim Trio

• GM24385 – Ashkenazim PGP Son

• GM24149 – Ashkenazim PGP Father

• GM24143 – Ashkenazim PGP Mother

• National Institute of Standards and Technology (NIST):
  • Illumina HiSeq, BWA + GATK 1.6: SNPs, Indels, large SVs, CNVs

Sample        Run   SNP & Indels   SV   CNV   Genomic VCF   UCSC BED tracks
Son           1     x              x    x     x             x
Son           2     x              -    x     x             x
Father        1     x              x    x     x             x
Father        3     x              x    x     x             x
Mother        1     x              x    x     x             x
Mother        2     x              x    x     x             x
Trio Merged         x              x    x     x             x

21

Merged Ashkenazim Variants

• GM24385 – Ashkenazim PGP Son

• GM24149 – Ashkenazim PGP Father

• GM24143 – Ashkenazim PGP Mother

• National Institute of Standards and Technology (NIST):
  • Illumina HiSeq, BWA + GATK 1.6: SNPs, Indels, large SVs, CNVs

Sample        Run   SNP & Indels   SV      CNV
Son           1     5637374        14785   381
Son           2     5618495        0       358
Father        1     5575725        17091   377
Father        3     5598533        17569   348
Mother        1     5709480        16851   385
Mother        2     5690410        17488   356
Trio Total          33830017       83784   2205
Trio Merged         8423146        53151   1100
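The Trio Total row can be checked with a few lines of arithmetic. The sketch below sums the per-run counts from this slide and reports what share of those calls remains once identical variants are merged; the variable names are illustrative only.

```python
# Per-run counts copied from this slide: (SNP & indels, SV, CNV).
runs = {
    ("Son", 1): (5637374, 14785, 381),
    ("Son", 2): (5618495, 0, 358),
    ("Father", 1): (5575725, 17091, 377),
    ("Father", 3): (5598533, 17569, 348),
    ("Mother", 1): (5709480, 16851, 385),
    ("Mother", 2): (5690410, 17488, 356),
}

# Column-wise sums should reproduce the "Trio Total" row.
trio_total = tuple(sum(column) for column in zip(*runs.values()))
trio_merged = (8423146, 53151, 1100)  # "Trio Merged" row from the slide

print(trio_total)
for name, total, merged in zip(("SNP & indels", "SV", "CNV"), trio_total, trio_merged):
    print(f"{name}: {merged / total:.1%} of per-run calls remain after merging")
```

The large reduction for SNPs and indels is expected, since the same variant is usually called in several runs and family members and is counted once after merging.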

22

Filtered, Merged Ashkenazim Variants

• Horizon Diagnostics:
  • Annotation: snpEff + SnpSift (dbNSFP)

Annotation fields:
• COSMIC ID
• dbSNP
• HGVS AA change
• HGVS codon change
• NCBI ClinVar (clinical significance)
• SIFT score (prob. damaging variant)
• phastCons 1000 score (site conservation)
• 1000 genomes p1 Allele Freq. (non-syn.)

                    SNP & Indels   SV       CNV
Merged Trio         8423146        53151    1100
Filtered variants   32532          53151    1100
Mixed variants      1162           1352     0
HIGH impact         5169           1607     41
MOD. impact         68028          265      0
Ann. Effects        73236          175708   3220

• GM24385 – Ashkenazim PGP Son

• GM24149 – Ashkenazim PGP Father

• GM24143 – Ashkenazim PGP Mother

• National Institute of Standards and Technology (NIST):
  • Illumina HiSeq, BWA + GATK 1.6: SNPs, Indels, large SVs, CNVs
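A filter on predicted impact could look roughly like the sketch below. It assumes a VCF annotated with snpEff's `ANN` INFO field, in which the third pipe-separated subfield of each annotation is the impact (HIGH, MODERATE, LOW or MODIFIER); older snpEff releases wrote an `EFF` field instead, which this sketch does not handle. The helper names and the example records are hypothetical, and this is not the filtering procedure used to produce the numbers on this slide.

```python
def ann_impacts(info: str) -> set[str]:
    """Collect the impact values from a snpEff-style ANN entry in a VCF INFO string."""
    for field in info.split(";"):
        if field.startswith("ANN="):
            # Each comma-separated annotation is pipe-delimited:
            # Allele|Annotation|Impact|Gene_Name|...
            return {
                ann.split("|")[2]
                for ann in field[4:].split(",")
                if ann.count("|") >= 2
            }
    return set()


def keep_high_or_moderate(vcf_lines):
    """Yield data lines that carry at least one HIGH or MODERATE annotation."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th VCF column
        if ann_impacts(info) & {"HIGH", "MODERATE"}:
            yield line


# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t1000\t.\tG\tA\t50\tPASS\tDP=80;ANN=A|missense_variant|MODERATE|GENE1\n",
    "chr1\t2000\t.\tC\tT\t50\tPASS\tDP=75;ANN=T|synonymous_variant|LOW|GENE1\n",
    "chr2\t3000\t.\tT\tTA\t50\tPASS\tANN=TA|frameshift_variant|HIGH|GENE2,TA|intron_variant|MODIFIER|GENE3\n",
]
kept = list(keep_high_or_moderate(example))
print(len(kept))
```

Because one variant can carry several annotations (one per transcript or effect), the number of annotated effects can exceed the number of variants, as in the table on this slide.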

23

Comparison of the Ashkenazim Trio Data
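One comparison that can be made across a trio is to list the variants called in the son but in neither parent. The sketch below is a minimal version of that idea, assuming each sample's calls have already been reduced to (chromosome, position, ref, alt) tuples; the helper name and the toy data are hypothetical and do not reproduce the figure on this slide.

```python
def private_to_child(child, father, mother):
    """Return variants called in the child but in neither parent."""
    return set(child) - set(father) - set(mother)


# Made-up calls for illustration only.
son = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr3", 300, "G", "GA")}
father = {("chr1", 100, "A", "G")}
mother = {("chr2", 200, "C", "T"), ("chr4", 400, "T", "C")}

print(private_to_child(son, father, mother))
```

Variants found this way are not necessarily de novo mutations: a call missed in a parent or a false positive in the child produces the same result, so the list is a starting point for review.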

24

GIAB: Conclusions

• Genome In A Bottle Reference Standards are invaluable for validating variant calling analysis

• NIST and its collaborators shared datasets created with most NGS technologies

• Horizon Diagnostics shared annotated, merged variant calls from NIST for the Ashkenazim Trio

• ~35K variants are predicted to have high or moderate impact within the Trio

• GM24385 (Ashkenazim Son) includes 352 small variants with high/moderate impact which are absent in Father and Mother

• Filtered, annotated variants are available for download on horizondx.com
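To make the validation point concrete, the sketch below compares a pipeline's calls for a reference sample against a truth set and reports precision and recall. It is a naive illustration: variants are matched as exact (chromosome, position, ref, alt) tuples, the `benchmark` helper and the toy data are hypothetical, and differences in how indels are represented are ignored, which dedicated comparison tools handle by normalizing variants first.

```python
def benchmark(called, truth):
    """Compare called variants with a truth set using exact tuple matching."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # called and in the truth set
    fp = len(called - truth)   # called but not in the truth set
    fn = len(truth - called)   # in the truth set but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


# Made-up variants for illustration only.
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 400, "G", "GA"),
    ("chr3", 50, "T", "C"),
}
called = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr3", 50, "T", "C"),
    ("chr5", 900, "G", "A"),
}

result = benchmark(called, truth)
print(result)
```

Tracking these two numbers for the same reference sample over time is one simple way to notice when a change in the workflow or pipeline alters variant calling performance.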

25

Top NGS-Related Technical Enquiries

26

“I would like to validate my NGS workflow. What is the application of your different Q-Seq products?”

27

How to Test the Robustness and Sensitivity of your Workflow and Assay

[Figure: reference standards arranged by Sample Complexity and Sample Features]

• Structural Multiplex: DNA
• Quantitative Multiplex: FFPE, DNA and Formalin-Compromised DNA
• Genome In A Bottle: FFPE
• Gene-Specific Multiplex: DNA and FFPE
• Tru-Q: DNA

28

“I would like to assess the effect of Formalin on my assay”

29

Impact of Formalin Treatment on DNA

30

Quantitative Multiplex Reference Standard as Formalin-Compromised DNA

Characterized fragmentation levels, DNA quantification, and defined allelic frequency

*These products are part of our early access program. It is the responsibility of the individual laboratory to determine expected results specific to its assay.

[Figure: Genomic DNA ScreenTape assay, size axis in bp. Lane 1: Ladder. Lanes 2 and 4: HD-C749 Reference Standard. Lanes 3 and 5: HD-C751 Reference Standard.]

32

“I would like to assess my bioinformatics pipeline for detection of SNVs, Structural Variants and CNVs”

33

Structural Multiplex Reference Standard

Variant Type     Mutation               Expected Fractional Abundance (%) or CNV
SNV High GC      GNA11 Q209L            5.6
SNV High GC      AKT1 E17K              5.6
SNV Low GC       KRAS G13D              5.6
SNV Low GC       Pi3Ka E545K            5.6
Long Insertion   EGFR V769 ins          5.6
Long Deletion    EGFR (delE746-A750)    5.3
Fusion           ROS1 translocation     5.6
Fusion           RET translocation      5.6
CNV              MET amplification      4.5 x amplification
CNV              MYC amplification      9.5 x amplification
SNP              EGFR_G719S             5.3
Short Deletion   MET_p.V237fs           4.8*
SNV High GC      NOTCH1_p.P668S         5.0
Short Deletion   FLT3_p.S985fs          5.6
Short Deletion   BRCA2_p.A1689fs        5.6
Short Deletion   FBXW7_p.G667fs         5.6

* %AF lower due to MET amplification

*This product is part of our early access program. It is the responsibility of the individual laboratory to determine expected results specific to its assay.
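When interpreting results against expected fractional abundances such as these, it can help to estimate how likely a variant is to be sampled at a given read depth. The sketch below uses a simple binomial model, which is an assumption of this example: it ignores sequencing error, PCR duplicates and mapping losses, and the five-read minimum is a hypothetical caller threshold rather than a recommendation.

```python
from math import comb


def detection_probability(depth: int, allele_fraction: float, min_alt_reads: int) -> float:
    """P(at least min_alt_reads variant reads) under a binomial sampling model."""
    p_below = sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


# 5.6% is the expected fractional abundance listed for several variants above.
for depth in (100, 500, 1000):
    p = detection_probability(depth, 0.056, min_alt_reads=5)
    print(f"depth {depth}: P(at least 5 variant reads) = {p:.3f}")
```

Under this model the probability rises quickly with depth, which is one reason the limit of detection has to be established for each workflow and coverage level separately.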

34

Routinely monitor the performance of your workflows and assays with independent external controls

What extraction and quantification methods are you using?

What is the limit of detection of your workflow?

Is the impact of formalin treatment interesting to you?

What is the impact of assay failure in your laboratory and how do you monitor for it?

35

References

Slide 13 http://www.genome.gov/sequencingcosts/

Tuononen 2013, http://www.ncbi.nlm.nih.gov/pubmed/23362162

Spencer 2013, http://www.ncbi.nlm.nih.gov/pubmed/23810758

Fanelli 2010, http://www.ncbi.nlm.nih.gov/pubmed/21106756

Fanelli 2011, http://www.ncbi.nlm.nih.gov/pubmed/22082985

Gu 2011, http://www.ncbi.nlm.nih.gov/pubmed/?term=Preparation+of+reduced+representation+bisulfite+sequencing+libraries+for+genome-scale+DNA+methylation+profiling.

Gu 2010, http://www.ncbi.nlm.nih.gov/pubmed/20062050

Weng 2010, http://www.ncbi.nlm.nih.gov/pubmed/20593407

Meng 2013, http://www.ncbi.nlm.nih.gov/pubmed/?term=meng+2013+comparison+of+microrna+deep+sequencing

Slide 15 Schweiger 2009, http://www.ncbi.nlm.nih.gov/pubmed/?term=schweiger%5BAuthor%5D+AND+2009%5BDate+-+Publication%5D+ffpe

Van Allen 2014, http://www.ncbi.nlm.nih.gov/pubmed/24836576

Slide 16 Hedegaard 2014, http://www.ncbi.nlm.nih.gov/pubmed/24878701

Wong 2014, http://www.ncbi.nlm.nih.gov/pubmed/24885028

Slide 27 BWA: Li 2010, http://www.ncbi.nlm.nih.gov/pubmed/20080505

GATK: McKenna 2010, http://www.ncbi.nlm.nih.gov/pubmed/20644199

snpEff: Cingolani 2012, http://www.ncbi.nlm.nih.gov/pubmed/?term=22728672

SnpSift: Cingolani 2012, http://www.ncbi.nlm.nih.gov/pubmed/22435069

dbNSFP: Liu 2013, http://www.ncbi.nlm.nih.gov/pubmed/23843252
