Project plan for generating a somatic data truth set for NGS cancer assay validation: COLO-829 and fusion spike-in materials

Stephanie J.K. Pond

8/15/13

Aug2013 tumor normal whole genome sequencing

Introduction

There is a need to develop and widely adopt standards that facilitate tool development and assay validation for next-generation sequencing in cancer applications.

– Cancer standards are needed for somatic calls of SNVs, indels, structural variants, and copy number variation, and for RNA fusion detection.

Limited publicly available data can act as a “gold standard” dataset.

We embarked on a multi-lab collaboration to generate a set of somatic calls that can be used as a truth dataset for validating assays and evaluating their performance.

– In this initial work, we are excluding FFPE samples.

COLO-829, COLO-829BL Cell Lines

The cell lines have previously been sequenced, and somatic calls from the DNA were published.

– Pleasance et al. Nature 2010, 463(7278): 191-196.

– Found variants in the major categories that need to be investigated for cancer applications: SNVs, indels, CNVs, and SVs

– Substitutions, insertions, and deletions were confirmed by capillary sequencing

– Structural variants were confirmed by PCR across the breakpoint and capillary sequencing

– Confirmation was performed in both cell lines to distinguish somatic from germline variants

We want to expand on this dataset.

Cancer Type | Tissue Source | Name | ATCC No. | Tissue Source | Name | ATCC No.
Melanoma; malignant | skin | COLO 829 | CRL-1974 | B lymphoblast | COLO 829BL | CRL-1980

[Figure: Circos plot from the COSMIC database]

Whole Genome Sequencing of COLO-829 and COLO-829BL

Whole genome sequencing data for COLO-829 and COLO-829BL are being generated at a depth of 90x to build a set of consensus calls:

– TGen: HiSeq 2500, multiple variant callers, cell passage A

– TGen: samples sent to Complete Genomics for sequencing, to incorporate an orthogonal technology

– Illumina: HiSeq 2500, cell passage B

The consensus of the datasets will establish a set of somatic calls that can be used as a gold standard in analytical validations

– expand the set in the literature

– a second set of lower-confidence somatic calls (present in 2/3 datasets) may also be identified

[Figure: consensus calls derived from three datasets: TGen (Complete Genomics), TGen (HiSeq), and ILMN (HiSeq)]
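The consensus scheme above can be sketched in a few lines. This is a minimal illustration, not the project's pipeline: it assumes each dataset has already been reduced to a set of somatic calls keyed by (chromosome, position, ref, alt), and that two calls match only when those keys are identical. The function name `tier_calls` and every variant shown are hypothetical placeholders.

```python
from collections import Counter

# A call is keyed by (chromosome, position, ref allele, alt allele).
# All variants below are made-up placeholders, not project data.

def tier_calls(call_sets):
    """Return (calls found in every dataset, calls found in all but one)."""
    n = len(call_sets)
    # Count how many datasets support each distinct call.
    support = Counter(v for calls in call_sets.values() for v in set(calls))
    consensus = {v for v, k in support.items() if k == n}
    lower_confidence = {v for v, k in support.items() if k == n - 1}
    return consensus, lower_confidence

call_sets = {
    "TGen (HiSeq)": {
        ("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"), ("chr3", 300, "C", "T"),
    },
    "TGen (Complete Genomics)": {
        ("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"),
    },
    "ILMN (HiSeq)": {
        ("chr1", 100, "A", "T"), ("chr3", 300, "C", "T"), ("chr4", 400, "T", "G"),
    },
}

consensus, lower_confidence = tier_calls(call_sets)
print("3/3 datasets:", sorted(consensus))
print("2/3 datasets:", sorted(lower_confidence))
```

Exact-key matching is the simplest choice; indels and structural variants would likely need normalization or breakpoint tolerance before calls from different platforms can be compared.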

Synthetic Oligo Spike-In mRNA Transcripts

Construct layout: T7 - AscI - GeneA - GeneB - NotI - T3(rc)

ID | Genes | Transcript Length (excluding poly A+)
TFG01 | EWS-ATF1 | 1150
TFG02 | TMPRSS2-ETV1 | 1282
TFG03 | EWS-FLI1 | 1483
TFG04 | NTRK3-ETV6 | 1954
TFG05 | CD74-ROS1 | 2164
TFG06 | HOOK3-RET | 2383
TFG07 | EML4-ALK | 3442
TFG08 | AKAP9-BRAF | 4531
TFG09 | BCR-ABL | N/A*
TFG10 | BRD4-NUT | 3969

*IDT could not synthesize TFG09 due to significant secondary structure

• Nine sequences of clinically relevant gene fusions were retrieved from GenBank and synthesized as DNA plasmids by IDT.

• In vitro transcription of the purified plasmid, followed by poly-A tailing, resulted in mRNA transcripts of known sequence.

• These constructs can be used as spike-in control materials in mRNA protocols to assess the ability to detect fusion genes, a critical mutation type in cancer.

Preliminary tests of the synthetic oligos appear promising

A pool of fusion spikes was added to COLO-829 total RNA at different concentrations.

The data show a linear response at higher concentrations and poor detection below a threshold concentration.

One spike (TMPRSS2-ETV1) is not detected, even at the highest concentrations, although it is present at very high read counts.

– The hypothesis is that the fusion is near the 5’ end of the transcript and that the breakpoint position is affecting fusion calling (remains to be tested)

– Highlights the need for standard materials in this area

[Figure: supporting evidence strength (log10 read counts, axis 0 to 6) versus fusion spike RNA concentration (log10 nmoles, axis -14 to -6) for TopHat-Fusion, ChimeraScan, and SnowShoes]
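One way to summarize a dilution series like this is a least-squares line through the detected points on log-log axes, together with the lowest concentration at which the spike is still called. The sketch below assumes that approach. The numbers are invented placeholders chosen for easy arithmetic, not measurements from this study, and `fit_line` is a hypothetical helper.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# (log10 nmoles of spike, supporting read count); 0 reads means "not detected".
# Placeholder values only.
dilution = [(-13, 0), (-12, 0), (-11, 10), (-10, 100), (-9, 1000), (-8, 10000)]

# Keep detected points and move read counts to log10 scale.
detected = [(conc, math.log10(reads)) for conc, reads in dilution if reads > 0]
slope, intercept = fit_line([c for c, _ in detected], [r for _, r in detected])
lowest_detected = min(c for c, _ in detected)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"lowest detected concentration: 10^{lowest_detected} nmoles")
```

A slope near 1 on log-log axes would indicate that supporting reads scale proportionally with spike concentration over the detected range.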

Analytical Validation at TGen

Whole-Genome

– TGen: HiSeq 2500

– TGen: Complete Genomics

– ILMN: HiSeq 2500

Exomes: SNVs (mixtures given as % N:T)

– 0:100: Replicate 1, Replicate 2, Replicate 3

– 50:50: Replicate 1, Replicate 2, Replicate 3

– 75:25: Replicate 1, Replicate 2, Replicate 3

– 90:10: Replicate 1, Replicate 2, Replicate 3

– 95:5: Replicate 1, Replicate 2, Replicate 3

– 99:1: Replicate 1, Replicate 2, Replicate 3

– 100:0: Replicate 1, Replicate 2, Replicate 3

WGS Large Insert: Structural Variants (mixtures given as % N:T)

– 0:100: Replicate 1, Replicate 2, Replicate 3

– 50:50: Replicate 1, Replicate 2, Replicate 3

– 75:25: Replicate 1, Replicate 2, Replicate 3

– 90:10: Replicate 1, Replicate 2, Replicate 3

– 95:5: Replicate 1, Replicate 2, Replicate 3

– 99:1: Replicate 1, Replicate 2, Replicate 3

– 100:0: Replicate 1, Replicate 2, Replicate 3

RNA: Diff. Exp., Fusions

– Tumor: Replicate 1, Replicate 2, Replicate 3

– Tumor ERCC 1: Replicate 1, Replicate 2, Replicate 3

– Tumor ERCC 2: Replicate 1, Replicate 2, Replicate 3

– Normal: Replicate 1, Replicate 2, Replicate 3

– Norm ERCC 1: Replicate 1, Replicate 2, Replicate 3

– Norm ERCC 2: Replicate 1, Replicate 2, Replicate 3

– Fusion spikes: Replicate 1, Replicate 2, Replicate 3

Arrays: Copy Number, Expression

– Agilent

– Illumina

– Affymetrix

50+ flow cells; 6 TB of sequencing data; equivalent to ~600 exomes (TCGA Phase 1)
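To get a feel for what the N:T mixtures demand of a variant caller, the expected variant allele fraction (VAF) at each mixture can be estimated. The sketch below rests on simplifying assumptions that are not from the slides: a heterozygous somatic variant at a diploid, copy-number-neutral locus, and mixtures that are exact. Loci with copy number changes in COLO-829 would deviate. `expected_vaf` is a hypothetical helper name.

```python
# Mixtures as listed on the slide, in percent normal : percent tumor.
MIXTURES_N_T = [(0, 100), (50, 50), (75, 25), (90, 10), (95, 5), (99, 1), (100, 0)]
REPLICATES = 3

def expected_vaf(normal_pct, tumor_pct, alt_copies=1, total_copies=2):
    """Expected VAF for a somatic variant carried on alt_copies of total_copies.

    Assumes tumor and normal cells have the same total copy number at the locus.
    """
    tumor_fraction = tumor_pct / (normal_pct + tumor_pct)
    return tumor_fraction * alt_copies / total_copies

for normal_pct, tumor_pct in MIXTURES_N_T:
    vaf = expected_vaf(normal_pct, tumor_pct)
    print(f"{normal_pct}:{tumor_pct} N:T -> expected VAF {vaf:.3f}")

# Libraries per assay in this design: one per mixture per replicate.
libraries_per_assay = len(MIXTURES_N_T) * REPLICATES
print("libraries per assay:", libraries_per_assay)
```

Under these assumptions the 99:1 mixture puts a heterozygous variant at roughly 0.5% VAF, which illustrates why the low-tumor mixtures are the demanding end of the series.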

Summary

TGen and ILMN have begun a cross-site effort to generate a “gold standard” somatic dataset for a pair of cancer cell lines (COLO-829 & COLO-829BL) as well as a set of synthetic mRNA fusion transcripts.

Data generation is scheduled to be completed this month, with analysis to follow.

We intend to make the data publicly available.

Are these appropriate reference materials?

– Cell lines: stability, consent

– Fusion materials: preliminary data are encouraging; additional experiments are ongoing.

We welcome feedback and discussion.

Acknowledgements

Illumina: Han-Yu Chuang, Nancy Kim, Timothy McDaniel, Valerie Montel, Jimmy Perrott

TGen: Stephanie Buchholtz, John Carpten, David Craig, Winnie Liang, W. Amol Tembe, Tracey White

Appendix
