Cancer Sequencing Quality & The ICGC-TCGA DREAM
Somatic Mutation Calling Challenge
Dr. Paul C. Boutros
Ontario Institute for Cancer Research
July 14, 2015
The Consequences of Analytical Diversity
SMC-DNA: Challenge-Based Benchmarking
SMC-Het & SMC-RNA
Pathway
This holds for all tumour-types: breast cancer
Fox et al. 2014
• 74% of genes in 1+
• 16% of genes in 16+
• 0% of genes in 21+
DREAM Mutation-Calling Challenge
Real data:
• 10 T/N pairs (50x/30x)
• Two tumour-types:
  o 5 pancreatic
  o 5 prostate
• Lane-level FASTQs & BAMs
In silico data:
• 5 T/N pairs
• For “play” and dry-runs
• Releases of increasing complexity
• Rapid scoring turn-around
• BAMs (Novoalign or BWA)
[Figure: challenge timeline. Axis labels: Nov 2013, Dec, Jan, Feb, Mar, Apr, May, Jun, Jul, Aug 2014, Oct, Nov. Phases: Competition, Validation, Winner. Markers numbered 1 to 5.]
How Can You Get The Data?
• Register for the Challenge at Synapse
• Complete an ICGC DACO Application
• Download using Annai’s GeneTorrent: no cost to download
• Directly access in the Google Compute Engine (Google cloud): $2,000 free computing
Initial Results
• So far:
  o 391 registrants
  o 3,260 entries on 14 genomes
• On-going post-challenge submissions as people try to understand the failures of their algorithms (a living benchmark!)
• Key discussions on scoring SVs and on improving BamSurgeon (the simulator)
Where are we now?
Initial SNV analysis complete (Ewing et al., in press, Nature Methods)
Initial SV analysis (of in silico tumours) in progress (no cost to download)
Experimental validation studies nearly complete
So, What About Heterogeneity?
As part of TCGA-Prostate we were looking at normal cell contamination
We = Svitlana Tyekucheva, Syed Haider, Massimo Loda, Francesca Demichelis
We’d just take a consensus of estimators….
Opening for registration on November 10, 2014
Opening for submissions in August 2015 (ahem!)
https://www.synapse.org/#!Synapse:syn2813581
Image credit: Lcchong, Wikipedia
SMC Tumour Heterogeneity Challenge
Single Sample
• 50 samples
• Simulated from GIAB and a deep-sequenced normal
• Cloud-only (GCE+Galaxy): REB, distribution
• Varying complexity, mutational load, depth, etc.
• ~3 months run-time
Multi-Sample
• Sample number pending
• Similar design, though
• Cloud-based (Galaxy)
• Similar parameter ranges
• 3 months
How Are We Going to Simulate?
1. Start with a chr-BAM
2. Phase and create two ph-chr-BAMs (Phase A, Phase B)
3. Extract reads for normal & contamination
4. Spike SNVs, CNAs, GRs
[Figure labels: Contaminating Normal, Sub-clone A, Sub-clone B]
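The mixing idea behind this simulation can be illustrated with a minimal Python sketch. It is not the BamSurgeon-based pipeline used for the challenge. The function names, the component proportions, and the copy-neutral diploid assumption behind the allele-fraction formula are all hypothetical, chosen only to show how a contaminating normal and two sub-clones might be combined at chosen proportions.

```python
import random


def split_reads(read_ids, proportions, seed=0):
    """Randomly assign each read to one component (hypothetical helper).

    `proportions` maps a component name to its target fraction of reads;
    the fractions must sum to 1.
    """
    names = list(proportions)
    weights = [proportions[name] for name in names]
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    rng = random.Random(seed)  # seeded so the split is reproducible
    buckets = {name: [] for name in names}
    for read_id in read_ids:
        chosen = rng.choices(names, weights=weights)[0]
        buckets[chosen].append(read_id)
    return buckets


def expected_vaf(cell_fraction, mutated_copies=1, total_copies=2):
    """Expected variant allele fraction of a spiked SNV.

    Assumes every cell carries `total_copies` of the locus (copy-neutral,
    diploid by default) and that the mutation sits on `mutated_copies` of
    them in the fraction `cell_fraction` of cells that carry it.
    """
    return cell_fraction * mutated_copies / total_copies


# Illustrative mixture only; these proportions are not from the challenge.
mix = {"normal": 0.2, "subclone_A": 0.5, "subclone_B": 0.3}
buckets = split_reads([f"read{i}" for i in range(10000)], mix, seed=42)
```

Under these assumptions, a heterozygous SNV spiked only into the reads assigned to `subclone_B` would be expected at an allele fraction of about `expected_vaf(0.3)`, that is 0.15.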
How Are We Going to Simulate?
Final BAM
• SNV calls: MuTect, Strelka
• CNA calls: Battenberg, TITAN
Available via Google Cloud / Docker API
What Are We Scoring?
1. Sub-population characteristics
   a) What is the level of normal “contamination”?
   b) How many sub-populations are present?
   c) What are their proportions?
2. What is the phylogenetic order of sub-populations?
3. For each mutation, which sub-populations is it in?
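To make the scoring questions concrete, here is a small Python sketch of how two of them could be compared against a known truth. The slides do not specify the challenge's actual metrics. The function names are hypothetical, and the pairwise agreement score (the Rand index) is swapped in purely as an illustration for the mutation-assignment question.

```python
def purity_error(true_normal_fraction, predicted_normal_fraction):
    """Absolute error in the estimated level of normal contamination."""
    return abs(true_normal_fraction - predicted_normal_fraction)


def pairwise_agreement(truth, predicted):
    """Rand index between two assignments of mutations to sub-populations.

    `truth` and `predicted` list one sub-population label per mutation.
    For each pair of mutations, check whether the two assignments agree on
    "same sub-population" versus "different sub-populations". Label names
    are irrelevant; only the grouping matters.
    """
    if len(truth) != len(predicted):
        raise ValueError("assignments must cover the same mutations")
    n = len(truth)
    pairs = 0
    agree = 0
    for i in range(n):
        for j in range(i + 1, n):
            pairs += 1
            same_in_truth = truth[i] == truth[j]
            same_in_prediction = predicted[i] == predicted[j]
            if same_in_truth == same_in_prediction:
                agree += 1
    return agree / pairs if pairs else 1.0


# Five mutations: the truth has three sub-populations, the prediction merges two.
truth = ["A", "A", "B", "B", "C"]
predicted = [1, 1, 2, 2, 2]
score = pairwise_agreement(truth, predicted)
```

Here the prediction merges sub-populations B and C, so it disagrees with the truth on 2 of the 10 mutation pairs and `score` is 0.8. Scoring pairs rather than labels means a caller is not penalised for naming its sub-populations differently from the truth.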
CPC-GENE: The People Involved
Dr. Robert Bristow, Dr. John McPherson, Dr. Theodore van der Kwast
Boutros Lab: Richard de Borja, Nicholas Harding, Pablo Hennings-Yeomans, Emilie Lalonde, Amin Zia, Jianxin Wang, Francis Nguyen, Natalie Fox, Michelle Chan-Seng-Yue, Lauren Chong, Takafumi Yamaguchi, Veronica Sabelnykova
Informatics: Timothy Beck, Fouad Yousif, Robert Denroche, Xuemei Luo
Genomics: Taryne Chong, Andrew Brown, Michelle Sam, Jeremy Johns, Lee Timms, Nicholas Buchner, Ada Wong
Clinico-Molecular: Dominique Trudel, Alice Meng, Gaetano Zafarana
PIs & PMs: Michael Fraser, Melania Pintilie, Neil Fleshner, Lakshmi Muthuswamy, Colin Collins, Thomas Hudson, Lincoln Stein
SMC-DNA Organizing Team
Sage/DREAM Organizers
Gustavo Stolovitzky
Stephen Friend
Adam Margolin
Thea Norman
Christine Suver
Christopher Bare
Kristen Dang
Bruce Hoff
Mike Kellen
External Organizers
Paul Boutros (OICR)
Josh Stuart (UCSC)
Lincoln Stein (OICR)
Kyle Ellrott (UCSC)
Adam Ewing (UCSC)
Anna Lee (OICR)
Katie Houlahan (OICR)
Cristian Caloian (OICR)
Takafumi Yamaguchi (OICR)
[Logos: data contributors; funding, sponsoring and publication partners]
SMC-Het Organizing Team
Organizers
• Paul Boutros (OICR)
• Josh Stuart (UCSC)
• Gustavo Stolovitzky (IBM)
• Stephen Friend (Sage)
• David Wedge (Sanger)
• Peter Van Loo (UCL)
• Quaid Morris (University of Toronto)
• Thea Norman (Sage)
• Amit Deshwar (University of Toronto)
• Minjeong Ko (OICR)
• Kyle Ellrott (UCSC)
• Christopher Bare (Sage)
• Kristen Dang (Sage)
• Yin Hu (Sage)
• Shannon Carter (Sage)
[Logos: data contributors; funding, sponsoring and publication partners]