GenomeVIP: A Genomics Analysis Pipeline for Cloud Computing with Germline and Somatic Calling on Amazon’s Cloud
R. Jay Mashl
October 20, 2014
Turnkey Variant Analysis Project
tvap.genome.wustl.edu
• Multi-tool variant discovery: Pindel, VarScan, BreakDancer, GenomeSTRiP
• Cloud computing
• Scalability
• Extensibility
Provides a collection of analysis tools and computational frameworks for streamlined discovery and interpretation of genetic variants.
[Diagram: runs locally or in the cloud (AWS)]
Genome Variant Investigation Portal
Poster #1678M (Monday)
Genome Variant Investigation Portal
• Web server and interface for germline and somatic variant-discovery tools
• Concurrent pipelines (SNV, indel, SV) with parallelization
• Launchable on local machines or in the cloud through Amazon Web Services (AWS)
• Download results from AWS via web browser
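The idea of running the SNV, indel, and SV pipelines concurrently can be pictured as submitting all three to a worker pool at once and collecting their results. This is a minimal sketch, assuming a thread pool is an acceptable stand-in; the `run_*` functions and the `PIPELINES` mapping are hypothetical placeholders, not GenomeVIP's actual scheduler, which launches real caller jobs.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the three pipelines; a real deployment would
# launch the corresponding caller (VarScan, Pindel, BreakDancer, ...) here.
def run_snv(region):
    return f"SNV calls for {region}"

def run_indel(region):
    return f"indel calls for {region}"

def run_sv(region):
    return f"SV calls for {region}"

PIPELINES = {"snv": run_snv, "indel": run_indel, "sv": run_sv}

def run_concurrently(region):
    """Submit every pipeline at once and gather results keyed by pipeline name."""
    with ThreadPoolExecutor(max_workers=len(PIPELINES)) as pool:
        futures = {name: pool.submit(func, region) for name, func in PIPELINES.items()}
        return {name: fut.result() for name, fut in futures.items()}

results = run_concurrently("22:36130000-36700000")
print(results["indel"])  # indel calls for 22:36130000-36700000
```

Because each pipeline only waits on external processes in practice, threads are enough to overlap them; CPU-heavy work would instead be split across separate jobs or machines.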
• Pindel: indel detection for paired reads based on local realignment
• VarScan: heuristic/statistical calling of single nucleotide variants (SNVs)
• BreakDancer: structural variant (SV) detection for paired reads
• GenomeSTRiP (Harvard U.): structural variant detection and genotyping
Biological Discoveries (selected)
Discovery & genotyping for structural variants in populations
• ~14,000 deletion polymorphisms with allelic states (1000G pilot)
• Nature Genetics 43, 269-276 (2011)
Comprehensive molecular portraits of human breast tumours
• Identified four main types by combining data from five platforms
• Nature 490, 61-70 (2012)
Genomic Landscape of Non-Small Cell Lung Cancer in Smokers and Never-Smokers
• Among patients with lung cancer, smokers were found to have 10x more mutations than non-smokers
• Cell 150, 1121-34 (2012)
Clonal evolution in relapsed acute myeloid leukaemia
• “Cancer” consists of multiple variants: the founding clone may give rise to the relapse clone, and subclones may survive therapy and mutate further
• Nature 481, 506-510 (2012)
Application to APOL1: Demo
• Representative samples from PUR population from 1000 Genomes
• Analyze the range chr22:36-37 Mbp for known variants:
Sample     Position         Variant (REF / ALT)    Isoform
HG01242    22:36,661,906    A / G                  G1 (non-silent)
HG01101    22:36,662,041    AATAATT / A            G2 (6-bp deletion)
HG01049    22:36,133,448    767-bp deletion        -
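The G2 row can be checked by hand. In VCF a deletion is written with one padding base shared by REF and ALT, so AATAATT / A removes six bases, positions 36,662,042 through 36,662,047. A short sketch of that arithmetic (the function name `deletion_stats` is illustrative) reproduces the SVLEN=-6 and END=36662047 values that appear for this variant in the Results records.

```python
def deletion_stats(pos, ref, alt):
    """Return (SVLEN, END) for a left-anchored VCF deletion at pos.

    The first REF base is the shared padding base, so the deleted bases
    run from pos + 1 through pos + len(ref) - 1.
    """
    if not (len(alt) < len(ref) and ref.startswith(alt)):
        raise ValueError("not a simple left-anchored deletion")
    svlen = len(alt) - len(ref)   # negative for deletions
    end = pos + len(ref) - 1      # last reference base covered by REF
    return svlen, end

# G2 deletion from the table: 22:36,662,041 AATAATT / A
print(deletion_stats(36662041, "AATAATT", "A"))  # (-6, 36662047)
```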
Sample & Reference Selection
• Enter the path: copy the given URI, then click Retrieve.
• Click on all the PUR low_coverage items to transfer them to the Selected bams textbox.
• Select reference hs37d5.chr22.fa.
• Click Next.
[Screenshot callouts: Specify path & retrieve; Select samples; Select reference (hs37d5.chr22.fa)]
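Picking the PUR low_coverage items amounts to filtering a file listing by two tokens in each name. The sketch below assumes 1000 Genomes-style dotted file names such as `HG01242.mapped.ILLUMINA.bwa.PUR.low_coverage.20120522.bam`; the listing itself and the `select_bams` helper are hypothetical illustrations, not GenomeVIP code.

```python
def select_bams(listing, population="PUR", coverage="low_coverage"):
    """Return BAM names whose dot-separated tokens include both tags."""
    selected = []
    for name in listing:
        tokens = name.split(".")
        # Skip index files (.bai) and anything that is not a BAM
        if tokens[-1] != "bam":
            continue
        if population in tokens and coverage in tokens:
            selected.append(name)
    return selected

# Hypothetical listing in 1000 Genomes naming style
listing = [
    "HG01242.mapped.ILLUMINA.bwa.PUR.low_coverage.20120522.bam",
    "HG01242.mapped.ILLUMINA.bwa.PUR.low_coverage.20120522.bam.bai",
    "HG01242.mapped.ILLUMINA.bwa.PUR.exome.20120522.bam",
    "HG01101.mapped.ILLUMINA.bwa.PUR.low_coverage.20120522.bam",
    "HG01049.mapped.ILLUMINA.bwa.PUR.low_coverage.20120522.bam",
    "NA12878.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.bam",
]
for name in select_bams(listing):
    print(name)
```

Matching whole tokens rather than substrings avoids accidental hits, for example a population code appearing inside a sample ID.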
SNV Detection: VarScan
• Check VarScan
• Select Germline
• Select SNVs only
• Select All (pooled) samples
• Select User-defined region and enter “22:36130000-36700000”
• Keep p-value: 0.99
• Set Output vcf: True
• Click Next.
[Screenshot callouts: SNV; All; 22:36130000-36700000]
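Every caller in this demo is restricted to the same user-defined region, "22:36130000-36700000". A minimal sketch of parsing such a string and testing positions against it follows; treating the region as 1-based and inclusive is an assumption (it matches the usual samtools-style convention), and `parse_region` and `contains` are illustrative names.

```python
import re

# chrom:start-end, e.g. "22:36130000-36700000" (commas are tolerated)
REGION_RE = re.compile(r"^([\w.]+):(\d+)-(\d+)$")

def parse_region(text):
    """Return (chrom, start, end), assuming 1-based inclusive coordinates."""
    m = REGION_RE.match(text.replace(",", ""))
    if m is None:
        raise ValueError(f"unrecognized region: {text!r}")
    chrom, start, end = m.group(1), int(m.group(2)), int(m.group(3))
    if start > end:
        raise ValueError("region start must not exceed end")
    return chrom, start, end

def contains(region, chrom, pos):
    """True if position pos on chrom falls inside the region."""
    r_chrom, start, end = region
    return chrom == r_chrom and start <= pos <= end

region = parse_region("22:36130000-36700000")
# Positions of the three demo variants listed in the APOL1 table
for pos in (36661906, 36662041, 36133448):
    print(pos, contains(region, "22", pos))
```

All three demo variants fall inside this window, which is why the narrower 36.13-36.70 Mbp range still captures them.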
Indel Detection: Pindel
• Check Run Pindel
• Select All (pooled) samples
• Select User-defined region and enter “22:36130000-36700000”
• Click Next.
[Screenshot callouts: 22:36130000-36700000; Select All]
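Behind a pooled Pindel run there is typically one BAM configuration file listing every sample, plus a region passed on the command line. The sketch below assumes Pindel's documented config layout (BAM path, insert size, sample label per line) and its `-f`, `-i`, `-c`, `-o` options; the BAM paths, the insert size of 300, the config file name, and the output prefix are hypothetical placeholders, and GenomeVIP may build its jobs differently.

```python
from pathlib import PurePosixPath

def pindel_config(bam_paths, insert_size=300):
    """Build the text of a Pindel BAM config file.

    Each line is: <BAM path> <tab> <insert size> <tab> <sample label>.
    Pooling is modeled by listing every selected BAM in a single file.
    """
    lines = []
    for path in bam_paths:
        # Use the file-name prefix (e.g. HG01242) as the sample label
        label = PurePosixPath(path).name.split(".", 1)[0]
        lines.append(f"{path}\t{insert_size}\t{label}")
    return "\n".join(lines) + "\n"

def pindel_command(reference, config_path, region, out_prefix):
    """Argument list for one Pindel run limited to a region via -c."""
    return ["pindel", "-f", reference, "-i", config_path,
            "-c", region, "-o", out_prefix]

# Hypothetical paths and prefix, for illustration only
bams = ["/data/HG01242.bam", "/data/HG01101.bam", "/data/HG01049.bam"]
print(pindel_config(bams), end="")
print(" ".join(pindel_command("hs37d5.chr22.fa", "pindel.cfg",
                              "22:36130000-36700000", "pur_pooled")))
```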
SV Detection: BreakDancer
• Check BreakDancer
• In Step 1, select All (pooled) samples
• In Step 3, select Intra (ITX) only and a user-defined region, and enter “22:36130000-36700000”
• Click Next.
[Screenshot callout: 22:36130000-36700000]
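The Step 3 choices (Intra (ITX) only, one region) can be illustrated as a filter over BreakDancer's tab-separated output. This sketch assumes the commonly documented column order (Chr1, Pos1, Orientation1, Chr2, Pos2, Orientation2, Type, Size, Score, num_Reads, ...); the records are made up for illustration, and GenomeVIP may pass these options to BreakDancer directly rather than filtering afterwards.

```python
def filter_breakdancer(lines, sv_type, region):
    """Keep BreakDancer calls of one type whose breakpoints both lie in region.

    Assumes tab-separated columns:
    Chr1 Pos1 Orientation1 Chr2 Pos2 Orientation2 Type Size Score num_Reads ...
    """
    chrom, start, end = region
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        chr1, pos1 = cols[0], int(cols[1])
        chr2, pos2 = cols[3], int(cols[4])
        if cols[6] != sv_type:
            continue
        if chr1 == chr2 == chrom and start <= pos1 <= end and start <= pos2 <= end:
            kept.append(cols)
    return kept

# Made-up records, for illustration only
records = [
    "#Chr1\tPos1\tOrientation1\tChr2\tPos2\tOrientation2\tType\tSize\tScore\tnum_Reads",
    "22\t36200000\t5+3-\t22\t36260000\t5+3-\tITX\t60000\t80\t5",
    "22\t36300000\t4+4-\t22\t36301000\t4+4-\tDEL\t1000\t90\t4",
    "22\t36650000\t3+2-\t22\t36900000\t3+2-\tITX\t250000\t70\t3",
]
kept = filter_breakdancer(records, "ITX", ("22", 36130000, 36700000))
print(len(kept))
```

Here the DEL call is dropped by type and the second ITX call is dropped because its second breakpoint lies outside the window.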
SV Detection & Genotyping: GenomeSTRiP
1. Check Run GenomeSTRiP
2. Verify that the reference is hs37d5.chr22.fa
3. Select mask human_g1k_v37.mask.36.fasta
4. GC normalization: True, with cn2_mask_g1k_v37.fasta
5. Chromosome: User-defined, with “22:36130000-36700000”
6. Variant size: 100 bp to 100 kbp
[Screenshot callouts: hs37d5.chr22; 100bp-100kbp]
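The 100 bp to 100 kbp setting bounds the length of structural variants considered. GenomeSTRiP applies this range during discovery; the sketch below only illustrates the same bound as a post-hoc check on the SVLEN value in a VCF INFO string. The example INFO strings are the visible portions of the Results records, and the helper names are illustrative.

```python
MIN_SIZE = 100       # 100 bp
MAX_SIZE = 100_000   # 100 kbp

def sv_size(info):
    """Return |SVLEN| from a VCF INFO string, or None if SVLEN is absent."""
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key == "SVLEN":
            return abs(int(value))
    return None

def in_size_range(info, min_size=MIN_SIZE, max_size=MAX_SIZE):
    """True if the SV length lies in [min_size, max_size]."""
    size = sv_size(info)
    return size is not None and min_size <= size <= max_size

print(in_size_range("SVLEN=-762;SVTYPE=DEL"))                            # True
print(in_size_range("END=36662047;HOMLEN=4;HOMSEQ=ATAA;SVLEN=-6;SVTYPE=DEL;"))  # False
```

Taking the absolute value matters because deletions carry a negative SVLEN in VCF.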
Amazon AWS Submission
• In testing, these jobs finished within a few minutes.
[Screenshot callouts: Select machine type; Where to send results; Validate & submit]
Results
22 36133341 DEL_1 T <DEL> …SVLEN=-762;SVTYPE=DEL
22 36662041 . AATAATT A . PASS END=36662047;HOMLEN=4;HOMSEQ=ATAA;SVLEN=-6;SVTYPE=DEL;
22 36661906 . A G . PASS ADP=7;WT=1;HET=0;HOM=0;NC=2
22 36662041 . AATAATT A . PASS ADP=4;WT=0;HET=1;HOM=0;NC=2;
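Records like these can be read by splitting the eight fixed VCF columns and turning INFO into a key/value map. The ADP, WT, HET, HOM, and NC keys in the last two records appear to follow VarScan's VCF conventions, where WT, HET, HOM, and NC count the samples called wild-type, heterozygous, homozygous-variant, and not called. This is a minimal sketch; `parse_record` and `genotype_counts` are illustrative names, and splitting on any whitespace is an assumption that tolerates both tab-separated VCF and the space-separated lines shown here.

```python
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]

def parse_record(line):
    """Split a VCF data line into named columns plus an INFO dict."""
    fields = line.split()
    record = dict(zip(VCF_COLUMNS, fields))
    record["POS"] = int(record["POS"])
    info = {}
    for item in record["INFO"].split(";"):
        if not item:
            continue  # tolerate a trailing ';'
        key, _, value = item.partition("=")
        info[key] = value if value else True  # flag keys have no value
    record["INFO"] = info
    return record

def genotype_counts(record):
    """Per-site sample counts: wild-type, het, hom-variant, no-call."""
    return {k: int(record["INFO"][k]) for k in ("WT", "HET", "HOM", "NC")}

rec = parse_record("22 36662041 . AATAATT A . PASS ADP=4;WT=0;HET=1;HOM=0;NC=2;")
print(genotype_counts(rec))  # {'WT': 0, 'HET': 1, 'HOM': 0, 'NC': 2}
```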
http://tvap.genome.wustl.edu/
Poster #1678M (this afternoon)
Jay Mashl (rmashl@genome.wustl.edu)
Kai Ye (kye@genome.wustl.edu)
Li Ding (lding@genome.wustl.edu)
...and with thanks to the Ding Lab members
National Human Genome Research Institute