NEXT GENERATION SEQUENCING II
R. PIAZZA (MD, PHD) DEPT OF MEDICINE AND SURGERY UNIVERSITY OF MILANO BICOCCA
SANGER SEQUENCING
[Figure: DNA + DNA Polymerase, 5' to 3' strands, Capillary Electrophoresis]
R. Piazza – NGS Sequencing II
NEXT-GENERATION-SEQUENCING
[Figure: Genomic DNA, DNA Library; Single-Read and Paired-End reads of ~100bp]
HIGH-THROUGHPUT SEQUENCING
The sequence is read in each cluster through multiple cycles of nucleotide incorporation
NEXT-GENERATION-SEQUENCING THROUGHPUT
SINGLE SEQUENCING RUN: 6 000 000 000 000 bp (6 Tbp)!
NGS ANALYSIS: FIRST STEPS
FASTQ
Each FASTQ element has 4 lines:
1 read identifier
2 SEQUENCE
3 "+" separator
4 QUALITY (PHRED)
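p = 1/100
$\text{Quality} = -10 \log_{10}(1/100)$
$\text{Quality} = -10 \log_{10}(10^{-2})$
$\text{Quality} = -10 \times (-2) = 20$
p = 1/1000
$\text{Quality} = -10 \log_{10}(1/1000)$
$\text{Quality} = -10 \log_{10}(10^{-3})$
$\text{Quality} = -10 \times (-3) = 30$
Quality = 40 corresponds to p = 1/10000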
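The conversion between error probability and Phred quality can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names are hypothetical, and the ASCII offset of 33 used to decode a FASTQ quality string is an assumption (it is the common Sanger/Illumina 1.8+ convention, but older files may use 64).

```python
import math


def error_prob_to_phred(p: float) -> float:
    """Phred quality for a base-calling error probability p (Q = -10 log10 p)."""
    return -10.0 * math.log10(p)


def phred_to_error_prob(q: float) -> float:
    """Inverse conversion: error probability for a Phred quality q."""
    return 10.0 ** (-q / 10.0)


def decode_quality_string(qual: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality line; offset=33 is an assumed encoding."""
    return [ord(char) - offset for char in qual]


q20 = error_prob_to_phred(1 / 100)
q30 = error_prob_to_phred(1 / 1000)
p40 = phred_to_error_prob(40)
scores = decode_quality_string("5?I")
```

With the assumed offset, the characters `5`, `?` and `I` decode to the three qualities worked out above (20, 30 and 40).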
NGS ANALYSIS: FIRST STEPS
FASTQ file size
Number of FASTQ elements = 1000 000 000 000 bp / 100 bp per read = 10 billion FASTQ elements
Line 2 (SEQUENCE): 100 bases per read -> 100 bytes
Line 4 (QUALITY): same length as the sequence = 100 bytes
Lines 1 + 3 = ~50 bytes
10 billion FASTQ elements x (100 bytes + 100 bytes + 50 bytes)
= 2500 billion bytes = 2500 Gigabytes = 2.5 Terabytes
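The size estimate can be checked with a short calculation. The sketch below is only illustrative: the function name and its default parameters are hypothetical, newline characters are ignored, and the 50 bytes for lines 1 and 3 is the approximate figure given on the slide. Multiplying the stated figures (10 billion elements times 250 bytes each) gives 2.5 x 10^12 bytes.

```python
def fastq_size_bytes(total_bases: int, read_length: int = 100,
                     header_bytes: int = 50) -> tuple[int, int]:
    """Return (number of FASTQ elements, approximate uncompressed size in bytes).

    Each element is counted as sequence + quality + (lines 1 and 3).
    Newline characters are ignored, as in the slide's estimate.
    """
    n_elements = total_bases // read_length
    bytes_per_element = read_length + read_length + header_bytes
    return n_elements, n_elements * bytes_per_element


n_elements, size_bytes = fastq_size_bytes(1_000_000_000_000)
size_terabytes = size_bytes / 1e12  # decimal terabytes
```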
ALIGNMENT TO A REFERENCE
PROBLEM: A STANDARD NGS EXPERIMENT MAY GENERATE HUNDREDS OF MILLIONS OF INDIVIDUAL READS!
BWA - http://bio-bwa.sourceforge.net/
BOWTIE - http://bowtie-bio.sourceforge.net/index.shtml
BOWTIE2 - http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
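To see why aligners index the reference instead of scanning it for every read, here is a toy seed-and-extend sketch. It is not how BWA or Bowtie work internally (they rely on a Burrows-Wheeler/FM index); a plain k-mer hash index is swapped in because it is easier to read. The function names, the seed length `k=8` and the mismatch limit are all assumptions made for illustration; the reference and the read are taken from the variant calling slide of this deck.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def align_read(read: str, reference: str, index: dict[str, list[int]],
               k: int, max_mismatches: int = 2):
    """Return (0-based start, mismatches) of the best placement, or None."""
    best = None
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        for hit in index.get(seed, ()):
            start = hit - offset
            if start < 0 or start + len(read) > len(reference):
                continue  # read would hang over the end of the reference
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
                best = (start, mismatches)
    return best


REFERENCE = "CGGCATTGGGACAGACAACAACAGCACTTCTGACCAAGCGGAGAAGAGCT"
index = build_kmer_index(REFERENCE, k=8)
hit = align_read("GGCATTGGGACAGACTACAACAGCACTTCTG", REFERENCE, index, k=8)
```

The index is built once and reused for every read, so each read only needs a handful of dictionary lookups; that is the property that matters when hundreds of millions of reads have to be placed.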
ALIGNMENT ?
1) SAM (Sequence Alignment Map)
2) BAM (Binary Alignment Map) + BAI index file
Each alignment record stores: Name, Chromosome, Position, Sequence and Quality
SAMTOOLS (samtools.sourceforge.net/)
Li H et al., Bioinformatics. 2009 Aug 15;25(16):2078-9.
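A SAM alignment line is tab-separated text, so the fields listed above can be pulled out with a few lines of Python. This is a minimal sketch: the `SamRecord` class and `parse_sam_line` function are hypothetical names, the example line is invented for illustration (it is not real data from the slides), and optional tags after the 11 mandatory columns are ignored. For real work a dedicated library or SAMTOOLS itself is the safer choice.

```python
from dataclasses import dataclass


@dataclass
class SamRecord:
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference (chromosome) name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # Phred-encoded base qualities


def parse_sam_line(line: str) -> SamRecord:
    """Parse the 11 mandatory columns of one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 columns")
    return SamRecord(fields[0], int(fields[1]), fields[2], int(fields[3]),
                     int(fields[4]), fields[5], fields[6], int(fields[7]),
                     int(fields[8]), fields[9], fields[10])


# Hypothetical record, made up for this example.
example = "read_001\t0\tchr11\t119148931\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
record = parse_sam_line(example)
```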
SINGLE NUCLEOTIDE POLYMORPHISM
SOMATIC VARIANT
CONTROL SAMPLE
CASE SAMPLE
SOMATIC MUTATION OR SNP?

..CGGCATTGGGACAGACAACAACAGCACTTCTGACCAAGCGGAGAAGAGCT..
  CGGCATTGGGACAGACAACAACAGAACTTCT
   GGCATTGGGACAGACAACAACAGAACTTCTG
   GGCATTGGGACAGACTACAACAGCACTTCTG
    GCATTGGGACAGACTACAACAGCACTTCTGA
      ATTGGGACAGACAACAACAGAACTTCTGAC
        TGGGACAGACTACAACAGCACTTCTGACCA
         GGGACAGACAACAACAGAACTTCTGACCAA
           GACAGACTACAACAGCACTTCTGACCAAGC

..CGGCATTGGGACAGACAACAACAGCACTTCTGACCAAGCGGAGAAGAGCT..
  CGGCATTGGGACAGACTACAACAGCACTTCT
   GGCATTGGGACAGACAACAACAGCACTTCTG
   GGCATTGGGACAGACAACAACAGCACTTCTG
    GCATTGGGACAGACTACAACAGCACTTCTGA
      ATTGGGACAGACAACAACAGCACTTCTGAC
        TGGGACAGACTACAACAGCACTTCTGACCA
         GGGACAGACAACAACAGCACTTCTGACCAA
           GACAGACAACAACAGCACTTCTGACCAAGC
VARIANT CALLING
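The "somatic mutation or SNP?" question can be illustrated with a small pileup comparison on the reads shown on this slide. This is a sketch under stated assumptions, not the author's pipeline: the first read set is assumed to be the case sample and the second the control (the extracted labels do not say which is which), reads are placed by brute-force minimum mismatch, and the `min_reads=2` threshold and all function names are hypothetical. Positions are 0-based within the reference snippet.

```python
from collections import Counter

REFERENCE = "CGGCATTGGGACAGACAACAACAGCACTTCTGACCAAGCGGAGAAGAGCT"

READS_FIRST = [  # assumed to be the CASE sample
    "CGGCATTGGGACAGACAACAACAGAACTTCT", "GGCATTGGGACAGACAACAACAGAACTTCTG",
    "GGCATTGGGACAGACTACAACAGCACTTCTG", "GCATTGGGACAGACTACAACAGCACTTCTGA",
    "ATTGGGACAGACAACAACAGAACTTCTGAC", "TGGGACAGACTACAACAGCACTTCTGACCA",
    "GGGACAGACAACAACAGAACTTCTGACCAA", "GACAGACTACAACAGCACTTCTGACCAAGC",
]
READS_SECOND = [  # assumed to be the CONTROL sample
    "CGGCATTGGGACAGACTACAACAGCACTTCT", "GGCATTGGGACAGACAACAACAGCACTTCTG",
    "GGCATTGGGACAGACAACAACAGCACTTCTG", "GCATTGGGACAGACTACAACAGCACTTCTGA",
    "ATTGGGACAGACAACAACAGCACTTCTGAC", "TGGGACAGACTACAACAGCACTTCTGACCA",
    "GGGACAGACAACAACAGCACTTCTGACCAA", "GACAGACAACAACAGCACTTCTGACCAAGC",
]


def best_offset(read: str, ref: str) -> int:
    """Place the read where it has the fewest mismatches (toy aligner)."""
    return min(range(len(ref) - len(read) + 1),
               key=lambda s: sum(a != b for a, b in zip(read, ref[s:])))


def non_reference_alleles(reads, ref, min_reads=2):
    """Set of (position, ref base, variant base) seen in at least min_reads reads."""
    counts = Counter()
    for read in reads:
        start = best_offset(read, ref)
        for i, base in enumerate(read):
            if base != ref[start + i]:
                counts[(start + i, ref[start + i], base)] += 1
    return {variant for variant, n in counts.items() if n >= min_reads}


case_variants = non_reference_alleles(READS_FIRST, REFERENCE)
control_variants = non_reference_alleles(READS_SECOND, REFERENCE)
snp_candidates = case_variants & control_variants      # present in both samples
somatic_candidates = case_variants - control_variants  # present in the case only
```

Under these assumptions the A>T change appears in both samples (a polymorphism candidate), while the C>A change appears in one sample only (a somatic candidate).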
Statistical module: Wilcoxon Signed-Rank test
Test statistic W:
$$W = \sum_{i=1}^{N_r} \left[ \operatorname{sgn}\left(x_i^{\text{case}} - x_i^{\text{control}}\right) \cdot R_i \right]$$
As the sample size increases ($N_r > 10$), the Z-score of W converges to a Gaussian distribution!
Estimating the error function of the normal distribution of W, using the Abramowitz and Stegun approximation (equation 7.1.26):
$$\operatorname{erf}(x) \approx 1 - \left(a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4 + a_5 t^5\right) e^{-x^2}$$
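The two ingredients of this statistical module can be sketched as follows. This is an illustrative reimplementation, not the CEQer code: the sign convention (case minus control), the use of average ranks for ties, the dropping of zero differences and the two-sided p-value are assumptions, and the function names are hypothetical. The coefficients are those published for Abramowitz and Stegun equation 7.1.26, with t = 1 / (1 + p x) and p = 0.3275911.

```python
import math


def erf_as_7_1_26(x: float) -> float:
    """Abramowitz and Stegun 7.1.26 approximation of erf(x)."""
    p = 0.3275911
    a = (0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429)
    sign = -1.0 if x < 0 else 1.0   # erf is odd; the formula holds for x >= 0
    x = abs(x)
    t = 1.0 / (1.0 + p * x)
    poly = sum(a_k * t ** (k + 1) for k, a_k in enumerate(a))
    return sign * (1.0 - poly * math.exp(-x * x))


def signed_rank_w(control, case):
    """Return (W, N_r): signed-rank sum and number of non-zero differences."""
    diffs = [c - k for k, c in zip(control, case) if c != k]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1          # ties share the average rank
        for m in range(i, j + 1):
            ranks[order[m]] = average_rank
        i = j + 1
    w = sum(math.copysign(rank, d) for rank, d in zip(ranks, diffs))
    return w, len(diffs)


def two_sided_p_value(w: float, n_r: int) -> float:
    """Normal approximation of the p-value; intended for N_r > 10."""
    sigma_w = math.sqrt(n_r * (n_r + 1) * (2 * n_r + 1) / 6)
    z = w / sigma_w
    return 1.0 - erf_as_7_1_26(abs(z) / math.sqrt(2))


# Hypothetical toy values, only to show the ranking step.
w, n_r = signed_rank_w(control=[10, 12, 9, 11], case=[12, 15, 8, 11])
```

The toy input has too few pairs for the normal approximation; it only shows how W is assembled from signs and ranks.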
LOSS OF HETEROZYGOSITY – ALLELIC IMBALANCE
[Figure: A and T alleles in the CONTROL and CASE samples]
BIOINFORMATICS – CEQer2
COMPARATIVE EXONIC QUANTIFICATION ANALYZER
http://www.ngsbicocca.org/
Piazza R. et al., PLoS One. 2013 Oct 4;8(10):e74825.
Piazza R. et al., Nat Genet. 2013 Jan;45(1):18-24.
Gambacorti C. et al., Blood. 2015 Jan 15;125(3):499-503.
Piazza R. et al., Nucleic Acids Res. 2012 Sep;40(16):e123.
Spinelli R. et al., Mol Genet Genomic Med. 2013 Nov;1(4):246-59.
Piazza R. et al., Nat Comm. 2018. In press.
NORMAL CHROMOSOMES: 1 MATERNAL, 1 PATERNAL
CBL
Chromosome break
OS = Oncosuppressor
MUTATION
HOMOLOGOUS REPAIR
BACTERIAL GENOME
Sequence Type: ST-1879
Organism: Klebsiella pneumoniae
MLST Profile: kpneumoniae
GENE   % IDENTITY   HSP Length   Allele Length   GAPS   BEST MATCH
GAPA   100          237          450             0      GAPA_1
INFB   100          318          318             0      INFB_3
MDH    100          477          477             0      MDH_1
PGI    100          402          432             0      PGI_1
PHOE   100          179          420             0      PHOE_1
RPOB   100          501          501             0      RPOB_1
TONB   100          414          414             0      TONB_79
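A table like this is easy to sanity-check programmatically. The sketch below, with hypothetical function names, extracts the allele profile from the BEST MATCH column and flags loci where the HSP is shorter than the allele, meaning the match covers only part of the allele even though identity is 100%. Treating such loci as worth a second look is an assumption of this example, not a rule stated on the slide.

```python
# (gene, % identity, HSP length, allele length, gaps, best match), copied from the table
MLST_ROWS = [
    ("GAPA", 100.0, 237, 450, 0, "GAPA_1"),
    ("INFB", 100.0, 318, 318, 0, "INFB_3"),
    ("MDH", 100.0, 477, 477, 0, "MDH_1"),
    ("PGI", 100.0, 402, 432, 0, "PGI_1"),
    ("PHOE", 100.0, 179, 420, 0, "PHOE_1"),
    ("RPOB", 100.0, 501, 501, 0, "RPOB_1"),
    ("TONB", 100.0, 414, 414, 0, "TONB_79"),
]


def allele_profile(rows):
    """Map each locus to the allele number encoded in its best match."""
    return {gene: int(best.rsplit("_", 1)[1]) for gene, *_, best in rows}


def partially_covered_loci(rows):
    """Loci whose HSP is shorter than the full allele length."""
    return [gene for gene, _identity, hsp, allele_len, _gaps, _best in rows
            if hsp < allele_len]


profile = allele_profile(MLST_ROWS)
partial = partially_covered_loci(MLST_ROWS)
```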
Bartual SG, Seifert H, Hippler C, Luzon MA, Wisplinghoff H, Rodriguez-Valera F. J Clin Microbiol 2005; 43:4382-90.
Griffiths D, Fawley W, Kachrimanidou M, et al. J Clin Microbiol 2010; 48:770-8.
Lemee L, Dhalluin A, Pestel-Caron M, Lemeland JF, Pons JL. J Clin Microbiol 2004; 42:2609-17.
Wirth T, Falush D, Lan R, et al. Mol Microbiol 2006; 60:1136-51.
Jaureguy F, Landraud L, Passet V, et al. BMC Genomics 2008; 9:560.
Larsen MV, Cosentino S, Rasmussen S, et al. J Clin Microbiol 2012; 50(4):1355-61.
Resistance gene  Identity  Query/HSP  Contig                                 Position in contig  Phenotype                  Accession no.
blaOXA-9         99.76     840/840    NODE_11694_length_1029_cov_574.739563  106..944            Beta-lactam resistance     JF703130
blaLEN12         90.79     789/684    NODE_2323_length_681_cov_631.278992    18..701             Beta-lactam resistance     AJ635406
aadA2            99.59     780/491    NODE_3745_length_470_cov_805.451050    1..490              Aminoglycoside resistance  X68227
blaTEM-79        100       861/621    NODE_8636_length_601_cov_644.326111    1..621              Beta-lactam resistance     AF190692
aph(3')-Ia       100       816/700    NODE_258_length_680_cov_479.180878     1..700              Aminoglycoside resistance  V00359
mph(A)           99.72     906/704    NODE_10668_length_684_cov_417.179810   1..704              Macrolide resistance       D16251
catA1            99.85     660/660    NODE_1982_length_1526_cov_495.138275   248..907            Phenicol resistance        V00622
dfrA12           100       498/498    NODE_4437_length_917_cov_476.905121    416..913            Trimethoprim resistance    AB571791
QnrS1            100       657/657    NODE_6327_length_1563_cov_355.095337   459..1115           Quinolone resistance       AB187515
fosA             96.9      420/420    NODE_745_length_1522_cov_307.268066    45..464             Fosfomycin resistance      NZ_AFBO01000747
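The hits in this table differ in both identity and coverage, and a short script makes that explicit. This sketch assumes that the Query/HSP column gives the query (database gene) length followed by the HSP length, which is my reading of the header; the function names and the idea of separating "perfect, full-length" hits from the rest are illustrative choices, not thresholds taken from the slide.

```python
# (gene, % identity, query length, HSP length, phenotype), copied from the table
RESISTANCE_HITS = [
    ("blaOXA-9", 99.76, 840, 840, "Beta-lactam resistance"),
    ("blaLEN12", 90.79, 789, 684, "Beta-lactam resistance"),
    ("aadA2", 99.59, 780, 491, "Aminoglycoside resistance"),
    ("blaTEM-79", 100.0, 861, 621, "Beta-lactam resistance"),
    ("aph(3')-Ia", 100.0, 816, 700, "Aminoglycoside resistance"),
    ("mph(A)", 99.72, 906, 704, "Macrolide resistance"),
    ("catA1", 99.85, 660, 660, "Phenicol resistance"),
    ("dfrA12", 100.0, 498, 498, "Trimethoprim resistance"),
    ("QnrS1", 100.0, 657, 657, "Quinolone resistance"),
    ("fosA", 96.9, 420, 420, "Fosfomycin resistance"),
]


def coverage(query_len: int, hsp_len: int) -> float:
    """Fraction of the query gene covered by the HSP."""
    return hsp_len / query_len


def perfect_full_length(hits):
    """Genes matched at 100% identity over their whole length."""
    return [gene for gene, identity, query_len, hsp_len, _ in hits
            if identity == 100.0 and hsp_len == query_len]


def partial_hits(hits):
    """Genes whose HSP is shorter than the query, e.g. cut by a contig end."""
    return [gene for gene, _, query_len, hsp_len, _ in hits if hsp_len < query_len]


perfect = perfect_full_length(RESISTANCE_HITS)
partial = partial_hits(RESISTANCE_HITS)
phenotypes = sorted({phenotype for *_, phenotype in RESISTANCE_HITS})
```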
BACTERIAL GENOME
BacteriaFingerprint example showing simulated outbreak/epidemic data. HCE makes it possible to trace the origin of the outbreak (Milan).
BACTERIAL GENOME
MICROBIOME
MICROBIOME: collection of genomes of microbes in a system
MICROBIOTA: collection of organisms that are present in a system
MICROBIOME - UTILITY
To track inflammatory bowel diseases such as Crohn's disease or ulcerative colitis.
Differences in gut microbial communities have been identified between individuals with non-alcoholic fatty liver disease (NAFLD) who had mild to moderate liver fibrosis and those who had advanced liver fibrosis.
The gut microbiome might be an important factor in a wide range of health issues such as obesity, asthma, diabetes, cancer, autoimmune disorders and heart disease.
Chromosome  Position   Ref  Var  Gene    Codon Change  AA Change                  Var Type  Polymorphism  MAF  Clinical Polymorphism
chr11       119148931  G    A    CBL     TGT->TAT      Cys384Tyr                  SNV                     0
chr12       22811995   A    G    ETNK1   AAT->AGT      Asn244Ser                  SNV       rs370316713   -1   Non-Clinical
chr18       42531913   G    A    SETBP1  GGC->AGC      Gly870Ser                  SNV       rs267607040   -1   Clinical
chr19       57840143   T    C    ZNF543  TTT->TCT      Phe438Ser                  SNV                     0
chr20       31021250   C    T    ASXL1   CGA->TGA      Arg416*; Arg417*; Arg308*  SNV       rs375215583   -1   Non-Clinical
SNP Filtering
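As a rough illustration of the bookkeeping behind SNP filtering, the variants in the table can be grouped by their polymorphism annotation. The slides do not state the actual filtering rule, so this sketch deliberately only sorts the variants into categories and leaves the keep-or-discard decision to the analyst; the category names and the function name are hypothetical.

```python
# (gene, AA change, polymorphism ID or None, clinical annotation or None), from the table
VARIANTS = [
    ("CBL", "Cys384Tyr", None, None),
    ("ETNK1", "Asn244Ser", "rs370316713", "Non-Clinical"),
    ("SETBP1", "Gly870Ser", "rs267607040", "Clinical"),
    ("ZNF543", "Phe438Ser", None, None),
    ("ASXL1", "Arg416*; Arg417*; Arg308*", "rs375215583", "Non-Clinical"),
]


def group_by_annotation(variants):
    """Sort variants into three illustrative categories."""
    groups = {"novel": [], "known_clinical": [], "known_non_clinical": []}
    for gene, _aa_change, polymorphism_id, clinical in variants:
        if polymorphism_id is None:
            groups["novel"].append(gene)               # no polymorphism ID listed
        elif clinical == "Clinical":
            groups["known_clinical"].append(gene)      # listed and flagged clinical
        else:
            groups["known_non_clinical"].append(gene)  # listed, not flagged clinical
    return groups


groups = group_by_annotation(VARIANTS)
```

A listed polymorphism ID does not by itself prove that a variant is a harmless germline polymorphism, which is why the sketch stops at grouping.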
Gambacorti-Passerini C. et al., Blood. 2015 Jan 15;125(3):499-503.
Piazza R. et al., Nat Genet. 2013 Jan;45(1):18-24.
Hoischen A. et al., Nat Genet. 2010 Jun;42(6):483-5. doi: 10.1038/ng.581.