NEXT GENERATION SEQUENCING II

R. PIAZZA (MD, PHD), DEPT OF MEDICINE AND SURGERY, UNIVERSITY OF MILANO BICOCCA


SANGER SEQUENCING

[Figure: DNA + DNA Polymerase, 5’ and 3’ strand ends, Capillary Electrophoresis]

NEXT-GENERATION-SEQUENCING

[Figure: Flowcell]

R. Piazza – NGS Sequencing II

NEXT-GENERATION-SEQUENCING

[Figure: Genomic DNA is converted into a DNA Library; fragments are sequenced as Single-Read or Paired-End reads of ~100bp]

HIGH-THROUGHPUT SEQUENCING

The sequence is read in each cluster through multiple cycles of nucleotide incorporation

NEXT-GENERATION-SEQUENCING

THROUGHPUT OF A SINGLE SEQUENCING RUN: 6 000 000 000 000 bp !!

NGS ANALYSIS: FIRST STEPS

FASTQ

[Figure: a FASTQ record with its four lines numbered 1 to 4; line 2 is the SEQUENCE, line 4 is the QUALITY (PHRED)]

QUALITY (PHRED)

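p = 1/100

$$\text{Quality} = -10 \log_{10}(1/100) = -10 \log_{10}(10^{-2}) = -10 \times (-2) = 20$$

p = 1/1000

$$\text{Quality} = -10 \log_{10}(1/1000) = -10 \log_{10}(10^{-3}) = -10 \times (-3) = 30$$

p = 1/10000

$$\text{Quality} = -10 \log_{10}(1/10000) = -10 \log_{10}(10^{-4}) = -10 \times (-4) = 40$$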
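The PHRED arithmetic above can be sketched in a few lines of Python. The function names are illustrative, and the ASCII offset of 33 (the "Phred+33" encoding used by most current FASTQ files) is an assumption that is not stated on the slides.

```python
import math


def phred_from_error(p: float) -> float:
    """Phred quality for an error probability p: Q = -10 * log10(p)."""
    return -10.0 * math.log10(p)


def error_from_phred(q: float) -> float:
    """Inverse relation: p = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def decode_quality_string(qual: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality line (assumption: Phred+33 ASCII encoding)."""
    return [ord(char) - offset for char in qual]


if __name__ == "__main__":
    for p in (1 / 100, 1 / 1000, 1 / 10000):
        print(f"p = {p:g} -> Quality = {round(phred_from_error(p))}")
    # '5', '?' and 'I' are ASCII 53, 63 and 73
    print(decode_quality_string("5?I"))
```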


Number of FASTQ elements = 1000 000 000 000 bp / 100 bp per read = 10 billion FASTQ elements

Size of each FASTQ element:

Sequence (line 2): 100 bases per read -> 100 bytes

Quality (line 4): same length as the Sequence = 100 bytes

Lines 1 + 3 = ~50 bytes

10 billion FASTQ elements x (100 bytes + 100 bytes + 50 bytes)

= 2500 billion bytes = 2500 Gigabytes = 2.5 Terabytes
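The same estimate can be written as a small Python helper, which makes it easy to redo the calculation for other throughputs or read lengths. The function name and its parameters are illustrative; the defaults are the figures used on the slide (100-base reads, about 50 bytes for lines 1 and 3), and the result ignores compression.

```python
def fastq_size_bytes(total_bases: int, read_length: int = 100,
                     header_bytes: int = 50) -> tuple[int, int]:
    """Return (number of FASTQ elements, uncompressed size in bytes).

    Each element stores one byte per base (line 2), one byte per quality
    value (line 4) and roughly `header_bytes` for lines 1 and 3.
    """
    n_reads = total_bases // read_length
    bytes_per_element = read_length + read_length + header_bytes
    return n_reads, n_reads * bytes_per_element


if __name__ == "__main__":
    n_reads, size = fastq_size_bytes(1_000_000_000_000)
    print(f"{n_reads:,} FASTQ elements, {size / 1e12:.2f} Terabytes")
```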


ALIGNMENT TO A REFERENCE

PROBLEM: A STANDARD NGS EXPERIMENT MAY GENERATE HUNDREDS OF MILLIONS OF INDIVIDUAL READS !!

BWA - http://bio-bwa.sourceforge.net/

BOWTIE - http://bowtie-bio.sourceforge.net/index.shtml

BOWTIE2 - http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
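With hundreds of millions of reads, scanning the whole reference for every read is not feasible, so aligners first build an index of the reference. BWA and Bowtie use a Burrows-Wheeler-based index; the toy sketch below is not their algorithm. It swaps in a much simpler k-mer hash index with seed-and-verify matching, only to illustrate the idea. The reference fragment is the one shown on the variant calling slide; `build_kmer_index`, `map_read`, `k` and `max_mismatches` are hypothetical names chosen for this example.

```python
from collections import defaultdict

REFERENCE = "CGGCATTGGGACAGACAACAACAGCACTTCTGACCAAGCGGAGAAGAGCT"


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to its 0-based start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read: str, reference: str, index: dict[str, list[int]],
             k: int, max_mismatches: int = 1) -> list[int]:
    """Return 0-based positions where the read fits with few mismatches."""
    hits = []
    # Use non-overlapping k-mers of the read as seeds.
    for seed_offset in range(0, len(read) - k + 1, k):
        seed = read[seed_offset:seed_offset + k]
        for ref_pos in index.get(seed, []):
            start = ref_pos - seed_offset
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches and start not in hits:
                hits.append(start)
    return sorted(hits)


if __name__ == "__main__":
    kmer_index = build_kmer_index(REFERENCE, k=8)
    read = "GGCATTGGGACAGACTACAACAGCACTTCTG"  # carries one mismatch
    print(map_read(read, REFERENCE, kmer_index, k=8))
```

The index is built once and reused for every read, which is the design choice that makes large read sets tractable; real aligners additionally handle gaps, base qualities and paired-end constraints.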


ALIGNMENT ?

1) SAM (Sequence Alignment/Map)

2) BAM (Binary Alignment/Map) + BAI index file

Each alignment record stores: Name, Chromosome, Position, Sequence and Quality

SAMTOOLS (samtools.sourceforge.net/)

Li H et al., Bioinformatics. 2009 Aug 15;25(16):2078-9.
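As a minimal sketch of what a SAM record looks like to a program, the snippet below splits one tab-separated alignment line into the fields named above. The column order (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL) follows the SAM format; the `SamRecord` type and the example line, including its read name and coordinate, are invented for illustration.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    name: str        # QNAME
    flag: int        # FLAG
    chromosome: str  # RNAME
    position: int    # POS, 1-based leftmost mapping position
    mapq: int        # MAPQ
    cigar: str       # CIGAR
    sequence: str    # SEQ
    quality: str     # QUAL


def parse_sam_line(line: str) -> SamRecord:
    """Parse the mandatory fields of one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    return SamRecord(
        name=fields[0],
        flag=int(fields[1]),
        chromosome=fields[2],
        position=int(fields[3]),
        mapq=int(fields[4]),
        cigar=fields[5],
        sequence=fields[9],
        quality=fields[10],
    )


# Hypothetical alignment line, not taken from real data.
EXAMPLE_LINE = "read_001\t0\tchr17\t7578400\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII"

if __name__ == "__main__":
    print(parse_sam_line(EXAMPLE_LINE))
```

In practice BAM files are binary and compressed, so they are read through SAMTOOLS or a library built on it rather than parsed by hand.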


ALIGNMENT VIEWER - IGV

VARIANT CALLING

SOMATIC MUTATION OR SNP ?

CASE SAMPLE

..CGGCATTGGGACAGACAACAACAGCACTTCTGACCAAGCGGAGAAGAGCT..
  CGGCATTGGGACAGACAACAACAGAACTTCT
   GGCATTGGGACAGACAACAACAGAACTTCTG
   GGCATTGGGACAGACTACAACAGCACTTCTG
    GCATTGGGACAGACTACAACAGCACTTCTGA
      ATTGGGACAGACAACAACAGAACTTCTGAC
        TGGGACAGACTACAACAGCACTTCTGACCA
         GGGACAGACAACAACAGAACTTCTGACCAA
           GACAGACTACAACAGCACTTCTGACCAAGC

CONTROL SAMPLE

..CGGCATTGGGACAGACAACAACAGCACTTCTGACCAAGCGGAGAAGAGCT..
  CGGCATTGGGACAGACTACAACAGCACTTCT
   GGCATTGGGACAGACAACAACAGCACTTCTG
   GGCATTGGGACAGACAACAACAGCACTTCTG
    GCATTGGGACAGACTACAACAGCACTTCTGA
      ATTGGGACAGACAACAACAGCACTTCTGAC
        TGGGACAGACTACAACAGCACTTCTGACCA
         GGGACAGACAACAACAGCACTTCTGACCAA
           GACAGACAACAACAGCACTTCTGACCAAGC

SINGLE NUCLEOTIDE POLYMORPHISM: A>T (..GACAGAC[A>T]ACAAC..), present in both CONTROL and CASE reads

SOMATIC VARIANT: C>A (..ACAACAG[C>A]ACTTCT..), present only in CASE reads
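The case/control comparison on this slide can be sketched as follows: count the bases seen at each reference position in both samples, then label a non-reference allele as a germline SNP if it is also present in the control and as a somatic variant if it is found only in the case. This is a toy illustration, not a production variant caller. The reads are the ones shown on the slide; the alignment offsets, the function names and the `min_reads` threshold are assumptions introduced for the example.

```python
from collections import Counter, defaultdict

REFERENCE = "CGGCATTGGGACAGACAACAACAGCACTTCTGACCAAGCGGAGAAGAGCT"

# (0-based offset on REFERENCE, read sequence); offsets assumed known.
CASE_READS = [
    (0, "CGGCATTGGGACAGACAACAACAGAACTTCT"),
    (1, "GGCATTGGGACAGACAACAACAGAACTTCTG"),
    (1, "GGCATTGGGACAGACTACAACAGCACTTCTG"),
    (2, "GCATTGGGACAGACTACAACAGCACTTCTGA"),
    (4, "ATTGGGACAGACAACAACAGAACTTCTGAC"),
    (6, "TGGGACAGACTACAACAGCACTTCTGACCA"),
    (7, "GGGACAGACAACAACAGAACTTCTGACCAA"),
    (9, "GACAGACTACAACAGCACTTCTGACCAAGC"),
]
CONTROL_READS = [
    (0, "CGGCATTGGGACAGACTACAACAGCACTTCT"),
    (1, "GGCATTGGGACAGACAACAACAGCACTTCTG"),
    (1, "GGCATTGGGACAGACAACAACAGCACTTCTG"),
    (2, "GCATTGGGACAGACTACAACAGCACTTCTGA"),
    (4, "ATTGGGACAGACAACAACAGCACTTCTGAC"),
    (6, "TGGGACAGACTACAACAGCACTTCTGACCA"),
    (7, "GGGACAGACAACAACAGCACTTCTGACCAA"),
    (9, "GACAGACAACAACAGCACTTCTGACCAAGC"),
]


def pileup(aligned_reads):
    """Count the bases observed at each 0-based reference position."""
    counts = defaultdict(Counter)
    for offset, read in aligned_reads:
        for i, base in enumerate(read):
            counts[offset + i][base] += 1
    return counts


def variant_alleles(counts, reference, min_reads=2):
    """Return {position: non-reference bases seen in >= min_reads reads}."""
    calls = {}
    for pos, bases in counts.items():
        alts = {b for b, n in bases.items()
                if b != reference[pos] and n >= min_reads}
        if alts:
            calls[pos] = alts
    return calls


def classify(case_calls, control_calls):
    """Label each case allele as germline SNP or somatic variant."""
    labels = {}
    for pos, alts in case_calls.items():
        for alt in alts:
            in_control = alt in control_calls.get(pos, set())
            labels[(pos, alt)] = ("germline SNP" if in_control
                                  else "somatic variant")
    return labels


if __name__ == "__main__":
    case_calls = variant_alleles(pileup(CASE_READS), REFERENCE)
    control_calls = variant_alleles(pileup(CONTROL_READS), REFERENCE)
    for (pos, alt), label in sorted(classify(case_calls, control_calls).items()):
        print(f"position {pos}: {REFERENCE[pos]}>{alt} is a {label}")
```

Positions are 0-based offsets within the displayed fragment, not genomic coordinates.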


SOMATIC VARIANT: DRIVER OR PASSENGER ?

NGS GOES DIGITAL

[Figure: CASE (TUMOR) vs CONTROL (GERMLINE): ????]

[Figure: CONTROL vs CASE]

[Figure: Read count (y axis) vs Genomic position (x axis), for Case and Control]

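Statistical module: Wilcoxon Signed-Rank test

Test statistic W:

$$W = \sum_{i=1}^{N_r} \left[ \operatorname{sgn}\left(x_i^{case} - x_i^{control}\right) \cdot R_i \right]$$

where R_i is the rank of the i-th absolute difference and N_r is the number of non-zero differences.

As the sample size increases (N_r > 10), the distribution of W converges to a Gaussian distribution, so a Z-score can be used!

Estimating the error function of the normal distribution of W, using the Abramowitz and Stegun approximation (equation 7.1.26):

$$\operatorname{erf}(x) \approx 1 - \left(a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4 + a_5 t^5\right) e^{-x^2}, \qquad t = \frac{1}{1 + p x}$$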
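A minimal Python sketch of this statistical module is given below: it computes W as the sum of sgn(x_case - x_control) times the rank of the absolute difference, converts it to a Z-score under the normal approximation, and derives a two-sided p-value from the Abramowitz and Stegun 7.1.26 approximation of erf. The constants are those published for equation 7.1.26. Discarding zero differences, averaging tied ranks and omitting a continuity correction are assumptions of this sketch and not necessarily what CEQer does.

```python
import math

# Abramowitz and Stegun, equation 7.1.26 (absolute error below 1.5e-7)
P = 0.3275911
A = (0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429)


def erf_as(x: float) -> float:
    """Approximate erf(x); negative x handled through erf(-x) = -erf(x)."""
    sign = -1.0 if x < 0 else 1.0
    x = abs(x)
    t = 1.0 / (1.0 + P * x)
    poly = sum(a * t ** (k + 1) for k, a in enumerate(A))
    return sign * (1.0 - poly * math.exp(-x * x))


def signed_rank_w(case, control):
    """Return (W, N_r) for paired samples, with average ranks for ties."""
    diffs = [a - b for a, b in zip(case, control) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    w = sum(math.copysign(r, d) for r, d in zip(ranks, diffs))
    return w, len(diffs)


def wilcoxon_normal_approx(case, control):
    """Return (W, Z, two-sided p); intended for N_r > 10."""
    w, n_r = signed_rank_w(case, control)
    if n_r == 0:
        raise ValueError("all paired differences are zero")
    sigma_w = math.sqrt(n_r * (n_r + 1) * (2 * n_r + 1) / 6)
    z = w / sigma_w
    p_value = 1.0 - erf_as(abs(z) / math.sqrt(2))
    return w, z, p_value


if __name__ == "__main__":
    control = [10] * 12                # hypothetical read counts
    case = [11 + i for i in range(12)]
    print(wilcoxon_normal_approx(case, control))
```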


SINGLE NUCLEOTIDE POLYMORPHISMS

http://atlasofscience.org

LOSS OF HETEROZYGOSITY – ALLELIC IMBALANCE

[Figure: A and T alleles in CONTROL and CASE]

BIOINFORMATICS – CEQer2

COMPARATIVE EXONIC QUANTIFICATION ANALYZER

http://www.ngsbicocca.org/

Piazza R. et al., PLoS One. 2013 Oct 4;8(10):e74825.

Piazza R. et al., Nat Genet. 2013 Jan;45(1):18-24.

Gambacorti-Passerini C. et al., Blood. 2015 Jan 15;125(3):499-503.

Piazza R. et al., Nucleic Acids Res. 2012 Sep;40(16):e123.

Spinelli R. et al., Mol Genet Genomic Med. 2013 Nov;1(4):246-59.

Piazza R. et al., Nat Comm. 2018. In press.

[Figure: CEQer2 output for CML-CP, CML-BC and SOLID TUMOR samples; Chr17, TP53]

SOMATIC UNIPARENTAL DISOMY

[Figure: somatic uniparental disomy. Panels: NORMAL CHROMOSOMES: 1 MATERNAL, 1 PATERNAL; MUTATION; Chromosome break; HOMOLOGOUS REPAIR. Labels: CBL; OS = Oncosuppressor]

BACTERIAL GENOME


Sequence Type: ST-1879

Organism: Klebsiella pneumoniae

MLST Profile: kpneumoniae

GENE  % IDENTITY  HSP Length  Allele Length  GAPS  BEST MATCH
GAPA  100         237         450            0     GAPA_1
INFB  100         318         318            0     INFB_3
MDH   100         477         477            0     MDH_1
PGI   100         402         432            0     PGI_1
PHOE  100         179         420            0     PHOE_1
RPOB  100         501         501            0     RPOB_1
TONB  100         414         414            0     TONB_79
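When reading an MLST report like this one, it helps to separate loci whose alignment (HSP) covers the whole allele from loci matched only in part, since a partial match gives weaker support for the allele call. The sketch below does that for the rows of the table above; the tuple layout and function names are illustrative and not part of any MLST tool.

```python
# (gene, % identity, HSP length, allele length, gaps, best match)
MLST_HITS = [
    ("GAPA", 100.0, 237, 450, 0, "GAPA_1"),
    ("INFB", 100.0, 318, 318, 0, "INFB_3"),
    ("MDH", 100.0, 477, 477, 0, "MDH_1"),
    ("PGI", 100.0, 402, 432, 0, "PGI_1"),
    ("PHOE", 100.0, 179, 420, 0, "PHOE_1"),
    ("RPOB", 100.0, 501, 501, 0, "RPOB_1"),
    ("TONB", 100.0, 414, 414, 0, "TONB_79"),
]


def full_length_matches(hits):
    """Loci with 100% identity, no gaps and HSP covering the whole allele."""
    return [gene for gene, identity, hsp, allele, gaps, _ in hits
            if identity == 100.0 and gaps == 0 and hsp == allele]


def partial_matches(hits):
    """Loci where the HSP is shorter than the allele, with coverage."""
    return {gene: round(hsp / allele, 2)
            for gene, _, hsp, allele, _, _ in hits if hsp < allele}


if __name__ == "__main__":
    print("full-length:", full_length_matches(MLST_HITS))
    print("partial (fraction covered):", partial_matches(MLST_HITS))
```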

Bartual SG, Seifert H, Hippler C, Luzon MA, Wisplinghoff H, Rodriguez-Valera F. J Clin Microbiol 2005; 43:4382-90.

Griffiths D, Fawley W, Kachrimanidou M, et al. J Clin Microbiol 2010; 48:770-8.

Lemee L, Dhalluin A, Pestel-Caron M, Lemeland JF, Pons JL. J Clin Microbiol 2004; 42:2609-17.

Wirth T, Falush D, Lan R, et al. Mol Microbiol 2006; 60:1136-51.

Jaureguy F, Landraud L, Passet V, et al. BMC Genomics 2008; 9:560.

Larsen MV, Cosentino S, Rasmussen S, et al. J Clin Microbiol 2012; 50:1355-61.

Resistance gene  Identity  Query/HSP  Contig                                 Position in contig  Phenotype                  Accession no.
blaOXA-9         99.76     840/840    NODE_11694_length_1029_cov_574.739563  106..944            Beta-lactam resistance     JF703130
blaLEN12         90.79     789/684    NODE_2323_length_681_cov_631.278992    18..701             Beta-lactam resistance     AJ635406
aadA2            99.59     780/491    NODE_3745_length_470_cov_805.451050    1..490              Aminoglycoside resistance  X68227
blaTEM-79        100       861/621    NODE_8636_length_601_cov_644.326111    1..621              Beta-lactam resistance     AF190692
aph(3')-Ia       100       816/700    NODE_258_length_680_cov_479.180878     1..700              Aminoglycoside resistance  V00359
mph(A)           99.72     906/704    NODE_10668_length_684_cov_417.179810   1..704              Macrolide resistance       D16251
catA1            99.85     660/660    NODE_1982_length_1526_cov_495.138275   248..907            Phenicol resistance        V00622
dfrA12           100       498/498    NODE_4437_length_917_cov_476.905121    416..913            Trimethoprim resistance    AB571791
QnrS1            100       657/657    NODE_6327_length_1563_cov_355.095337   459..1115           Quinolone resistance       AB187515
fosA             96.9      420/420    NODE_745_length_1522_cov_307.268066    45..464             Fosfomycin resistance      NZ_AFBO01000747
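A resistance report of this kind is usually summarized per antibiotic class. The sketch below groups the genes from the table by predicted phenotype and optionally drops low-identity hits. The 98% cut-off in the usage example is an arbitrary value chosen for illustration, not a recommended threshold, and the function name is hypothetical.

```python
from collections import defaultdict

# (resistance gene, % identity, predicted phenotype), from the table above
RESISTANCE_HITS = [
    ("blaOXA-9", 99.76, "Beta-lactam resistance"),
    ("blaLEN12", 90.79, "Beta-lactam resistance"),
    ("aadA2", 99.59, "Aminoglycoside resistance"),
    ("blaTEM-79", 100.0, "Beta-lactam resistance"),
    ("aph(3')-Ia", 100.0, "Aminoglycoside resistance"),
    ("mph(A)", 99.72, "Macrolide resistance"),
    ("catA1", 99.85, "Phenicol resistance"),
    ("dfrA12", 100.0, "Trimethoprim resistance"),
    ("QnrS1", 100.0, "Quinolone resistance"),
    ("fosA", 96.9, "Fosfomycin resistance"),
]


def genes_by_phenotype(hits, min_identity=0.0):
    """Group gene names by phenotype, keeping hits >= min_identity."""
    grouped = defaultdict(list)
    for gene, identity, phenotype in hits:
        if identity >= min_identity:
            grouped[phenotype].append(gene)
    return dict(grouped)


if __name__ == "__main__":
    # 98.0 is an illustrative threshold only.
    for phenotype, genes in genes_by_phenotype(RESISTANCE_HITS, 98.0).items():
        print(f"{phenotype}: {', '.join(genes)}")
```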


BACTERIAL GENOME – HIERARCHICAL CLUSTERING


BACTERIAL GENOME – PCA/PCoA


BacteriaFingerprint example showing simulated outbreak/epidemic data. HCE makes it possible to trace the origin of the outbreak (Milan).

MICROBIOME

MICROBIOME: collection of genomes of microbes in a system

MICROBIOTA: collection of organisms that are present in a system

MICROBIOME - UTILITY

To track inflammatory bowel diseases such as Crohn’s disease or ulcerative colitis

Differences in gut microbial communities have been identified between individuals with non-alcoholic fatty liver disease (NAFLD) who had mild-to-moderate liver fibrosis and those who had advanced liver fibrosis

The gut microbiome might be an important factor in a wide range of health conditions, such as obesity, asthma, diabetes, cancer, autoimmune disorders and heart disease.

Genomic DNA

DNA Library

NEXT-GENERATION-SEQUENCING


MICROBIOME - RESULTS


BacteriaFingerprint simulated outbreak

THANK YOU FOR YOUR ATTENTION!


SNP FILTERING

Chromosome  Position   Ref  Var  Gene    Codon Change  AA Change                  Var Type  Polymorphism  MAF  Clinical Polymorphism
chr11       119148931  G    A    CBL     TGT->TAT      Cys384Tyr                  SNV                     0
chr12       22811995   A    G    ETNK1   AAT->AGT      Asn244Ser                  SNV       rs370316713   -1   Non-Clinical
chr18       42531913   G    A    SETBP1  GGC->AGC      Gly870Ser                  SNV       rs267607040   -1   Clinical
chr19       57840143   T    C    ZNF543  TTT->TCT      Phe438Ser                  SNV                     0
chr20       31021250   C    T    ASXL1   CGA->TGA      Arg416*; Arg417*; Arg308*  SNV       rs375215583   -1   Non-Clinical

Gambacorti-Passerini C. et al., Blood. 2015 Jan 15;125(3):499-503.

Piazza R. et al., Nat Genet. 2013 Jan;45(1):18-24.

Hoischen A. et al., Nat Genet. 2010 Jun;42(6):483-5. doi: 10.1038/ng.581.
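One possible filtering rule, sketched below on the variants listed in the table, is to discard variants annotated as known non-clinical polymorphisms and to keep those that are either absent from the polymorphism annotation or flagged as Clinical. This rule is an assumption made for illustration; the slides do not spell out the exact criteria. The dictionary keys and the function name are hypothetical.

```python
# Rows of the table above; None marks an empty cell.
VARIANTS = [
    {"chrom": "chr11", "pos": 119148931, "gene": "CBL",
     "aa_change": "Cys384Tyr", "polymorphism": None, "clinical": None},
    {"chrom": "chr12", "pos": 22811995, "gene": "ETNK1",
     "aa_change": "Asn244Ser", "polymorphism": "rs370316713",
     "clinical": "Non-Clinical"},
    {"chrom": "chr18", "pos": 42531913, "gene": "SETBP1",
     "aa_change": "Gly870Ser", "polymorphism": "rs267607040",
     "clinical": "Clinical"},
    {"chrom": "chr19", "pos": 57840143, "gene": "ZNF543",
     "aa_change": "Phe438Ser", "polymorphism": None, "clinical": None},
    {"chrom": "chr20", "pos": 31021250, "gene": "ASXL1",
     "aa_change": "Arg416*; Arg417*; Arg308*",
     "polymorphism": "rs375215583", "clinical": "Non-Clinical"},
]


def passes_snp_filter(variant: dict) -> bool:
    """Assumed rule: keep novel variants and clinically flagged ones."""
    is_novel = variant["polymorphism"] is None
    is_clinical = variant["clinical"] == "Clinical"
    return is_novel or is_clinical


if __name__ == "__main__":
    for v in filter(passes_snp_filter, VARIANTS):
        print(v["chrom"], v["pos"], v["gene"], v["aa_change"])
```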
