Upload
others
View
5
Download
0
Embed Size (px)
Citation preview
Next Generation Sequencing at the
Marja Jakobs Core Facility Genomics
1953 Discovery of DNA structure
1961 Sequentional oligosynthesis possible
1977 Sanger sequencing method published (Frederick Sanger)
1983 Development of PCR
1985 Human Genome Project proposed $10 / base
1987 First automated sequencing machine: ABI 370
1990 Human Genome Project officially started
1996 Capillary sequencer: ABI 310
2000 Parallelized adapter / ligation mediated, bead-based sequencing technology
launching "Next Generation" sequencing
2003 Human Genome Project completed for $0.01 / base
2005 First Next Generation sequencing system: GS 20, 454 Life Sciences (100 bp)
2006 Solexa (later Illumina): Genome Analyser
2007 Applied Biosystems: SOLiD
2010 Semi Conductor Sequencing by Ion Torrent: PGM & Illumina: H1Seq
2011 Roche / 454: 700 bp readlength, 500 Mb output, Illumina MiSeq
2012 Ion Torrent: Ion Proton
2014 Illumina: NextSeq
2015 Oxford Nanopore: MinION, PromethION, Ion Torrent S5 / S5XL
Illumina: HiSeq 4000
2016 Illumina: MiniSeq, Human Genome $0,002 / 1000 basen
Dideoxynucleotides and radioactive labeled fragments
Tim
elin
e D
NA
se
qu
en
cin
g
1953 Discovery of DNA structure
1961 Sequentional oligosynthesis possible
1977 Sanger sequencing method published
1983 Development of PCR
1985 Human Genome Project proposed $10 / base
1987 First automated sequencing machine: ABI 370
1990 Human Genome Project officially started
1996 Capillary sequencer: ABI 310
2000 Parallelized adapter / ligation mediated, bead-based sequencing technology
launching "Next Generation" sequencing
2003 Human Genome Project completed for $0.01 / base
2005 First Next Generation sequencing system: GS 20, 454 Life Sciences (100 bp)
2006 Solexa (later Illumina): Genome Analyser
2007 Applied Biosystems: SOLiD
2010 Semi Conductor Sequencing by Ion Torrent: PGM & Illumina: H1Seq
2011 Roche / 454: 700 bp readlength, 500 Mb output, Illumina MiSeq
2012 Ion Torrent: Ion Proton
2014 Illumina: NextSeq
2015 Oxford Nanopore: MinION, PromethION, Ion Torrent S5 / S5XL
Illumina: HiSeq 4000
2016 Illumina: MiniSeq, Human Genome $0,002 / 1000 basen
fl
Fluorescent dyes
A
T
G
C
ACCGTA
ACCGT
ACCG
ACC
AC
A
T
T
G
C
C
A
A
Tim
elin
e D
NA
se
qu
en
cin
g
1953 Discovery of DNA structure
1961 Sequentional oligosynthesis possible
1977 Sanger sequencing method published
1983 Development of PCR
1985 Human Genome Project proposed $10 / base
1987 First automated sequencing machine: ABI 370
1990 Human Genome Project officially started
1996 Capillary sequencer: ABI 310
2000 Parallelized adapter / ligation mediated, bead-based sequencing technology
launching "Next Generation" sequencing
2003 Human Genome Project completed for $0.01 / base
2005 First Next Generation sequencing system: GS 20, 454 Life Sciences (100 bp)
2006 Solexa (later Illumina): Genome Analyser
2007 Applied Biosystems: SOLiD
2010 Semi Conductor Sequencing by Ion Torrent: PGM & Illumina: H1Seq
2011 Roche / 454: 700 bp readlength, 500 Mb output, Illumina MiSeq
2012 Ion Torrent: Ion Proton
2014 Illumina: NextSeq
2015 Oxford Nanopore: MinION, PromethION, Ion Torrent S5 / S5XL
Illumina: HiSeq 4000
2016 Illumina: MiniSeq, Human Genome $0,002 / 1000 basen
Tim
elin
e D
NA
se
qu
en
cin
g
1953 Discovery of DNA structure
1961 Sequentional oligosynthesis possible
1977 Sanger sequencing method published
1983 Development of PCR
1985 Human Genome Project proposed $10 / base
1987 First automated sequencing machine: ABI 370
1990 Human Genome Project officially started
1996 Capillary sequencer: ABI 310
2000 Parallelized adapter / ligation mediated, bead-based sequencing technology
launching "Next Generation" sequencing
2003 Human Genome Project completed for $0.01 / base
2005 First Next Generation sequencing system: GS 20, 454 Life Sciences (100 bp)
2006 Solexa (later Illumina): Genome Analyser
2007 Applied Biosystems: SOLiD
2010 Semi Conductor Sequencing by Ion Torrent: PGM & Illumina: H1Seq
2011 Roche / 454: 700 bp readlength, 500 Mb output, Illumina MiSeq
2012 Ion Torrent: Ion Proton
2014 Illumina: NextSeq
2015 Oxford Nanopore: MinION, PromethION, Ion Torrent S5 / S5XL
Illumina: HiSeq 4000
2016 Illumina: MiniSeq, Human Genome $0,002 / 1000 basen
Tim
elin
e D
NA
se
qu
en
cin
g
Tim
elin
e D
NA
se
qu
en
cin
g 1953 Discovery of DNA structure
1961 Sequentional oligosynthesis possible
1977 Sanger sequencing method published
1983 Development of PCR
1985 Human Genome Project proposed $10 / base
1987 First automated sequencing machine: ABI 370
1990 Human Genome Project officially started
1996 Capillary sequencer: ABI 310
2000 Parallelized adapter / ligation mediated, bead-based sequencing technology
launching "Next Generation" sequencing
2003 Human Genome Project completed for $0.01 / base
2005 First Next Generation sequencing system: GS 20, 454 Life Sciences (100 bp)
2006 Solexa (later Illumina): Genome Analyser
2007 Applied Biosystems: SOLiD
2010 Semi Conductor Sequencing by Ion Torrent: PGM & Illumina: H1Seq
2011 Roche / 454: 700 bp readlength, 500 Mb output, Illumina MiSeq
2012 Ion Torrent: Ion Proton
2014 Illumina: NextSeq
2015 Oxford Nanopore: MinION, PromethION, Ion Torrent S5 / S5XL
Illumina: HiSeq 4000
2016 Illumina: MiniSeq, Human Genome $0,002 / 1000 basen NovaSeq
Next Generation Sequencing
Mass Parallel Sequencing of unique DNA molecules
Main NGS platforms: • Roche 454 • SOLiD 5500 / Wildfire • Illumina • Ion Torrent • Nanopore • PacBio
CONVENTIONAL
one sample
one tube
one reaction
one result
MPS
Pool of molecules
one reaction vessel
many reactions
many results
Differences in Sequencing Strategies
• Single molecule vs amplified DNA
• Sequencing by synthesis
• Sequencing with terminators
Several strategies
• SMRT sequencing
Detection of incorporation of a single molecule (PAC BIO)
• Nanopore sequencing
Pull a lineair DNA strand through a nanopore (NanoPore)
• Amplify a single molecule to obtain a pool of identical molecules
Emulsion PCR (IonTorrent)
Polony formation (Illumina)
Two approaches
Single molecule Amplified DNA
Single Molecule Real Time sequencing (PacBio)
SMRTbell library
Single Molecule sequencing (Oxford Nanopore)
Amplified DNA
Emulsion PCR (IonTorrent)
Polony formation (Illumina)
Single strand • No amplification • No bias • No copying erros • Long reads
- Very low signal
- High error rate (14%)
Amplified DNA • Strong signals • PCR errors are averaged
- PCR bias/errors
- Short reads - Loose modifications
Single vs Amplified
2002
2005-2014
2006
2010-2014
2014
2011 2007-2013
2011
2012
2015
2011-2014
2015
2016
2017 NovaSeq
Next Generation Sequencing platforms CFG
• Ion Torrent
PGM (Personal Genome Machine)
Proton
• Illumina
MiSeq
HiSeq 4000 (VU)
High-throughput sequencing
Sanger
ABI
Ion
Torrent Illumina
PGM /
PI MiSeq
HiSeq
4000
Run time
sequencer
96
samples /
hour
5 / 2.5
hours
20 – 35
hours 1-3 days
Through
put
5 *105 x
500 bp
250 * 106
bp / year!
30Mb –
2Gb
per run
6 Gb
per run
81-94 Gb
(650-750 Gb)
Per lane
(per flowcell)
sequence
length 500 bp
200 - 400
bp 2x 150 bp 2x 150 bp
Analysis
time
15 min
sample 1 day - months
Workflow Amplified Single Molecule Sequencing
Library preparation
Emulsion PCR ‘Polony’ PCR on a slide
Semiconductor sequencing (Ion Torrent)
Sequencing by synthesis (Illumina)
Data Analysis
Library preparation
Library preparation
platform specific adapter ligation / primers to DNA fragments
20
A B
DNA fragment
barcode
Barcoding > multiplex advantages
Libraries ePCR Enrichment Deposition
1
2
3
4
5
6
7
8
No
barcode
1
2
3
4
5
6
7
8
9
10
barcode
21
Workflow Amplified Single Molecule Sequencing
Library preparation
Emulsion PCR ‘Polony’ PCR on a slide
Semiconductor sequencing (Ion Torrent)
Sequencing by synthesis (Illumina)
Data Analysis
P5
Rd 1 Seq primer
Sequence of intererest
Index Seq Primer
P7
INDEX
Rd 2 Seq Primer
Illumina Library
GCTGATCAG…
GGGGGGGCG…
Illumina: Sequencing by terminators
MiSeq HiSeq
Flow Cell Compartment
optics
Reagent compartment
optics
Touch screen monitor
Reagent compartment
Syringe pumps
Flow Cell Compartment
Keyboard & Mouse tray
1 flowcell, 1 lane, ~25M reads 2 flowcells, 8 lanes each, 300-400M reads / lane
Workflow Amplified Single Molecule Sequencing
Library preparation
Emulsion PCR ‘Polony’ PCR on a slide
Semiconductor sequencing (Ion Torrent)
Sequencing by synthesis (Illumina)
Data Analysis
Ion OneTouch 2 System Ion Chef
Emulsie PCR
Emulsion PCR
clonal amplification of DNA fragments
Create
“Water-in-oil”
emulsion
Mix DNA Library
& capture beads
(1 cpb)
+ PCR Reagents
+ Emulsion Oil
Perform emulsion PCR
library DNA
A
B
Micro-reactors
Annealing of
sequencing
primer
“Break micro-
reactors”
enrich for DNA-
positive beads
Emulsion PCR
Preferably 1cpb (clonal amplification)
Also possible: • ≥ 2 cpb
• 1 copy, ≥ 2 beads
29
Possible outcome emulsion PCR
multicopy beads
duplicate reads
clonal amplification
Workflow Amplified Single Molecule Sequencing
Library preparation
Emulsion PCR ‘Polony’ PCR on a slide
Semiconductor sequencing (Ion Torrent)
Sequencing by synthesis (Illumina)
Data Analysis
Semiconductor Sequencing chip
Ion Torrent: Semiconductor sequencing
Next Gen Sequencing
PPi
Fast (real time) Direct Detection
DNA Ions Sequence
– Nucleotides flow sequentially over Ion semiconductor
chip
– One sensor per well per sequencing reaction
– Direct detection of natural DNA extension, no camera’s
– Millions of sequencing reactions per chip
– Fast detection, fast cycle time, real time detection
Sensor Plate
Silicon Substrate Drain Source Bulk
dNTP
To column
receiver
∆ pH
∆ Q
∆ V
Sensing Layer
H+
Sequencing Workflow
Flow Order
1-mer
2-mer
3-mer
4-mer TACG
The signal strength is proportional to the number of nucleotides incorporated
Key: TCAG for signal calibration and normalization
TTCTGCGAA
35
Key sequence
Ion Proton™ Sequencer PersonalGenomeMachine
PGM Chip chamber
Chip
38
Data analysis Simplified Next Generation Sequencing
Impossible to assemble manually
Same dataset,
different parameters
Genome sequencing:
• De novo sequencing
• Metagenome
• Whole Genome
• Targeted resequencing
Amplicon panels Capture panels
Applications Next Generation Sequencing
selected gene panels, mutation analysis(SNPs, low frequent
mutations etc), structural variation (deletions,
insertions, inversions, CNVs), whole exome sequencing, mitochondrial sequencing
RNA sequencing
Epigenome sequencing
Single Cell Sequencing
• •
De Novo Sequencing
VIDISCA-454 De novo virus discovery
Michel de Vries
Cloning in TA vector Colony-PCR
Sequencing of colony-PCR products
Next Generation Sequencing Ion Torrent PGM sequencing
Mixed bacterial genomes in the sample
DNA cut into fragments
Sequencing Alignment of reads to
reference bacterial genome Relative abundance of bacterial species
sample
Metagenome Sequencing
Targeted resequencing
Multiplex Amplification
Coverage distribution (after optimization)
2017 Artec, Olaf Mook
Compare reads with reference
2017 Artec, Olaf Mook
Applications Next Generation Sequencing Genome sequencing:
RNA sequencing: • Transcriptome • Expression • Alternative splicing • miRNA • ncRNA • ..
Epigenome sequencing
Single Cell Sequencing
• •
More to RNA than messenger RNA
Transcriptome
Non-coding
>80%
Protein
-coding
<3%
Genome mRNA Protein
Small non-coding RNA <300 nt
Long non-coding RNA >200 nt miRNA (18-24 nt) Post-transcriptional gene regulation
piRNA (26-31 nt) Germ line transposon silencing
snRNA (¬150nt) pre-mRNA splicing
snoRNA (60-300 nt) RNA modification, rRNA processing
scaRNA RNA modification
Y RNA DNA replication, small RNA maturation
vRNA (88-98nt)
siRNA (21-23nt)
Circular RNA
(circRNAs) (100- >4000 nt)
Transcriptional RNA
tRNA (70-100nt) Translation
rRNA (120-4700nt) Translation
Repetitive DNA / RNA
Interspread repeats (SINEs,LINEs)
Processed pseudogenes
Simple sequence repeats
Segmental duplication
Blocks of tandem repeats
2017 Artec, Brendon Scicluna
Relative abundance of RNA species
Rela
tive a
bu
ndan
ce (%
)
80
4
13
1 2
2017 Artec, Brendon Scicluna
Relative abundance of RNA species
Rela
tive a
bu
ndan
ce (%
)
80
4
13
1 2
RIN (RNA integrity number)
2017 Artec, Brendon Scicluna
Relative abundance of RNA species
Rela
tive a
bu
ndan
ce (%
)
80
4
13
1 2
RIN (RNA integrity number)
2017 Artec, Brendon Scicluna
Relative abundance of RNA species
Rela
tive a
bu
ndan
ce (%
)
80
4
13
1 2
RIN (RNA integrity number)
2017 Artec, Brendon Scicluna
Approaches for RNA sequencing
Gene expression Target poly(A) mRNAs (enrich or selectively amplify).
Alternative splicing Target exon/intron boundaries by either doing long
read sequencing (>300 bp) or paired end read
sequencing (≥ 2 × 100).
miRNA (small RNAs) Target short reads using size selection purification
because miRNAs are in the 18–23 bp range.
Non-coding RNA Directional RNA sequencing is critical (strand-
specific)
Anti-sense RNA Consider combining mRNA expression with
directional RNA sequencing
Single cell RNA Requires whole transcriptome amplification.
Critical challenge is the technical noise created by
amplification.
2017 Artec, Brendon Scicluna
Preparing RNA for next generation sequencing
The core steps in preparing RNA for NGS analysis are: converting target to double-stranded DNA cDNA fragmenting and/or sizing the target sequences to a desired length
attaching oligonucleotide adapters to the ends of target fragments
quantitating the final library product for sequencing
2017 Artec, Brendon Scicluna
PCR and/or Amplification bias
HTSeq counts human mouse
Comparison of RNA library prep kits: • Kapa mRNA Hyper prep kit
• Kapa total RNA Hyper prep kit
with RiboErase
• Ovation RNA Seq System v2 i.c.w. Ovation Ultralow Library System v2
2017 Jakobs & Jongejans
Concluding remarks
RNA samples contain many types of RNA molecules, not only mRNA.
Ribosomal RNA is the predominant biotype
RNA samples represent molecules at various stages in different
ratio’s of the RNA life cycle RINs > 6.0 are OK
Know your library preparation kits!
Applications Next Generation Sequencing
Genome sequencing:
RNA sequencing:
Epigenome sequencing
• Histon modification and composition • Chromatin accessibility • Chromatin interaction
Single Cell Sequencing
• •
• Stable heritable traits that cannot be explained by changes in DNA sequence
• Describes gene regulation processes that drive gene expression in relation to (cell) differentiation and development
Epigenetics
57
Genetics
Environment
Epigenetics (complex) disease
2017 Artec, Peter Henneman
Epigenetics: bridge between “nature” and “nurture”
Epigenetic modifications
58 2017 Artec, Peter Henneman
1
2
3
DNA and histon modification go hand in hand. Regulation of gene expression often involves complex chromatin interactions. Altogether this is called the Epigenome
Epigenetics and NGS
• DNA-methylation
• Detecting Histon modifications eg CHIPseq
• Detecting chromatin accessibility eg
• Detecting chromatin interaction eg HiC
ATAC, Mnase, Dnase, FAIRE
Applications Next Generation Sequencing
Genome sequencing:
RNA sequencing:
Epigenome sequencing
Single Cell Sequencing
• •
Yong Wang, Nicholas E. Navin
Advances and Applications of Single-Cell Sequencing Technologies
Mol Cell, Volume 58, Issue 4, 2015, 598–609
Figure 3. Broad Applications of SCS in Biological and Biomedical ResearchPanels illustrating the diverse fields of biology that have
been impacted by SCS technologies over the past 5 years. Image credits: neurobiology, Zeynep Saygin (Cell Picture Show); germli...
Yong Wang, Nicholas E. Navin
Advances and Applications of Single-Cell Sequencing Technologies
Mol Cell, Volume 58, Issue 4, 2015, 598–609
Single Cell biology impacts most areas of research
By Kierano - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=63202666
Individual cells behave differently from the average of many cells
Pooled mRNA analysis is misleading
Pooled cell data can be misleading
Level of detail
V Svensson, R Vento-Tormo, SA Teichmann https://arxiv.org/abs/1704.01379
Scaling Single Cell Transcriptomics
Conditions Single Cell partitioning systems
• Single cell suspension at high enough concentration
No aggregates / clumping
No doublets
o FACS
o Dnase / trypsin treatment
o Cell strainer
• As little manipulation time as possible
Avoiding pertubation of transcriptomal profiles in RNA expression
• Nanoliter protocols
technical and cost efficient
Flowsorting
384 plate filled with lysis buffer
Reversed Transcription cDNA Library preparation Sequencing
integrated fluidic circuits
C1 Single cell capture, Fluidigm
C1_fluidigm.mp4
Reverse transcription • Conversion of RNA to cDNA
Amplification
• Amplification of cDNA to usable amounts
Barcoding & Indexing
• Labeling library sources Library prep & Analysis
• Prepare sample for sequencing • Data output type
Components of RNA seq method
nL rxn
μL rxn
1 cell 1 cell 1 cell
2 cells 1 cell 1 cell
Result cDNA synthese C1
Chromium Single Cell Solution 10x Genomics
microfluidics
Droplet encapsulation
GEM
10x_Genomics_MMshort.mp4
Yields / efficiency
FACS: 50% - 60% informative wells
C1: 50% - 70% capture efficiency
Chromium: 50% - 60% informative cells
Dependent on cell type and viability
FACS microfluidics
Protocol example
SMART-seq2
MATQ-seq
MARS-seq
CEL-seq
C1
DROP-seq
InDrop
Chromium
Seq-well
SPLIT-seq (SMARTer)
Transcript data
Full lenght Full lenght 3’end counting
3’end counting
Full lenght 3’end counting
3’end counting
3’end counting
3’end counting
3’end counting
Platform Plate-based Plate-based Plate-based Plate-based Microfluidics Droplet Droplet Droplet Nanowell array
Plate-based
Throughput (♯cells)
102-103 102-103 102-103 102-103 102-103 103-104 103-104 103-104 103-104 103-105
Typical read depth (per cell)
106 106 104-105 104-105 106 104-105 104-105 104-105 104-105 104
Reaction volume
microliter microliter microliter nanoliter nanoliter nanoliter nanoliter nanoliter nanoliter microliter
Brief overview of scRNA-seq approaches
Haque et al. Genome Medicine (2017) 9:75
Works for both research and diagnostics
• Detection of alternative splicing
• Expression
• Receptor rearrangement
• Virus discovery
• De novo sequencing
• Target capture
• Exomes
• Whole genomes
• HT sequencing of large genes
• chimerism detection
• Single Cell genomics / transcriptomics
Conclusions next gen sequencing
Next challenge in NGS
• Bioinformatics – Data analysis – Tracking and tracing – Interpretation
• Data storage – New developments in analysis software
• Logistics – pre- post PCR laboratories – Data and sample tracking
The future of medical genetics 1000 dollar genome once in a life time test? Interpretation of data Whole genome analysis might replace exome sequencing (routine now) Identification of genes for many recessive diseases
multidiciplinary
Future applications
Whole Genome
Whole Exome
Targeted resequencing
Epigenome,
Conformation
LDC, FFPE / Frozen tissue : : Split-Seq
Acknowledgements
Core Facility Genomics Linda Koster Suzan Kenter Ferry de Klein
Clinical Genetics Martin Haagmans Olaf Mook
Clinical Genetics Frank Baas
Bioinformatics Laboratory Aldo Jongejan Perry Moerland
library prep € 80 - €125
MiSeq PE 150 25 *10^6 reads per run € 1.300
HiSeq PE 150 350*10^6 reads per laan € 2.623
SR 50 350*10^6 reads per laan € 1.108
PGM 200 bp 5*10^6 reads per 318 chip € 860
400 bp 5*10^6 reads per 318 chip € 950
Proton 200 bp 70*10^6 reads per chip € 1.070
C1 capture & RNA isolatie & cDNA synthese € 1.109,41 per IFC (96 captures)
library prep € 1.211,07 per IFC (96 captures)
Chromium 3’ lib prep (SC) ~ € 1.900 Per library
DNA ~€400 Per library
Structural Variation
miRNA sequencing
Called vs reference homopolymers
Unique Molecular Identifiers
• Uses large numbers (>105) of tags in the intial RT primer
• Each transcript from a given gene is primed with a unique barcode sequence
• Unique combination of tag and transcript can be used to identify individual RNA
molecules
• Chances of two transcripts receiving the same tag are effectively zero
Reverse transcription Template switch
•Poly A or random priming
•Generates full length cDNA
C1 SMARTer
cDNA
Library Preparation