NGS: The Basics Sample Prep & Sequencing for Metagenomics
Stefan J. Green, PhD, Sequencing Core, Research Resources Center University of Illinois at Chicago – [email protected]
UIC SQC © ESCMID eLibrary by author
Disclosure of speaker’s interests
(Potential) conflict of interest
Potentially relevant company relationships in connection with event
Sponsorship or research funding
Fee or other (financial) payment
Shareholder
Other relationship, i.e. …
None
None
None
Please note that the use of images of specific products in this presentation does NOT represent an endorsement of that product. There are many competing instruments and reagents of high quality that are available.
Sample Prep & Sequencing for Metagenomics
• DNA-based protocols
  • Shotgun metagenome sequencing
• RNA-based protocols
  • Whole transcriptome sequencing
The process…
• Define question
• Acquire samples
• Preserve samples for future, if possible
• Label samples properly; use barcoded tubes
• Extract nucleic acids and perform QC
• Prepare NGS libraries and perform QC
• Sequence and perform QC
• Data analysis
Nucleic Acid Extraction
• Adapt protocol to tissue type, target organism, nucleic acid type, and sequencing type
• High molecular weight DNA is needed for long-read applications; gentle enzymatic lysis helps preserve it
• High-energy (mechanical) lysis is best for microbes, but it shears DNA and is not suited to long-read sequencing
• Selective lysis to limit host DNA
• Automation is available
How much DNA/RNA do I need?
• Application dependent
• Short-read technologies need much less than long-read applications
• More nucleic acid provides a wider range of possible protocols
• Realistically:
  • Short-read Illumina sequencing: minimum DNA – 1 ng
  • Long-read Nanopore and PacBio sequencing: minimum DNA – 250 ng to 1 microgram
  • For RNA, single-cell RNAseq is possible (sub-ng amounts), but poly(A) capture and rRNA depletion have different input requirements
• Molecular weight critical for long-read applications
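The minimum inputs above can be turned into a quick check on an extract. This is a minimal sketch, not a validated rule: the dictionary keys and function names are illustrative, the long-read threshold uses the low end of the quoted range, and the check looks only at mass, not at molecular weight.

```python
# Minimum inputs quoted on the slide; the dictionary keys are illustrative names.
MIN_DNA_INPUT_NG = {
    "illumina_short_read": 1.0,  # slide: minimum 1 ng
    "long_read": 250.0,          # slide: low end of the 250 ng to 1 microgram range
}


def total_dna_ng(concentration_ng_per_ul, volume_ul):
    """Total DNA mass in an extract (concentration times volume)."""
    return concentration_ng_per_ul * volume_ul


def platforms_supported(concentration_ng_per_ul, volume_ul):
    """Names of platform classes whose minimum input the extract meets."""
    total = total_dna_ng(concentration_ng_per_ul, volume_ul)
    return sorted(
        name for name, minimum in MIN_DNA_INPUT_NG.items() if total >= minimum
    )


if __name__ == "__main__":
    # Hypothetical extract: 2 ng/uL in 20 uL, i.e. 40 ng in total.
    print(platforms_supported(2.0, 20.0))
```

An extract that passes the mass check may still be unsuitable for long reads if the DNA is fragmented, so fragment size needs its own QC.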
Library Preparation
• Preparing nucleic acids for sequencing is called library preparation.
• Sequencing reactions are initiated from identical regions of DNA
• Identical regions are artificially introduced using reverse transcription, PCR, ligation, or transposons.
How is NGS achieved? (DNA)
• DNA is fragmented into small pieces (short-read only)
  • Multiple options available
• Fragments are enzymatically “cleaned” up
• Sequencing adapters are ligated to fragments
  • Adapters are custom sequences, unique to each sequencing platform
• Adapters serve multiple purposes
  • Identical initiation site for primer binding
  • Aid in clonal amplification
  • Adapters include a sample-specific sequence known as a barcode
  • Adapters MAY also include a unique molecular identifier (UMI)
• Final size selection may be performed
• Library undergoes QC and quantification
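Because each adapter carries a sample-specific barcode, reads from a pooled run can be assigned back to their samples. The sketch below illustrates the idea with exact barcode matching only. The barcode sequences, sample names, and the `demultiplex` function are hypothetical, and production demultiplexers typically also tolerate mismatches.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group read sequences by sample using an exact barcode match.

    `reads` is an iterable of (barcode, sequence) pairs. Reads whose barcode
    is not in the sample sheet are collected under "undetermined".
    """
    by_sample = defaultdict(list)
    for barcode, sequence in reads:
        sample = barcode_to_sample.get(barcode, "undetermined")
        by_sample[sample].append(sequence)
    return dict(by_sample)


# Hypothetical sample sheet and reads, for illustration only.
barcodes = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}
reads = [
    ("ACGTAC", "GGATTC"),
    ("TGCATG", "CCTAGA"),
    ("ACGTAC", "TTGACA"),
    ("NNNNNN", "AAAAAA"),
]
result = demultiplex(reads, barcodes)
```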
Shotgun sequencing approach for genome sequencing
[Figure: genomic DNA fragments (typically >10 kb) are sheared (acoustically or enzymatically), end-repaired and sometimes A-tailed, then ligated to NGS-platform-specific sequencing adapters (Adapter 1 and Adapter 2, with a barcode, BC).]
Shearing of gDNA
Images of nucleic acids Genomic DNA Extracts
Workflow
Neiman et al. "Library preparation and multiplex capture for massive parallel sequencing applications made efficient and easy." PLoS One 7.11 (2012): e48616.
T4 DNA polymerase
Taq DNA polymerase
T4 Polynucleotide kinase
T4 DNA ligase
DNA Repair
End repair is needed to prepare DNA for ligation by ensuring that each molecule is free of overhangs and contains 5′ phosphate and 3′ hydroxyl groups.
Step 1: Add 5 microliters of End Repair Mix to 10 microliters of sample. Mix, spin and place on ice.
Step 2: Place tubes in thermocycler: 25 °C for 30 min; 70 °C for 10 min; hold at 10 °C.
Step 3: Spin tubes and place on ice.
DNA Ligation
Step 4: Add 6 microliters of sample-specific adapter mix to each well.
Step 5: Add 9 microliters of ligase mastermix to each well. Mix and spin.
Step 6: Place tubes in thermocycler: 25 °C for 30 min; 70 °C for 10 min; hold at 10 °C.
Step 7: Spin tubes and place on ice.
PCR Amplification
Step 8: Add 70 microliters of amplification mix to each well.
Step 9: Perform PCR: 5-10 cycles, depending on input DNA concentration.
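Tallying the additions in Steps 1 through 8 gives the running reaction volume per well. This is a minimal sketch that assumes no cleanup, transfer, or volume loss between steps, which the slides do not state either way. The names `STEPS` and `running_volumes` are illustrative.

```python
# Additions per well, in microliters, as listed in Steps 1, 4, 5, and 8.
STEPS = [
    ("sample", 10),
    ("End Repair Mix", 5),
    ("adapter mix", 6),
    ("ligase mastermix", 9),
    ("amplification mix", 70),
]


def running_volumes(steps):
    """Return (component, cumulative volume in uL) after each addition."""
    total = 0
    cumulative = []
    for name, volume_ul in steps:
        total += volume_ul
        cumulative.append((name, total))
    return cumulative


if __name__ == "__main__":
    for name, total in running_volumes(STEPS):
        print(f"after adding {name}: {total} uL")
```

Under that assumption the well holds 15 microliters after end repair, 30 after ligation setup, and 100 going into PCR.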
Transposon-based DNA fragmentation
• Illumina Nextera
• Illumina Nextera XT
• Illumina Nextera FLX
“Transposases catalyze the random insertion of excised transposons into DNA targets with high efficiency.”
Sequencer Ready DNA
Final Steps
• Quality analysis of samples and libraries:
  • Quantification of library – fluorimetry, qPCR
  • Quality analysis – electrophoresis
Final Steps
• Size selection
  • Increase distance between paired-end reads
  • Select for shorter fragments to allow paired reads to overlap
  • Remove unwanted fragments
  • Decrease variability in size distribution between samples
• Fragment size, or distance between adapters, is known as the insert size.
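Whether paired reads overlap, or leave an unsequenced gap between them, follows directly from the insert size and the read length. A minimal sketch of that arithmetic, assuming both reads have the same length and are not trimmed (function names are illustrative):

```python
def paired_read_overlap(insert_size_bp, read_length_bp):
    """Bases of the insert covered by both reads of a pair (0 if none).

    If the insert is shorter than one read, both reads cover the whole
    insert, so the overlap is capped at the insert size.
    """
    overlap = min(insert_size_bp, 2 * read_length_bp - insert_size_bp)
    return max(0, overlap)


def unsequenced_gap(insert_size_bp, read_length_bp):
    """Bases in the middle of the insert that neither read covers."""
    return max(0, insert_size_bp - 2 * read_length_bp)


if __name__ == "__main__":
    for insert in (100, 250, 500):
        print(insert, paired_read_overlap(insert, 150), unsequenced_gap(insert, 150))
```

With 2x150 reads, a 250 bp insert gives overlapping reads, while a 500 bp insert leaves a gap and increases the distance spanned by the pair.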
Size Selection
Final Steps
• Combine samples into a final ‘Pool’
• Perform quantitative PCR to measure the number of prepared molecules with adapters
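When pooling, libraries are usually compared on a molar basis rather than by mass. qPCR counts adapter-ligated molecules directly. As a rough alternative, a fluorimetric concentration can be converted to molarity using the mean fragment length and an average mass of about 660 g/mol per base pair of double-stranded DNA. The sketch below shows that conversion and an equimolar pooling volume. The function names and the example libraries are hypothetical.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # approximate average mass of one dsDNA base pair


def library_molarity_nm(concentration_ng_per_ul, mean_fragment_bp):
    """Convert a mass concentration (ng/uL) to molarity (nM) for dsDNA."""
    return concentration_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp)


def equimolar_volumes_ul(molarities_nm, fmol_per_library):
    """Volume of each library that contributes the same number of molecules.

    1 nM equals 1 fmol/uL, so volume (uL) = fmol wanted / molarity (nM).
    """
    return {name: fmol_per_library / nm for name, nm in molarities_nm.items()}


if __name__ == "__main__":
    # Hypothetical libraries: (ng/uL, mean fragment length in bp).
    libraries = {"lib1": (3.3, 500), "lib2": (1.65, 500)}
    molarities = {k: library_molarity_nm(c, bp) for k, (c, bp) in libraries.items()}
    print(equimolar_volumes_ul(molarities, fmol_per_library=20.0))
```

The mass-based estimate counts all double-stranded DNA, including fragments without adapters, which is one reason qPCR is preferred for the final pool.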
www.neb.com
Sequencing Choices to Make
• How much data do I need? [Depth of sequencing]
  • Typically, 1 M to 20 M clusters of 2x150 (300 Mb to 6 Gb)
• What sequencing platform should I use?
• What read-length should I use?
• What insert size should I use?
• What kind of barcodes do I need?
• Should I use single-end or paired-end data (Illumina)?
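The depth figures quoted above follow from clusters × reads per cluster × read length. The sketch below reproduces that arithmetic and adds a mean-coverage estimate for a single genome. The 5 Mb genome size is a hypothetical example, and coverage in a mixed community will not be uniform across organisms.

```python
def sequencing_yield_bases(clusters, read_length_bp, reads_per_cluster=2):
    """Total bases from a run: paired-end gives two reads per cluster."""
    return clusters * read_length_bp * reads_per_cluster


def mean_coverage(yield_bases, genome_size_bp):
    """Average depth if all bases came from one genome of the given size."""
    return yield_bases / genome_size_bp


if __name__ == "__main__":
    low = sequencing_yield_bases(1_000_000, 150)    # 2x150, 1 M clusters
    high = sequencing_yield_bases(20_000_000, 150)  # 2x150, 20 M clusters
    print(low, high)
    # Hypothetical 5 Mb bacterial genome sequenced alone.
    print(mean_coverage(low, 5_000_000))
```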
Paired-end sequencing
Sequencing Run Quality Assessment (Illumina)
• Run Metrics
  • Number of clusters
  • Clusters passing filter
  • Total yield
  • % of bases with >Q30
  • Error rate
  • % phiX detected
  • % of clusters by sample
• Caveats
  • Not every library type is the same
  • Metrics expected for one type of library may not be achievable for others
  • PF >80%
  • %Q30 >75%
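A Phred quality score Q corresponds to an error probability of 10^(-Q/10), so Q30 means roughly one error in 1,000 base calls. The sketch below pairs that conversion with a simple check against the PF and %Q30 values listed above. The function names are illustrative, and, as the caveats note, these thresholds will not suit every library type.

```python
def phred_error_probability(q_score):
    """Probability that a base call with this Phred score is wrong."""
    return 10 ** (-q_score / 10)


def check_run(pct_passing_filter, pct_q30, min_pf=80.0, min_q30=75.0):
    """Compare run metrics with the slide's thresholds (PF >80%, %Q30 >75%)."""
    return {
        "pf_ok": pct_passing_filter > min_pf,
        "q30_ok": pct_q30 > min_q30,
    }


if __name__ == "__main__":
    print(phred_error_probability(30))
    # Hypothetical run: 88% of clusters pass filter, 72% of bases above Q30.
    print(check_run(88.0, 72.0))
```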
https://blog.horizondiscovery.com/diagnostics/the-5-ngs-qc-metrics-you-should-know
How is NGS achieved? (RNA)
• Many different protocols
  • Poly(A) Capture – eukaryotic organisms only
  • Ribosomal RNA removal – custom and pre-designed
  • Small RNA (e.g., miRNA) protocols
• More challenging than DNA-based protocols
  • Most protocols require conversion to ds-cDNA
  • RNA is more readily degraded
  • Microorganisms do not polyadenylate their mRNAs
  • Microorganisms do rapidly change their expression profiles
  • RNAseq can be confounded by residual gDNA
  • Ribosomal RNA is the dominant RNA species
  • Microbial mRNA-seq in the presence of host RNA is tricky
RNA Quality Analysis
• RNA quality determines which RNAseq protocol can be used
• For prokaryotic microorganisms (bacteria and archaea), poly(A) capture cannot be used
• Thus, ribosomal RNA depletion must be performed (or excess sequence data generated)
https://www.mun.ca/biology/desmid/brian/BIOL2060/BIOL2060-22/CB22.html
Ribosomal RNA Depletion
• Multiple techniques for removal of ribosomal RNA
  • Hybridization of DNA probes, followed by RNase H (shown right)
  • Hybridization of biotinylated DNA probes, followed by streptavidin capture
  • Hybridization of probes, followed by selective restriction digest
• Double depletion needed for samples with host RNA
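The choice of mRNA enrichment can be summarized as a small decision rule based on the points made in this deck: poly(A) capture applies to eukaryotic RNA only, microbial RNA needs rRNA depletion, and samples that also contain host RNA need both host and microbial rRNA removed. This is a simplified sketch. The function name and step labels are illustrative, and it ignores RNA quality, which also constrains the choice.

```python
def rna_enrichment_plan(has_prokaryotes, has_eukaryotic_host):
    """Suggest enrichment steps for a sample, following the slides' rules."""
    if has_prokaryotes and has_eukaryotic_host:
        # Double depletion: remove host rRNA and microbial rRNA.
        return [
            "rRNA depletion with host-targeted probes",
            "rRNA depletion with microbe-targeted probes",
        ]
    if has_prokaryotes:
        return ["rRNA depletion with microbe-targeted probes"]
    if has_eukaryotic_host:
        return ["poly(A) capture or rRNA depletion"]
    return []


if __name__ == "__main__":
    # Hypothetical host-associated microbiome sample.
    print(rna_enrichment_plan(has_prokaryotes=True, has_eukaryotic_host=True))
```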
www.neb.com
Sample mRNA workflow
• RNA may or may not be fragmented
• Reverse transcription with random primers – incorporate artificial sequence at 5’ end
• Molecular tricks to incorporate artificial sequence at 3’ end
• PCR amplify to incorporate sequencing adapters and barcode
Microbial RNAseq
• Depth of sequencing needed
  • Generally, 5-10 M clusters for single organism after ribosomal RNA depletion
  • 50 M clusters or more may be needed for complex microbial communities
• Paired-end sequencing not absolutely necessary due to low intronic content
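The depth guidance above assumes rRNA has been depleted. If it has not, the alternative is to generate excess sequence data, and the amount scales with the rRNA fraction. This is a minimal sketch using integer arithmetic. The 90% rRNA figure is a hypothetical example, not a value from the slides, and the model treats the target as the number of non-rRNA clusters wanted.

```python
def clusters_needed(target_non_rrna_clusters, rrna_percent):
    """Total clusters required so that the non-rRNA share reaches the target.

    `rrna_percent` is the whole-number percentage of clusters expected to be
    rRNA (0 to 99). The result is rounded up to the next whole cluster.
    """
    if not 0 <= rrna_percent < 100:
        raise ValueError("rrna_percent must be between 0 and 99")
    usable_percent = 100 - rrna_percent
    # Ceiling division on integers avoids floating-point rounding.
    return -(-target_non_rrna_clusters * 100 // usable_percent)


if __name__ == "__main__":
    # Hypothetical: 5 M non-rRNA clusters wanted, 90% of clusters are rRNA.
    print(clusters_needed(5_000_000, 90))
```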
Ofek-Lalzar, Maya, et al. "Niche and host-associated functional signatures of the root surface microbiome." Nature communications 5 (2014): 4950.