Upload
hoangkhuong
View
219
Download
1
Embed Size (px)
Citation preview
Next Generation SequencingMolecular Methods
Sylvain Foret
March 2010
http://dayhoff.anu.edu.au/~sf/next_gen_seq
The Genomic Age
Recent landmarks in genomics
1995 First bacterial genome (1.8 Mb)
1996 First eukaryotic genome (12 Mb)
1998 First animal genome (100 Mb)
2000 First human genome (3 Gb)
The Post-Genomic Age
Two big questions
How can we continue sequencing ever faster?
What can be done with all these sequences?
The Archon X Prize
To win the prize purse, the registered group must build adevice and use it to sequence 100 human genomes within10 days or less, with an accuracy of no more than oneerror in every 100,000 bases sequenced for no more than$10,000 per genome.
Other challenges and projects
The $1,000 human genome
The 1,000 genomes project (NHGRI, BGI, . . . )
Course Outline
Next generation (massively parallel) sequencing
Molecular methods (course 1)
Applications (course 2, course 3)
Sanger Method
Template
Synthesis
DNApolymerase
Primer
T
T
G
C
CAG
CGA AT CG
Electrophoresis
TT
A
A
C
CG
T
T A
T
Chromatogram
Quality score
Sanger Method Summary
Main characteristics
Sequencing by synthesis
Dye terminator method
Input Material: any DNA in sufficient quantity
PCR productsMolecular clones. . .
Error and Quality
Sources of error
Material (contamination, polymorphism, etc)
DNA polymerase
Signal (more prevalent at the end of the sequences)
Error and Quality
Quality scores
Each base sequenced is assigned a quality score Q
By definition: Q = −10 × log10(probability of error)
Thus: probability of error = 10−Q10
0 100 200 300 400 500 600 700 800 900
0
10
20
30
40
Position
Qualit
y s
core
Error and Quality
Consequences
Only relatively small sequences can be sequenced
Long sequences must be sequenced in several steps
For accuracy, each based should be covered more than once
Histogram of sizes, mean = 743.9 N50 = 762
Size
Den
sity
200 400 600 800 1000
0.00
00.
001
0.00
20.
003
0.00
40.
005
0.00
6
Shotgun sequencing
DNA fragmentation
Cloning into vectors
Grow vector in bacteria
Extract and sequence vectors
Map or assemble sequences
Vector
Primers
Insert
DNA extraction
Mate pairs
Illumina: Sample Preparation
Biological Sample
extraction
fragmentation,size selection
reverse transcription(random primers)
RNADNA
fragmentation,size selection
Illumina: Multiplexing
Multiplexing
Flow cell: 8 lanes
Each lane: up to 96 samples
Source: http://www.illumina.com
Illumina Summary
Main characteristics
Sequencing by synthesis
Reversible terminator method
Current size: 100bp
Pyrosequencing Chemistry
A+
Template+
Nucleotides
Extended template+
PPi (pyrophosphate)
PPi+
ADP phosphosulfate+
sulfurylate
ATP
ATP+
Luciferin+
Luciferase
Oxyluciferin
454: Mate Pairs
Internal adapter Internal adapterInsert (3kb−20kb)
Circularize
Cut150−180 bp 150−180 bp
Sequence
Add sequencing adapters
454: Multiplexing
Multiplexing
Each plate: 1, 2, 4, 8 or 16 regions separated by gaskets
Each region: up to 12 samples
TemplateAdapter
TemplateAdapter + MID(Multiplex Identifier)
Source: http://www.454.com
SOLiD: Mate Pairs
Internal adapter Internal adapterInsert (600bp−10kb)
Circularize
Cut
Add sequencing adapters
Sequence
SOLiD: Multiplexing
Multiplexing
Each run: 2 slides
Each slides: 1, 2, 4, 8 regions
Each region: up to 16 samples
Source: http://solid.appliedbiosystems.com
Summary
Numbers, as of March 2010
454 Illumina SOLiD
(Titanium) (Genome Analyser IIx ) (SOLiD 3)
Mean read size 400bp 100bp 50 bp
Reads per run 106 200 × 106 500 × 106
Run time 10 hours 4 days 1 week
Insert size 3kb–20kb 200bp–5kb 600bp–10kb