Targeted Sequencing of Human Genomes, Transcriptomes, and Methylomes
Jin Billy Li
George Church Lab
Harvard Medical School
[email protected]
Genetic Loci × Sample Size = Information
[Figure: methods placed by # samples (vertical axis) and # genetic loci (horizontal axis). Labels: PCR seq, Mass-spec; SNP array; Shotgun seq, RNA-seq, ChIP-seq]
Target Capturing with Padlock Probes (aka MIPs)
[Figure: padlock probes for feature 1 … feature n; polymerase (pol) and ligase (lig) steps, followed by PCR (or RCA)]
Porreca et al., Nat Methods 2007
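The probe layout behind the pol/lig steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the design procedure used for the probes in this talk: the function name `design_padlock`, the 20 nt default arm length, and the `N`-filled backbone are all hypothetical. It assumes the usual padlock arrangement in which the ligation arm sits at the 5' end of the oligo, the extension arm at the 3' end, and a common backbone joins them.

```python
def design_padlock(target: str, gap_start: int, gap_end: int,
                   arm_len: int = 20, backbone: str = "N" * 30) -> dict:
    """Sketch of a padlock probe (5'->3') that captures target[gap_start:gap_end].

    `target` is the strand whose sequence ends up copied into the circle;
    the probe anneals to the complementary strand. Arm length and backbone
    are placeholder assumptions, not values from the slides.
    """
    if gap_start - arm_len < 0 or gap_end + arm_len > len(target):
        raise ValueError("not enough flanking sequence for the arms")
    # Extension arm: flank just 5' of the gap. It forms the probe's 3' end,
    # from which the polymerase fills the gap.
    extension_arm = target[gap_start - arm_len:gap_start]
    # Ligation arm: flank just 3' of the gap. It forms the probe's 5' end,
    # where the ligase seals the filled-in strand into a circle.
    ligation_arm = target[gap_end:gap_end + arm_len]
    return {
        "extension_arm": extension_arm,
        "ligation_arm": ligation_arm,
        "probe": ligation_arm + backbone + extension_arm,
    }
```

With a toy target such as `"AAAAACCCCCGGGGTTTTTAAAAA"` and a gap at positions 10 to 14, the arms come out as `CCCCC` (extension) and `TTTTT` (ligation) when `arm_len=5`.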
Mass Production of Padlock Oligos
55k features of up to 200 nt
[Figure: size markers at 50 nt, 100 nt, and 150 nt]
~10,000-fold Improvement Since Nov 2007
1. longer hybridization time; 2. more probes; 3. right [dNTP]
[Figure: capturing efficiency (%) and fold improvement in three panels]
  Panel 1, variable hyb time (15 mins, 1 hour, 1 day, 1 day + cycling, 2 days, 5 days); probe:gDNA = 10:1, 100x dNTP
  Panel 2, variable probe:gDNA (10:1, 50:1, 100:1, 250:1); 2 day hyb time, 100x dNTP
  Panel 3, variable dNTP amount (1x, 10x, 100x, 1,000x, 10,000x); 1 day hyb time, probe:gDNA = 100:1
* 20-fold improvement already by better probe design and synthesis
Li et al., in preparation
Improved Technology -> Better Performance
95% captured; 85% within 100-fold range; 55% within 10-fold range
[Figure: two panels, Sensitivity + Uniformity and Correlation, each shown for Nov 2007 and Current]
Li et al., in preparation
Summary of Improvements
                                     Nov 2007    Current
Specificity                          ~100%       ~100%
Sensitivity/Multiplexity (of 55k)    18%         95%
Uniformity (in 100-fold range)       16%         85%
Correlation of replicates (r)        0.35        0.98
Accuracy (heterozygous calls)        31%         99%
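The sensitivity and uniformity rows can be made concrete with a small sketch. The slides do not spell out the exact definitions, so the following rests on assumptions: sensitivity is taken as the fraction of target sites with at least one read, and "within k-fold range" as the largest fraction of all target sites whose read counts fit inside some window [x, k·x]. The function names are illustrative.

```python
def sensitivity(counts):
    """Fraction of target sites captured, i.e. with at least one read (assumed definition)."""
    return sum(c > 0 for c in counts) / len(counts)


def fraction_within_fold(counts, fold):
    """Largest fraction of all sites whose counts fit in a [x, fold * x] window.

    This is one plausible reading of "within 100-fold range"; the
    presentation itself does not define the metric.
    """
    positive = sorted(c for c in counts if c > 0)
    best = 0
    lo = 0
    for hi in range(len(positive)):
        # Shrink the window from the left until it spans at most `fold`.
        while positive[hi] > fold * positive[lo]:
            lo += 1
        best = max(best, hi - lo + 1)
    return best / len(counts)
```

For example, `fraction_within_fold(counts, 100)` and `fraction_within_fold(counts, 10)` would correspond to the 100-fold and 10-fold figures quoted on the slides, if the assumed definition matches the one used there.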
Targeted Capturing of
• Genomes
  – Exome: PGP etc.
  – Contiguous regions or gene panels
  – SNPs
  – Hypermutable CpG dinucleotides
• Transcriptomes
  – Allelotyping
  – RNA editing sites
• Methylomes
  – CpG methylation
A -> I (G) RNA Editing
• Post-transcriptional A -> I
• I is read as G during translation
• Only 10 targets are known in human coding regions
Predicting Putative Editing Sites
• A in the genome
• G in some mRNAs or ESTs
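The prediction rule (A in the genome, G in some mRNAs or ESTs) can be illustrated with a toy scan. This is only a sketch: it assumes ungapped, same-strand alignments given as `(start, sequence)` pairs, and the function name `putative_editing_sites` is hypothetical. A real pipeline would work from spliced alignments and would have to separate candidate sites from SNPs and sequencing errors.

```python
def putative_editing_sites(genome, aligned_transcripts):
    """Return genome positions that are A in the genome but G in a transcript.

    `aligned_transcripts` is a list of (start, sequence) pairs, each an
    ungapped alignment on the same strand as `genome` (a simplifying
    assumption for illustration).
    """
    sites = set()
    for start, seq in aligned_transcripts:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos < len(genome) and genome[pos] == "A" and base == "G":
                sites.add(pos)
    return sorted(sites)
```

An A-to-G mismatch found this way is only a candidate; comparing gDNA and cDNA from the same individual is what distinguishes editing from a genomic variant.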
Discovery of 100’s of Novel Editing Sites
• 36,000 predicted editing sites
• gDNA + 7 tissue cDNAs from an individual
• Padlock + Solexa: 239 sites found to be edited
• Validation (PCR + Sanger): 18 of 20 random sites are obviously edited
with Erez Levanon, in preparation
Example: VEZF1
[Figure: tracks for genomic DNA and for RNA from intestine, kidney, diencephalon, frontal lobe, corpus callosum, cerebellum, and adrenal]
Bisulfite Padlock Probes (BSP): CpG Methylation
• Bisulfite-treated genome: “3-base” genome
• High specificity of padlock
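Why a bisulfite-treated genome is called a "3-base" genome can be shown with an in silico conversion. Bisulfite converts unmethylated C to U, which is read as T after amplification, while methylated C is protected. The sketch below models one strand only; the function name and the `methylated_positions` argument are illustrative assumptions.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Simulate bisulfite conversion of one strand.

    Unmethylated C becomes T (via U); C at any index listed in
    `methylated_positions` is left unchanged.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )
```

With no methylation, `bisulfite_convert("ACGTCCGA")` contains only A, G, and T, which is the reduced-complexity sequence that probes have to match specifically.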
Methylation Level Accurately Measured
[Figure, BSP-BSP correlation: methylation level, replicate 1 vs. methylation level, replicate 2; r = 0.966]
[Figure, BSP-Sanger correlation: methylation level measured by BSP sequencing vs. methylation level estimated by Sanger sequencing; r = 0.979]
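A methylation level on the 0 to 1 scale of these plots is commonly computed per CpG site from bisulfite reads as C / (C + T). The slides do not state the formula or any coverage filter, so the sketch below is an assumption on both counts; `min_coverage=10` is an arbitrary placeholder.

```python
def methylation_level(c_reads, t_reads, min_coverage=10):
    """Estimate methylation at one CpG site as C / (C + T).

    C reads come from protected (methylated) cytosines and T reads from
    converted (unmethylated) ones. Returns None when coverage is below
    `min_coverage`, a hypothetical threshold chosen for illustration.
    """
    total = c_reads + t_reads
    if total < min_coverage:
        return None
    return c_reads / total
```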
Methylation Pattern around Genes
Gene-Body Methylation
with Madeleine Price Ball, in preparation (poster)
Acknowledgements
George Church
Church Lab
Padlock technology: Kun Zhang, John Aach, Abraham Rosenbaum, Jay Shendure, Greg Porreca, Annika Ahlford
RNA editing: Erez Levanon, Jung-Ki Yoon
CpG methylation: Madeleine Price Ball
Agilent: Emily Leproust, Wilson Woo
Sequencing: Yuan Gao, Bin Xie, Bob Steen
Superior Quality of Padlock Oligos
55k features of up to 200 nt; PCR (2x); Solexa sequencing
[Figure: size markers at 50 nt, 100 nt, and 150 nt]
[Figure: percentage of sites (%) vs. number of reads (0 to 50). Series: before amplification (data), after amplification (data), before amplification (Poisson), after amplification (Poisson). Additional axis label: fraction of probes]
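The data-versus-Poisson comparison in the read-count plot can be reproduced in outline as follows. The sketch assumes the Poisson curve uses the observed mean number of reads per site as its rate, which the slides do not confirm; the function names are illustrative.

```python
import math
from collections import Counter


def poisson_pmf(k, mean):
    """Probability of observing exactly k reads under a Poisson model."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def observed_vs_poisson(read_counts, max_reads=50):
    """Return (k, observed %, Poisson-expected %) rows for k = 0..max_reads.

    The Poisson rate is set to the observed mean reads per site, an
    assumption made for this sketch.
    """
    n = len(read_counts)
    mean = sum(read_counts) / n
    histogram = Counter(read_counts)
    return [
        (k, 100 * histogram.get(k, 0) / n, 100 * poisson_pmf(k, mean))
        for k in range(max_reads + 1)
    ]
```

If oligo synthesis and amplification introduced little bias, the observed percentages would be expected to track the Poisson column closely; a much broader observed distribution would indicate uneven probe representation.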
From Agilent Oligos to Padlock Probes
amplification and selection
[Figure: Agilent oligo, 136 bp, with 18 bp at each end; steps labeled PCR, exonuclease, annealed with DpnII guide oligo, USER + DpnII; product: padlock probe. Other labels: T, * p, NN, UA, U]
Heterozygous Genotypes Correctly Called
[Figure: genotype classes (homozygous wild type, heterozygous variation, homozygous variation), shown before and after]
Methods in Comparison
                           Padlock                    Array-based hyb
Upfront probe cost         $12,000 per 55k 100mers    $600 per 385k 70mers
  (10-20% of exome)
Probes amplifiable?        Yes                        No
Reaction phase             Solution, 10-20 μl         Surface, 200 μl
Enzymatic hyb?             Yes                        No
gDNA required              ~0.5-1 μg                  20 μg (WGA)
Efficiency (->accuracy)    1%                         N/A (<0.1%?)
Uniformity                 100-fold range             10-fold range
Specificity                ~100% on target            30-80% on or near target
Differential Clamping at Ligation Junction
[Figure: average coverage (axis 0 to 300) by base (A, C, G, T) at the proximal and distal positions of the extension arm and the ligation arm. Bar labels: 125, 160, 159, 162, 155, 146, 139, 142, 181, 166, 293, 166, 165, 153, 38, 156]
% GC vs. Capturing Efficiency
[Figure: average coverage (axis 0 to 200) vs. % GC in 5% bins from (10, 15] to (85, 90]. Series: gap + arms, gap, extension arm, ligation arm]
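The binning used on the % GC axis, 5% intervals that are open on the left and closed on the right, can be sketched as below. The input format, a list of `(sequence, coverage)` pairs, and the function names are assumptions for illustration; the same helper could be applied to the gap, either arm, or gap + arms.

```python
import math
from collections import defaultdict


def gc_percent(seq):
    """Percent G+C of a non-empty DNA sequence."""
    seq = seq.upper()
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_bin(percent, width=5):
    """Return the (lo, hi] bin containing `percent`, e.g. 40.0 -> (35, 40)."""
    hi = math.ceil(percent / width) * width
    return (hi - width, hi)


def mean_coverage_by_gc(regions, width=5):
    """Average coverage per % GC bin for an iterable of (sequence, coverage)."""
    totals = defaultdict(lambda: [0.0, 0])
    for seq, coverage in regions:
        entry = totals[gc_bin(gc_percent(seq), width)]
        entry[0] += coverage
        entry[1] += 1
    return {b: total / n for b, (total, n) in sorted(totals.items())}
```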
99% Concordance Between Padlock and HapMap
The Editing “Calls” Are Well Correlated
[Figure: G/(A+G), frontal lobe replicate 1 vs. G/(A+G), frontal lobe replicate 2; log-scale axes from 0.01 to 1; r = 0.964]
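The editing level on these axes is G/(A+G), and replicate agreement is summarized by a correlation coefficient r. The sketch below computes both from read counts. It assumes r is a Pearson correlation on the untransformed levels; the slides do not say whether the values were log-transformed first.

```python
import math


def editing_level(a_reads, g_reads):
    """Fraction of reads showing G at a site that is A in the genome."""
    return g_reads / (a_reads + g_reads)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

In use, each site contributes one `editing_level` value per replicate, and `pearson_r` is applied to the two resulting lists.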
Bisulfite Padlock Probes (BSP): CpG Methylation
Bisulfite-treated genome
• 10k CpG sites tiling the ENCODE regions
  – 1 CpG site every 3 kb region on average
• High specificity
  – 79 of 80 Sanger reads match correct locations
[Figure: capture workflow. Labels: B, strep, P. Steps: collected in a tube; PCR; λ exonuclease; shearing, end polishing; adapter ligation; hybridization in closed-tube solution; denaturing, PCR]
Li et al., unpublished
Methods in Comparison
                           Padlock                    Array-based hyb             Biotin-coupled hyb
Upfront probe cost         $12,000 per 55k 100mers    $600 per 385k 70mers        $500 per 244k 60mers
  (10-20% of exome)
Probes amplifiable?        Yes                        No                          Yes
Reaction phase             Solution, 10-20 μl         Surface, 200 μl             Solution, 10-20 μl
Enzymes in hyb?            Yes                        No                          No
gDNA required              ~0.5-1 μg                  20 μg (WGA)                 ~0.5-1 μg
Efficiency (->accuracy)    1%                         N/A (<0.1%?)                ~10%?
Uniformity                 100-fold range             10-fold range               10-fold range?
Specificity                ~100% on target            30-80% on or near target    ~55% on or near target
Two Tech Replicates Are Well Correlated
[Figure, Uniformity: number of reads per site vs. ranked target sites]
[Figure, Correlation of counts: counts, replicate 1 vs. counts, replicate 2]