Molecular Genomic Imaging Center (CEGS), Harvard / Wash U

George Church, Rob Mitra, Greg Porreca, Jay Shendure

Sequencing by Ligation on Polony Beads

with Nick Reppas, Kun Zhang, Shawn Douglas, Mike Wang, Abraham Rosenbaum, Agencourt

Personal Genomics, Stem Cells, ELSI

Synthetic Biology

Polymerase colony

• 2 vs. 1 immobilized primer
• in situ polonies vs. emulsion PCR beads
• single molecule vs. multi-molecule detection
• dNTP extension (SBE) vs. ligation (SBL) (>=3X error 1e-6, 1/10 cost of ABI E. coli)

Shendure, Porreca, Mitra, Church

• Single chromosomes: haplotyping (Zhang)
• Single cells: full sequence (Zhang & Martiny)
• Single RNA molecules: RNA splicing (Zhu, Varma)

1. In vitro construction of a complex mate-paired library

2. Template amplification to one micron beads by emulsion PCR

3. Cyclic Array Sequencing by Ligation (SBL)

Polony Sequencing Overview

In vitro construction of a complex, mate-paired library

[Figure: a ~1 kb genomic fragment is reduced to paired genomic tags (17 to 18 bp each) joined to common sequences. Labels: MmeI, Fisseq-F, Fisseq-R, T30, Tag 1, Tag 2, Left, Mid, Right, Seq1, Seq2.]

Segment lengths shown: 43 bp, 32 bp, 25 bp

Total = 134-136 bp amplicon
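The amplicon total follows from the listed lengths if the three numbers (43, 32 and 25 bp) are the common segments and each of the two genomic tags is 17 to 18 bp. A minimal sketch of that arithmetic, assuming this reading of the slide; the segment names are placeholders, not names used in the source:

```python
# Lengths (bp) as listed on the slide. Which common sequence each number
# belongs to is not stated there, so the keys are hypothetical placeholders.
COMMON_SEGMENTS_BP = {"segment_a": 43, "segment_b": 32, "segment_c": 25}

# Each genomic tag is 17 to 18 bp, and the library carries two tags per amplicon.
TAG_LENGTH_RANGE_BP = (17, 18)
TAGS_PER_AMPLICON = 2


def amplicon_length_range(common=COMMON_SEGMENTS_BP,
                          tag_range=TAG_LENGTH_RANGE_BP,
                          n_tags=TAGS_PER_AMPLICON):
    """Return (shortest, longest) amplicon length in bp."""
    fixed = sum(common.values())
    shortest = fixed + n_tags * min(tag_range)
    longest = fixed + n_tags * max(tag_range)
    return shortest, longest


if __name__ == "__main__":
    print(amplicon_length_range())  # (134, 136)
```

With these inputs the range matches the 134-136 bp total given on the slide.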

(1) Emulsion PCR to 1 micron beads

Dressman et al. PNAS'03

Template Amplification

Enrichment by Hybridization

Selector Bead

One of 750 megapixel frames of gel-immobilized 1.0 micron beads, 0.3 micron pixels, 4-colors

ACUCAUC…(3’)…TAGAGT????????????????TGAGTAG…(5’)

5’-Cy5-nnnnAnnnn-3’
5’-Cy3-nnnnGnnnn-3’
5’-TR-nnnnCnnnn-3’
5’-Cy3+Cy5-nnnnTnnnn-3’

5'PO4

Sequencing by Ligation (SBL) with fluorescent combinatorial 9-mers

Excitation (nm)   Emission (nm)
647               700
555               605
572               630
555               700
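The four nonamer probes tie one fluorescent label to the identity of the probe's central base, so a detected label can be translated into a template base by taking the complement. The sketch below illustrates only that lookup. It assumes the label has already been determined from the images, and the function names are illustrative; how the query position changes from cycle to cycle is not shown on the slide and is not modeled here.

```python
# Label -> central base of the probe, as listed on the slide.
PROBE_BASE_BY_LABEL = {
    "Cy5": "A",
    "Cy3": "G",
    "TR": "C",
    "Cy3+Cy5": "T",
}

# The ligated probe is complementary to the template at the query position.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_template_base(label):
    """Translate one detected label into the template base it implies."""
    probe_base = PROBE_BASE_BY_LABEL[label]
    return COMPLEMENT[probe_base]


def decode_labels(labels):
    """Translate a list of labels (one per queried position) into bases.

    The caller is assumed to supply the labels already ordered by position.
    """
    return "".join(call_template_base(label) for label in labels)


if __name__ == "__main__":
    print(decode_labels(["Cy5", "Cy3", "TR", "Cy3+Cy5"]))  # TCGA
```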

Consensus Accuracy     False Positives (E. coli)   False Positives (Human)
1E-3                   4,000                       3,000,000
1E-4 (BERMUDA/ABI)     400                         300,000
1E-6 (Polony-SBL)      4                           3,000

Goal of Resequencing: Discovery of Uncommon Variation

Why low error rates?
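The false-positive counts in the table are consistent with multiplying the consensus error rate by the genome size. A short sketch of that calculation, assuming the rounded genome sizes the table implies (about 4e6 bp for E. coli and 3e9 bp for human); the names are illustrative:

```python
# Rounded genome sizes implied by the table (assumption, not stated on the slide).
GENOME_SIZE_BP = {"E. coli": 4_000_000, "Human": 3_000_000_000}


def expected_false_positives(error_rate, genome_size_bp):
    """Expected number of erroneous consensus calls across one genome."""
    return error_rate * genome_size_bp


if __name__ == "__main__":
    for rate in (1e-3, 1e-4, 1e-6):
        counts = {name: round(expected_false_positives(rate, size))
                  for name, size in GENOME_SIZE_BP.items()}
        print(rate, counts)
```

At a 1E-6 consensus error rate only a handful of false calls are expected in E. coli, which is the scale at which uncommon real variants stop being swamped by errors.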

trp/tyrA pair of genomes shows the best co-growth (syntrophs). Reppas & Lin

First Passage, Second Passage

Genome engineering: Select for cross-feeding

[Figure: doubling time (hr), 0 to 8, vs. # of passages, 0 to 150. Labels: Q1, Q3, Q2-1, Q2-2, EcNR1.]

Co-evolution of cross-feeding Trp- & Tyr- genome pair

~1 kb genomic fragment: 980 ± 96 bp

~860,000 independent mate-pairing events

[Figure: axis ticks 0 to 7000 and 0 to 2 (axis names not recoverable); region MG1655 1,974,001 to 1,978,000; confirmed 776 bp deletion via tandem 8 bp repeats.]

Aberrations in mate-pair distance indicative of rearrangements
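A rough illustration of the idea: mate pairs whose separation on the reference falls far outside the observed 980 ± 96 bp distribution are candidates for a rearrangement. The sketch assumes a simple cutoff of three standard deviations, which is a choice made here and not a threshold taken from the source; the coordinates in the example are made up.

```python
# Observed mate-pair distance distribution from the slide.
MEAN_DISTANCE_BP = 980.0
SD_DISTANCE_BP = 96.0


def is_aberrant(distance_bp, mean=MEAN_DISTANCE_BP, sd=SD_DISTANCE_BP, n_sd=3.0):
    """True if the distance is more than n_sd standard deviations from the mean.

    n_sd=3.0 is an assumed cutoff for illustration only.
    """
    return abs(distance_bp - mean) > n_sd * sd


def flag_aberrant_pairs(pairs):
    """Return the (left, right) reference positions whose spacing is aberrant."""
    return [(left, right) for left, right in pairs
            if is_aberrant(abs(right - left))]


if __name__ == "__main__":
    # Hypothetical pairs: the second spans a 776 bp deletion in the sample, so
    # its tags sit about 980 + 776 bp apart when placed on the reference.
    pairs = [(1_974_100, 1_975_080), (1_975_800, 1_977_556)]
    print(flag_aberrant_pairs(pairs))
```

A pair that spans a 776 bp deletion appears about 1,756 bp apart on the reference, well beyond the cutoff, while a typical pair near 980 bp is left alone.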

Base-calling Tetrahedron

Fluorescent SBL data quality measured by distance to the 4 vertices.

Vertices: A, G, C, T

Mean accuracy = 99.5%

Best 50% of base-calls are 99.9% accurate
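One way to picture the tetrahedron: if the four channel intensities for a bead are scaled to sum to 1, a pure signal for one base lands on one of four vertices, and mixed or noisy signals land somewhere in between. The sketch below calls the nearest vertex and reports the distance to it as a quality measure. The normalization, the Euclidean metric and the channel order are assumptions made for illustration; the source states only that quality is measured by distance to the 4 vertices.

```python
import math

# Assumed channel order; one vertex per base.
BASES = ("A", "G", "C", "T")


def normalize(intensities):
    """Scale four non-negative channel intensities so they sum to 1."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")
    return [value / total for value in intensities]


def call_base(intensities):
    """Return (base, distance) for the nearest tetrahedron vertex.

    A smaller distance means the signal sits closer to a pure single-base
    signal, i.e. a higher-quality call.
    """
    point = normalize(intensities)
    best_base, best_distance = None, math.inf
    for index, base in enumerate(BASES):
        vertex = [1.0 if j == index else 0.0 for j in range(len(BASES))]
        distance = math.dist(point, vertex)
        if distance < best_distance:
            best_base, best_distance = base, distance
    return best_base, best_distance


if __name__ == "__main__":
    print(call_base([10, 0, 0, 0]))   # pure signal in the first channel
    print(call_base([1, 1, 8, 0]))    # mostly third channel, some noise
```

Ranking calls by this distance is one simple way to select a "best 50%" subset like the one quoted above.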

[Figure: raw error rate (1E-04 to 1E-02, marked Q40, Q30, Q20) vs. cumulative fraction of data.]
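The Q labels and the error-rate ticks are related by the standard Phred definition, Q = -10 log10(p), so Q20, Q30 and Q40 correspond to error rates of 1E-02, 1E-03 and 1E-04. A small conversion sketch (function names are illustrative):

```python
import math


def phred_from_error(error_rate):
    """Phred quality score for a given per-base error probability."""
    return -10.0 * math.log10(error_rate)


def error_from_phred(quality):
    """Per-base error probability for a given Phred quality score."""
    return 10.0 ** (-quality / 10.0)


if __name__ == "__main__":
    for q in (20, 30, 40):
        print(q, error_from_phred(q))
    # The 99.5% mean accuracy quoted earlier is an error rate of 0.005.
    print(phred_from_error(0.005))
```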

Coverage   % genome   Error     Mock SNPs        Mutation Candidates
>=3x       68.10%     < 1e-6    60/60 correct    n = 7
2x         13.40%     0.0002    15/15 correct    n = 108
1x         11.20%     0.007     11/14 correct    n = 3344

[Figure: consensus error rate (1E-7 to 1E-1) vs. coverage.]

Consensus error rates

Position    Type         Gene     Location      ABI Confirmation / Comments
986,334     T > G        ompF     TATA box      Only in evolved strain
931,960     8 bp del     lrp      frameshift    Only in evolved strain
1,976,500   776 bp del   insB_5   IS element    MG1655 heterogeneity
3,957,960   C > T        ppiC     5' UTR        MG1655 heterogeneity
4,654,533   T > C        cI       Glu > Glu     heterogeneity
4,647,960   T > C        ORF61    Lys > Gly     heterogeneity
985,797     T > G        ompF     Glu > Ala     (in progress)
454,864     T > C        tig      Gly > Gly     (in progress)
4,648,691   G > A        exo      Phe > Phe     (in progress)

Mutation Discovery in Engineered & Evolved Trp- Strain

                  ABI     2004       Jun 2005    2006    >2007
# bp/expt         -       2e7        3e7         3e8     60e9
Complexity (bp)   -       74         4e6         3e9     6e9
Avg Fold Cov      8       3e5        6           0.1     10
Pix per bp        -       300        1724        333     1
Read-length       900     14 (SBE)   25 (pair)   35      42
$ / Q20 kb        8e-1    -          8e-2        4e-2    1e-5
$ / 1X 3e9 b      2e6     -          2e5         5e4     1e2 (2e3)
Indel Error       5e-3    0.6%       1e-3        1e-3    1e-3
Subst Error       4e-3    4e-6       1e-3        1e-3    1e-3
3X Cons Err       1e-4    -          1e-6        3e-7    1e-7
Kb / min          0.8     360        27          1e3     1e6
Pix / sec         -       2e5        2e6         6e6     2e7
Enz $/mg          -       8          8           8       0.4

Cost comparison & projection
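One way to read the first three rows: average fold coverage is roughly the bases per experiment divided by the library complexity. The sketch below checks that assumed relationship against the table's own numbers; it is an interpretation, not something stated in the source, and the names are illustrative. The ratio agrees with the 2004, 2006 and >2007 columns to the one significant figure shown, while Jun 2005 gives 7.5 against the listed 6, so the relationship is only approximate there.

```python
# (bp per experiment, complexity in bp, listed average fold coverage)
TABLE_ROWS = {
    "2004": (2e7, 74, 3e5),
    "Jun 2005": (3e7, 4e6, 6),
    "2006": (3e8, 3e9, 0.1),
    ">2007": (60e9, 6e9, 10),
}


def fold_coverage(bp_per_experiment, complexity_bp):
    """Assumed relationship: raw fold coverage = bases read / target size."""
    return bp_per_experiment / complexity_bp


if __name__ == "__main__":
    for column, (bp, complexity, listed) in TABLE_ROWS.items():
        print(column, fold_coverage(bp, complexity), "listed:", listed)
```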

                  >2007       Notes
# bp/expt         60e9        20X of 3e9 = 10X diploid
Complexity (bp)   6e9         Automated 96-well libraries
Avg Fold Cov      10          (Currently align .4 pix = .1 micron)
Pix per bp        1           Sensitivity & align CCD & slide?
Read-length       42          Is 34 enough? (next slide)
$ / Q20 kb        1e-5        (20X 3e9)
$ / 1X 3e9 b      1e2 (2e3)   Need haplotyping too? (slide after next)
Indel Error       1e-3
Subst Error       1e-3
3X Cons Err       1e-7
Kb / min          1e6
Pix / sec         2e7         Current camera is 3e7, but stage is 2e6
Enz $/mg          0.4         Realized for many recombinant proteins

Challenges in $2000 genome

Assume paired 17-mers (i.e. read full tag length) with 750-1150 bp distance distribution (980 ± 96 bp observed)

Exact Matching (34/34)        Zero    Unique   Multiple
Paired, no substitutions      ----    94.4     5.6
Paired, one substitution      98.3    0.5      1.3
Unpaired, no substitutions    98.8    0.3      0.9

Single Substitution or Exact (33/34 or 34/34)
                              Zero    Unique   Multiple
Paired, no substitutions      ----    90.4     9.7
Paired, one substitution      ----    92.8     7.2
Unpaired, no substitutions    96.0    1.5      2.5

Human Resequencing with Mate-Paired 17 bp Tags [simulation]
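The Zero / Unique / Multiple categories can be illustrated with a toy placement routine: index every k-mer in a reference, look up both tags, and count the placements in which the second tag falls within the allowed distance window downstream of the first. This is a simplified sketch and not the simulation behind the table. It handles exact matches only, one strand, and start-to-start distances, and all names are illustrative. For the setting described above one would use k = 17 and a 750 to 1150 bp window.

```python
from collections import defaultdict


def index_kmers(reference, k):
    """Map each k-mer in the reference to the list of its start positions."""
    index = defaultdict(list)
    for start in range(len(reference) - k + 1):
        index[reference[start:start + k]].append(start)
    return index


def classify_pair(index, tag1, tag2, min_dist, max_dist):
    """Classify an exact-match tag pair as 'zero', 'unique' or 'multiple'.

    A placement counts when tag2 starts between min_dist and max_dist bases
    downstream of the start of tag1 (same strand, exact matches only).
    """
    placements = 0
    for pos1 in index.get(tag1, []):
        for pos2 in index.get(tag2, []):
            if min_dist <= pos2 - pos1 <= max_dist:
                placements += 1
    if placements == 0:
        return "zero"
    return "unique" if placements == 1 else "multiple"


if __name__ == "__main__":
    # Tiny made-up reference with k = 4 and a 4 to 8 base window.
    reference = "ACGTGGTTCACCCCCCACGT"
    index = index_kmers(reference, 4)
    print(classify_pair(index, "ACGT", "TTCA", 4, 8))  # unique
    print(classify_pair(index, "CCCC", "ACGT", 4, 8))  # multiple
    print(classify_pair(index, "TTTT", "ACGT", 4, 8))  # zero
```

The pairing constraint is what rescues tags that are individually repetitive: "ACGT" occurs twice in the toy reference, yet only one occurrence has "TTCA" at an acceptable distance.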

SNPs: rs3778973, rs1557917, rs39284, rs10500042, rs4717028

GM10835

Alleles (in slide order): C G C G C T A T A T

TT=137 CT=2 (TC=1) CC=131

153Mb

Single chromosome molecule haplotypes

Amplifying & sequencing whole genomes from single cells

phi29 real-time amplification

No template control

Affymetrix quantitation of 2 independent amplifications

Escherichia & Prochlorococcus. Zhang, Martiny, Chisholm, Church, unpub.

Shared Resources STTR Polymerase libraries NEB MJR ABI Fuller CCDs spectra, cost, #pixels, sensitivity, speed software

Cancer Genome 12500 NCAB clonal? enrichment MRD accuracy read length

Cost estimates distribute template spreadsheet

Roundtable I
