Gene Synthesis using DNAWorks
Dr. David Hoover
Helix Systems, SCB, CIT, NIH
Gene Synthesis
Several methods
• ligation - incredibly tedious and inefficient
• FokI - sequence dependent (type IIs r.e.)
• serial cloning - sequence dependent
• assembly or self-priming PCR
Gene Synthesis Methods
Thermodynamically Balanced Conventional
Thermodynamically Balanced Inside-Out
Protein Expression
Protein/Structure Independent Factors:
• promoters and upstream elements
• translational initiation and termination
• mRNA stability
• codon bias
Protein/Structure Dependent Factors:
• folding and aggregation
• proteolysis and degradation
• secretion and localization
Codon Bias
[Figure: bar chart of codon usage frequency (0% to 90%) in E. coli and H. sapiens for the codons R:AGG, R:AGA, I:ATA, G:GGA, P:CCC, R:CGA, L:CTA, R:CGG, T:ACA, L:TTA, S:AGT, S:TCA, L:CTG, S:TCG, R:CGC, R:CGT, A:GCG, P:CCG]
Synthetic Genes
Benefits:
• Codon use optimized for host
• Flexibility in subcloning
• Ease of complex mutagenesis
Problems:
• Time consuming
• Complicated
• Error-prone
Commercial Sources
Blue Heron Biotechnology (http://www.blueheronbio.com)
DNA 2.0 (http://www.dna20.com/)
GenScript Corporation (http://www.genscript.com/)
BioNexus Inc. (http://www.genesynthesis.net/)
Entelechon (http://www.entelechon.com/)
GeneArt (http://www.geneart.com/)
Codon Devices (http://www.codondevices.com/)
Commercial Sources
Typical costs:
• $0.79 - $3.60 / bp
• Complexities?
• Intellectual property?
• 800 bp = $1000 (GenScript)
Genes From Scratch
• oligos ~ $0.20 / nt (NIH discount)
• PCR reagents ~ $2 / reaction
• sequencing ~ $20 / 600 bp
• electrophoresis ~ $3 / gel
• labor ~ $20 / hr
Example: GFP, 238 aa, 714 bp, 20 oligos, 1134 nt, 2 reactions, 2 gels, 4 sequences, ~10 hrs = $517
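The GFP total can be checked with a few lines of Python. This is only a sketch of the arithmetic on this slide; the function and dictionary names are illustrative, and each sequencing read is assumed to cost the quoted $20.

```python
# Unit costs quoted on the slide, in USD.
UNIT_COSTS = {
    "oligo_per_nt": 0.20,
    "pcr_per_reaction": 2.0,
    "sequencing_per_read": 20.0,
    "gel": 3.0,
    "labor_per_hour": 20.0,
}


def synthesis_cost(oligo_nt, reactions, gels, reads, hours, costs=UNIT_COSTS):
    """Sum the itemized cost of building one gene from oligos."""
    return (
        oligo_nt * costs["oligo_per_nt"]
        + reactions * costs["pcr_per_reaction"]
        + gels * costs["gel"]
        + reads * costs["sequencing_per_read"]
        + hours * costs["labor_per_hour"]
    )


# GFP example: 1134 nt of oligos, 2 reactions, 2 gels, 4 sequences, ~10 hrs.
gfp_cost = synthesis_cost(oligo_nt=1134, reactions=2, gels=2, reads=4, hours=10)
print(f"GFP from scratch: ${gfp_cost:.2f}")
```

Oligos and labor dominate the total, so the estimate is most sensitive to gene length and to how many hours the work really takes.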
How to design oligos
reverse-translate protein into DNA, optimum codon usage
break into fragments of equal overlap Tm
optimize:
• hairpins / mRNA structure
• repeats / mispriming
• restriction site inclusion / exclusion
• length
DNAWorks
http://helixweb.nih.gov/dnaworks/
DNAWorks Output
[Excerpt of the DNAWorks assembly map: sequence positions, oligo strands with oligo numbers and directions, and the amino acid translation; column alignment not preserved]
181 TCTGGTGAAGGCGAGGGTGACGCGACCTACGGTAAACTCACTCTCAAAT
agaccact TGCCATTTGAGTGAGAGTTTAAGTAGACGTGG <--- 4
S G E G E G D A T Y G K L T L K F I C T
7 ---> 241 ggttccttggccgaccctggttactaccttctcttacggtgttcag
TGCCCGTTTGACGGCCAAGGAACCGGCTGG tc <--- 6
T G K L P V P W P T L V T T F S Y G V Q
DNAWorks Options
• Job Name
• E-mail Address
DNAWorks Options
Codon Frequency Table
• E. coli (standard, class II), H. sapiens, C. elegans, D. melanogaster, M. musculus, P. pastoris, R. norvegicus, S. cerevisiae, X. laevis
• Custom CFT
AmAcid  Codon      Number   /1000  Fraction
Gly     GGG     599428.00   16.49  0.25
Gly     GGA     597986.00   16.45  0.25
Gly     GGT     392298.00   10.79  0.16
Gly     GGC     814464.00   22.41  0.34
Glu     GAG    1441162.00   39.65  0.58
Glu     GAA    1043166.00   28.70  0.42
Asp     GAT     789799.00   21.73  0.46
Asp     GAC     914677.00   25.16  0.54
Val     GTG    1028789.00   28.30  0.46
Val     GTA     257442.00    7.08  0.12
Val     GTT     399567.00   10.99  0.18
Val     GTC     528840.00   14.55  0.24
Ala     GCG     271820.00    7.48  0.11
Ala     GCA     579156.00   15.93  0.23
Ala     GCT     672416.00   18.50  0.26
Ala     GCC    1018345.00   28.02  0.40
Arg     AGG     432954.00   11.91  0.21
Arg     AGA     434655.00   11.96  0.21
Ser     AGT     441137.00   12.14  0.15
Ser     AGC     706723.00   19.44  0.24
Lys     AAG    1163126.00   32.00  0.57
Lys     AAA     879684.00   24.20  0.43
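As a rough illustration of how a codon frequency table drives reverse translation, the sketch below picks the highest-fraction codon for each residue. It is not DNAWorks code: the names are hypothetical, and only the six amino acids whose codon sets appear in full in the table above (Gly, Glu, Asp, Val, Ala, Lys) are included, using the values in the last column.

```python
# Codon fractions copied from the example table (last column).
# Only amino acids with a complete codon set in the excerpt are listed.
CODON_FRACTIONS = {
    "G": {"GGG": 0.25, "GGA": 0.25, "GGT": 0.16, "GGC": 0.34},
    "E": {"GAG": 0.58, "GAA": 0.42},
    "D": {"GAT": 0.46, "GAC": 0.54},
    "V": {"GTG": 0.46, "GTA": 0.12, "GTT": 0.18, "GTC": 0.24},
    "A": {"GCG": 0.11, "GCA": 0.23, "GCT": 0.26, "GCC": 0.40},
    "K": {"AAG": 0.57, "AAA": 0.43},
}


def reverse_translate(protein, table=CODON_FRACTIONS):
    """Return a DNA sequence that uses the most frequent codon per residue."""
    codons = []
    for residue in protein.upper():
        if residue not in table:
            raise ValueError(f"no codon data for residue {residue!r}")
        fractions = table[residue]
        # Choose the codon with the largest fraction for this amino acid.
        codons.append(max(fractions, key=fractions.get))
    return "".join(codons)


dna = reverse_translate("GEDVAK")
print(dna)
```

Always taking the top codon corresponds loosely to the "optimum codon usage" idea; a real design would also weigh repeats, hairpins, and restriction sites against codon choice.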
DNAWorks Options
Parameters
• Annealing Temperature
• Oligo Length (random)
• Codon Frequency Threshold (random, strict, scored)
• Oligonucleotide, Na+/K+, Mg2+ Concentrations
• Number of Solutions
• TBIO
• No gaps in assembly
DNAWorks Options
Balancing act
• Fast, simple, cheap?
• Slow, complex, expensive? - reliable
• Reusable and interchangeable oligos?
DNAWorks Options
Others
• Restriction Site Screen (non-degenerate, degenerate sequences)
• Custom Site Screen (mind the format!)
• Weights (experimental)
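The idea behind a restriction site screen can be sketched in a few lines. This is an independent illustration, not the DNAWorks implementation: it handles non-degenerate recognition sequences only, checks both strands, and the function names and test sequence are made up for the example.

```python
# Recognition sequences for three common enzymes (non-degenerate).
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "NdeI": "CATATG"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq, sites=SITES):
    """Map enzyme name -> sorted 0-based start positions on the top strand."""
    seq = seq.upper()
    hits = {}
    for name, site in sites.items():
        # A site on the bottom strand appears as its reverse complement on top.
        patterns = {site, reverse_complement(site)}
        positions = sorted(
            i
            for i in range(len(seq))
            if any(seq.startswith(p, i) for p in patterns)
        )
        if positions:
            hits[name] = positions
    return hits


hits = find_sites("ATGGAATTCAAAGGATCCTAA")
print(hits)
```

Running the same check on the final designed sequence is a cheap way to confirm that sites meant to be excluded are really absent and that cloning sites occur exactly once.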
DNAWorks Options
Sequences
• protein (X = stop)
• nucleotide (can be degenerate)
• almost any file format
• reverse sequence
• fix sequence in gap
DNAWorks Output
Web output
• Input for DNAWorks (standalone version)
• Header
• Initial parameters
• Optimization log
• Final scores
• Final summary
DNAWorks Output
Total output
• Sequence blocks
• CFT blocks
• Pattern block
• Trials
• Final Summary
DNAWorks Output
Trial outputs
• Initial parameters
• Final DNA sequence
• Assembly
• Final scores
• Codon report
• Histograms
• Oligo sequences
Scores / Penalties
• codon usage
• length
• melting temperature
• repeat
• pattern
• mispriming
• AT/GC contents
• gapfix
Mutant Run
• Design oligos based on previous set of oligos
• Parameters taken from previous run
• For single mutation, will output 1 or 2 oligos only
What to look for
Final Summary
• Avoid misprimes and repeats
• Make sure overlaps are > 12 nt (Short)
• Tm range should not be > 3°C (TmRange)
Don't depend entirely on scores
• Arbitrary, somewhat dependent on length
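The two numeric rules of thumb above (overlaps longer than 12 nt, Tm range no wider than 3°C) are easy to check by script once the overlap lengths and Tm values have been read from the Final Summary. The sketch below assumes those values were already extracted by hand; the function name and the sample numbers are hypothetical and do not come from a real DNAWorks run.

```python
def check_design(overlap_lengths, overlap_tms, min_overlap=12, max_tm_range=3.0):
    """Return a list of warnings for a set of oligo overlaps."""
    warnings = []

    # Overlaps should be strictly longer than min_overlap nucleotides.
    short = [n for n in overlap_lengths if n <= min_overlap]
    if short:
        warnings.append(f"{len(short)} overlap(s) of {min_overlap} nt or shorter")

    # The spread between the highest and lowest overlap Tm should stay small.
    tm_range = max(overlap_tms) - min(overlap_tms)
    if tm_range > max_tm_range:
        warnings.append(f"Tm range {tm_range:.1f} C exceeds {max_tm_range:.1f} C")

    return warnings


# Hypothetical values for illustration only.
problems = check_design([18, 20, 11], [62.0, 63.5, 66.0])
clean = check_design([18, 20, 19], [62.0, 63.0, 64.0])
print(problems)
print(clean)
```

Misprimes and repeats still need to be read from the report itself; this check covers only the two thresholds.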
Tricks
Choosing codons
• random - slower optimization, less constrained
• strict - for the fussy
• scored - if codon score really matters
Tm, Length ranges, Number of Solutions
• To find the "very best" solution
• no more than 999
Tricks
Design multi-use and interchangeable oligos
• Flanking primers with standard overlaps
• Intersperse nucleotide elements between protein elements
• Gap-fix restriction sites
• Allow for mutations later on
Random mutagenesis
• Nucleotide sequences can be degenerate
Tricks
Thermodynamically Balanced Inside-Out Mode
• Multi-step PCR
• More controlled, reliable method
• Gao X., et al., Nucleic Acids Res 2003
Random oligo lengths
• Faster, better optimization
• For the not-so-fussy
• Probably best for DNA-only genes
Tricks
Set Tm higher
• 64°C - 70°C
• longer oligos, extra purification ($$$)
Always double check!
Nothing is foolproof
• Think carefully about what you need BEFORE starting work
• Always run final sequences through alternate program (EMBOSS, GCG-Lite)
• Make sure oligos are what you intended
PCR
• Mix all oligos and additives
• Specific PCR protocols
• Analytical gel
• Isolate desired products
Assembly Protocol
Component    Volume    Stock          Final
Oligos       1 μl      625 nM each    25 nM each
dNTPs        2 μl      2.5 mM each    0.2 mM each
H2O          19 μl
Buffer       2.5 μl    10X            1X
Pfu pol.     0.5 μl

Temperature                  Time       Cycles
95°C                         2.0 min    1X
95°C                         0.5 min    \
65 > 55°C (-0.5°C/cycle)     0.5 min     } 20X
72°C                         0.5 min    /
72°C                         5 min      1X
4°C                          hold
Amplification Protocol
Component    Volume    Stock          Final
PCR mix      2 μl      ?              ?
dNTPs        8 μl      2.5 mM each    0.2 mM each
3' primer    4 μl      10 μM          400 nM
5' primer    4 μl      10 μM          400 nM
Buffer       10 μl     10X            1X
H2O          70 μl
Pfu pol.     2 μl

Temperature    Time       Cycles
95°C           2.0 min    1X
95°C           0.5 min    \
62°C           0.5 min     } 20X
72°C           0.5 min    /
72°C           5 min      1X
4°C            hold
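The final concentrations in the amplification protocol follow from simple dilution (stock concentration × volume added / total volume). A short check, with hypothetical helper names, using the volumes listed for the amplification reaction:

```python
def final_concentration(stock, volume_ul, total_ul):
    """Dilution: final = stock * (volume added / total reaction volume)."""
    return stock * volume_ul / total_ul


# Component volumes of the amplification reaction, in microliters.
volumes_ul = {
    "PCR mix": 2,
    "dNTPs": 8,
    "3' primer": 4,
    "5' primer": 4,
    "Buffer": 10,
    "H2O": 70,
    "Pfu pol.": 2,
}
total_ul = sum(volumes_ul.values())

dntp_mM = final_concentration(2.5, volumes_ul["dNTPs"], total_ul)        # stock 2.5 mM each
primer_nM = final_concentration(10_000, volumes_ul["3' primer"], total_ul)  # stock 10 uM = 10,000 nM
buffer_x = final_concentration(10, volumes_ul["Buffer"], total_ul)       # stock 10X

print(f"total volume: {total_ul} ul")
print(f"dNTPs: {dntp_mM:.2f} mM each, primers: {primer_nM:.0f} nM, buffer: {buffer_x:.0f}X")
```

The same function can be reused when scaling the reaction up or down, as long as the total volume is recomputed from the component list.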
Problems
• No product (complete failure)
• Wrong size product (mispriming)
• Mutations (2 out of 3 correct, 2 errors/kb)
Sequencing is warranted...
Fixes
• Optimize PCR conditions
• Break gene synthesis into steps (TBIO)
Errors
p = mutation rate / 1000 nt / duplication (Cline et al., Nucleic Acids Res 24 (1996))
• Taq polymerase = 0.008
• KOD (Novagen) = 0.0027
• PfuUltra (Stratagene) = 0.00043
The probability p' that a gene n bp in length has no errors, using a polymerase with mutation rate p:
p' = (1 - p)^n
Therefore, p' for a 738 bp gene = (1 - 0.00043)^738 = 0.728
Errors
The number of clones N that must be screened to find a correct gene with 95% confidence:
N = log(0.05) / log(1 - p')
Thus, log(0.05) / log(1 - 0.728) = 2.3, which rounds up to 3 clones that need to be sequenced.
From Wu et al., J Biotech 124 (2006)
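Both error formulas are easy to evaluate for any polymerase and gene length. The sketch below simply codes the two expressions from these slides; the function names are illustrative, and the result is rounded up because only whole clones can be sequenced.

```python
import math

# Mutation rates p as listed on the slide.
MUTATION_RATES = {
    "Taq": 0.008,
    "KOD": 0.0027,
    "PfuUltra": 0.00043,
}


def prob_error_free(p, n_bp):
    """p' = (1 - p)^n: probability that a gene of n bp carries no errors."""
    return (1 - p) ** n_bp


def clones_to_screen(p_correct, confidence=0.95):
    """N = log(1 - confidence) / log(1 - p'), rounded up to a whole clone."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_correct))


gene_length = 738
pfu_ok = prob_error_free(MUTATION_RATES["PfuUltra"], gene_length)
pfu_clones = clones_to_screen(pfu_ok)
print(f"PfuUltra: p' = {pfu_ok:.3f}, clones to sequence = {pfu_clones}")

# The same calculation for the other polymerases, for comparison.
for name, p in MUTATION_RATES.items():
    ok = prob_error_free(p, gene_length)
    print(f"{name}: p' = {ok:.4f}, clones = {clones_to_screen(ok)}")
```

Because p' falls off exponentially with gene length, the number of clones to sequence grows quickly for long genes or lower-fidelity polymerases.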
Time
• Find protein of interest, design oligos, order oligos
• Run PCR, integrate into sequencing vector, transform
• Pick colony, grow overnight culture
• Miniprep construct, integrate into expression vector, transform
• Pick colony, grow overnight culture
• Run expression growth trials
~ 1 week between concept and initial trial (at best!!)
Can be automated and parallelized (96 well plates?)