Enzymatic Catalysis Group, PMC Advanced Technology
Enzyme Engineering Research & Technology Development
Overview of objectives
Quantitative understanding of enzyme evolution (academic publications)
• explaining origin of natural active site sequence distributions (benchmarking on MSA and pdb data)
Redesign enzyme active sites (designer enzyme products)
• modify substrate selectivity, product inhibition, etc.
• for industrial biocatalysis, biotechnology and biotherapeutics (with experiment)
Advance the state of the art in enzyme design technology (design software)
• through the application of high-resolution physics-based methods for active site modeling using:
1) High-res protein structure prediction (OPLS + SGB): loop prediction for reshaping active sites, side chain optimization
2) Semiempirical enzyme-substrate binding affinity scoring (Km), substrate pose sampling
3) Refinement based on details of electronic structure: scoring activation energies (kcat)
Schematic of computational enzyme design technology
[Figure: schematic with components: quantum chemical sequence optimization; ab initio loop prediction; experimental sampling; classical sequence optimization; core design]
Zymzyne Enzyme Design and Optimization Platform
Input information
• Target chemical
• Desired raw material
• Existing synthetic pathways
• Existing biocatalysts
Zymzyne™ Computational Design Process (design computationally)
• 10^30 candidates screened
System output
• ~1000 potential candidates; expected catalytic activity
Zymzyne™ Experimental Optimization (refine experimentally)
• 500 candidates screened
Optimized biocatalyst
Software patents; design protocol patents
A model fitness measure for enzyme sequence optimization
• Maximize substrate binding affinity (i.e., minimize the free energy of substrate binding) over sequence space
• Represent catalysis through constraints on interatomic distances of catalytic side chains
• Minimize total energy of complex for any sequence
• To start, omit selection pressure for product release
Reaction stages: substrate binding, catalysis, product release
Active site sequence optimization requires accurate energy functions, solvation models, and search algorithms
• 10° resolution rotamer library (297 proteins)
Xiang, Z. and Honig, B. (2001) J. Mol. Biol. 311: 421-430.
• S-GB continuum solvation
Ghosh, A., Rapp, C.S. & Friesner, R.A. (1998) J. Phys. Chem. B 102, 10983-10990.
• OPLS-AA molecular mechanics force field + GlideScore semiempirical binding affinity scoring function
Friesner, R.A., Banks, J.L., Murphy, R.B., Halgren, T.A. et al. (2004) J. Med. Chem. 47, 1739-1749.
Jacobson, M.P., Kaminski, G.A., Rapp, C.S. & Friesner, R.A. (2002) J. Phys. Chem. B 106, 11673-11680.
φ, ψ = the backbone torsion angles
Backbone = the sequence (COOH)-[N-(CH-R_i)-(C=O)]_N-NH2, where R_i is the i-th side chain.
2N torsion angles specify the backbone configuration.
Side chains have their own rotamers too! These angles are represented by χ_i.
Some side chains have no χ angles. Some have quite a few, such as the lysine above with χ_1-χ_4.
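As an illustration of what a torsion angle (φ, ψ or χ) is geometrically, the following minimal sketch computes the dihedral angle defined by four atom positions. The function name and the example coordinates are assumptions made for illustration; they do not come from the slides or from any particular modeling package.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion_deg(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond, from four 3D points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Components of the two outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

# Hypothetical coordinates: planar cis and trans arrangements.
cis = torsion_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = torsion_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, abs(trans))
```

A planar cis arrangement gives 0° and a planar trans arrangement gives 180° in magnitude; a rotamer library discretizes the χ angles computed this way.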
Computational sequence optimization correctly predicts most residues in ligand-binding sites…
Streptavidin Native –10.04 kcal/mol
• 9 / 10 residues predicted correctly in top 0.5 kcal/mol of sequences
• Easy to experimentally screen libraries of this size
• CO2- is covalent attachment site for biomolecules
Chakrabarti, R., Klibanov, A.M. and Friesner, R.A. Computational prediction of native protein ligand-binding and enzyme active site sequences. PNAS, 2005.
…and enzyme active sites
R61 DD-peptidase Native –10.02 kcal/mol
[Plot: computed amino acid frequencies (0 to 0.18) by residue type (D A F R S Q E Y H I L K N G T W V M)]
• High MSA variability: T123 highly degenerate in multiple sequence alignment
β-galactosidase Native –9.13 kcal/mol
[Plot: computed amino acid frequencies (0 to 1) by residue type (D A F R S Q E Y H I L K N G T W V M)]
• Native amino acid is generally one of top 3 most frequently predicted
• Could be used to focus combinatorial libraries (3^N vs 20^N, N = # of residues)
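To make the 3^N versus 20^N comparison concrete, here is a minimal sketch that keeps the three most frequently predicted amino acids at each position and compares the resulting library size with the unrestricted one. The frequency values and function names are hypothetical placeholders, not data from the slides.

```python
def top_k_residues(frequencies, k=3):
    """Return the k most frequently predicted amino acids at one position."""
    ranked = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)
    return [aa for aa, _ in ranked[:k]]

def library_size(choices_per_position):
    """Number of sequences in a combinatorial library."""
    size = 1
    for choices in choices_per_position:
        size *= len(choices)
    return size

# Hypothetical computed frequencies at N = 4 active site positions.
computed = [
    {"D": 0.55, "E": 0.25, "N": 0.12, "S": 0.08},
    {"W": 0.60, "F": 0.22, "Y": 0.10, "H": 0.08},
    {"R": 0.48, "K": 0.30, "Q": 0.15, "H": 0.07},
    {"T": 0.40, "S": 0.35, "A": 0.15, "G": 0.10},
]
focused = [top_k_residues(f) for f in computed]
n_positions = len(computed)
print("focused library:", library_size(focused))   # 3^N
print("full library:", 20 ** n_positions)          # 20^N
```

For N = 4 the focused library has 81 members against 160000 for the full library, and the gap widens exponentially with N.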
Computational enzyme sequence optimization: sugar catalysis
Computed amino acid distributions contain detailed evolutionary information
Glucose-binding protein Native –8.81 kcal/mol
[Plots: amino acid frequency (0 to 0.6) by residue type (D A F R S Q E Y H I L K N G T W V M); panels: Computed; Observed (sequence alignment)]
• Computed residue frequencies often mirror natural frequencies
[Structure annotations: sugar hydroxyls (OH); anomeric promiscuity; epimeric promiscuity]
Summary of recent results: classical sequence optimization (side chain prediction / binding affinity calculation / sequence optimization)
R61 DD-peptidase
[Plot: amino acid frequency (0 to 0.18) by residue type (D A F R S Q E Y H I L K N G T W V M)]
• T123 highly degenerate in multiple sequence alignment
• Nucleophile Ser62; acid/base Y159; electrostatic stabilizer Lys65
[Plot: RMSD to native (Å), 0 to 1.2, for residues Phe120, Asn161, Trp233, Arg285, Thr299, Ser326, Ser62, Lys65, Tyr159]
Glucose-binding protein Native –8.81 kcal/mol
[Plots: amino acid frequency (0 to 0.6) by residue type; panels: Computed; Observed (sequence alignment)]
Computed amino acid distributions contain detailed evolutionary information
• Computed residue frequencies often mirror natural frequencies
[Structure annotations: sugar hydroxyls (OH); anomeric promiscuity; epimeric promiscuity]
High-resolution sequence optimization is robust across diverse functional families
Peptide
Nucleotide
Sugar
Active Site Design of Enzymes with Nucleotide Substrates: Cytidine Kinase
Multisubstrate enzyme active site sequences represent superpositions of computational predictions
HSV-1 thymidine kinase
Ligands shown: dTMP; ganciclovir (dG analog); thymidine
• Apply multiobjective sequence search algorithms to accommodate several substrates
• Native sequence = superposition of optimal sequences for multiple substrates
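One common way to frame a multiobjective search over several substrates is to keep the Pareto-optimal sequences, i.e. those that no other candidate beats on every substrate at once. The sketch below shows that filter only; the sequence labels and binding energies are hypothetical, and the slides do not specify which multiobjective algorithm was actually used.

```python
def dominates(a, b):
    """True if score vector a is no worse than b everywhere and better somewhere.

    Scores are binding energies, so lower is better.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(candidates):
    """candidates maps a sequence label to a tuple of energies, one per substrate."""
    front = {}
    for seq, scores in candidates.items():
        dominated = any(dominates(other, scores)
                        for other_seq, other in candidates.items()
                        if other_seq != seq)
        if not dominated:
            front[seq] = scores
    return front

# Hypothetical energies in kcal/mol for (dTMP, ganciclovir, thymidine).
candidates = {
    "seq_A": (-9.5, -6.0, -7.0),
    "seq_B": (-8.0, -8.5, -7.5),
    "seq_C": (-7.5, -8.0, -7.0),  # worse than seq_B on all three substrates
    "seq_D": (-8.8, -7.9, -8.2),
}
front = pareto_front(candidates)
print(sorted(front))
```

Sequences on the front represent different trade-offs among the substrates, which is one way to read the statement that the native sequence is a superposition of single-substrate optima.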
Catalytic hydrogen-bonding networks can be incorporated into sequence optimization
β-Lactamase : cephalothin
[Figure: active site hydrogen-bonding network; labeled groups: SER 62, LYS 67, GLN 120, ARG 148, TYR 150, ASN 152, GLU 272, LYS 315, water W402, cephalothin; hydrogen bonds labeled a to h]
[Plot: site entropy (0 to 2.5) at positions 119, 120, 152, 221, 293, 316, 318, 346; series: Constrained; Constrained + Filtered; +1 kcal/mol; +2 kcal/mol]
Chakrabarti, R., Klibanov, A.M. and Friesner, R.A. Sequence optimization and designability of enzyme active sites. PNAS, 2005.
Refining the scoring function: quantum chemical transition state calculations
Enzyme        kcat (s^-1)   KM (μM)   kcat/KM (% wild-type)
WT            150           14        100
N152S         3             7         4.3
N152D         0.12          24        0.05
N152S/Q120F   3             4.6       6.7
N152S/Q120H   20            11.4      16.3
Predicted: 14.3 kcal/mol
Measured: 14.3 kcal/mol
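A standard way to relate a turnover number to an activation free energy is the Eyring equation of transition state theory. The sketch below applies it to kcat values; the slides do not state how the 14.3 kcal/mol figures were obtained or at what temperature, so the temperature (298.15 K), the transmission coefficient of 1, and the function name are all assumptions.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
H = 6.62607015e-34     # Planck constant, J*s
R_KCAL = 1.987204e-3   # gas constant, kcal/(mol*K)

def eyring_barrier_kcal(kcat_per_s, temperature_k=298.15):
    """Activation free energy (kcal/mol) implied by a rate constant.

    Uses k = (k_B * T / h) * exp(-dG / (R * T)) with a transmission
    coefficient of 1 (an assumption).
    """
    prefactor = K_B * temperature_k / H
    return R_KCAL * temperature_k * math.log(prefactor / kcat_per_s)

for name, kcat in [("WT", 150.0), ("N152S", 3.0), ("N152D", 0.12)]:
    print(name, round(eyring_barrier_kcal(kcat), 2))
```

Under these assumptions the wild-type kcat of 150 s^-1 corresponds to roughly 14.5 kcal/mol, and each tenfold drop in kcat raises the barrier by about 1.4 kcal/mol.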
Active Site Designability: The Number of Sequences that Solve a Given Design Problem
DD-peptidase: catalytic nucleophile Ser62; general acid/base Y159; electrostatic stabilizer Lys65
β-gal: catalytic nucleophile Glu-299; general acid/base Glu-200
[Plots: fraction of total sequences (0 to 0.45) vs number of residues correctly predicted (0 to 7 and 0 to 10); series: Constrained; +1 kcal/mol; +2 kcal/mol]
[Plots: three amino acid frequency histograms by residue type]
Patents: computational sequence optimization / experimental mutagenesis
New enzymes: improved catalytic turnover; altered substrate selectivity
Example of screening focused library of sequence variants
• 3 positions subject to mutagenesis
• 3 permissible mutations identified by modeling at a target position
• 4^3 mutation combinations = 64 sequence variations
• Synthetic gene assembly and variant library construction via DNA synthesis
• Biological selection of variant library
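The 4^3 = 64 count can be reproduced by enumerating the library explicitly. The sketch below assumes that each of the 3 positions can carry the wild-type residue or one of 3 permissible mutations (hence 4 options per position); the position numbers and residue choices are hypothetical and are not taken from the slides.

```python
from itertools import product

# Hypothetical positions: wild-type residue first, then three permissible mutations.
options = {
    120: ["Q", "F", "H", "N"],
    152: ["N", "S", "D", "T"],
    221: ["L", "I", "V", "M"],
}

def enumerate_variants(options):
    """Return every combination of residues as a {position: residue} dict."""
    positions = sorted(options)
    return [dict(zip(positions, combo))
            for combo in product(*(options[p] for p in positions))]

variants = enumerate_variants(options)
print(len(variants))   # 4 * 4 * 4 combinations
print(variants[0])     # the all-wild-type member
```

A list like this maps directly onto an oligo design for synthetic gene assembly, since each variant differs from the template only at the listed positions.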
Patents: algorithms in development
Protein structure; substrate binding; reactive chemistry
Active site reshaping: loop; side chain; GlideScore; pose sampling
Classical sequence optimization (fixed ligand)
Classical sequence optimization (free ligand)
Calculating mutant enzyme reaction rates
• for QM/MM refinement of enzyme design
• speeding up mutant TS searches
New algorithms for side chain optimization
• scores desired loop against other low-energy excitations
QM sequence refinement
• Hierarchical pose screening
• Locates global seq/struct optima for a given active site/ligand combination
• Estimates “designability” of active site (fixed backbone)
Testing: Current experimental projects
• Dehalogenase-dehydrogenase redesign: arbitrary backbone reshaping to accommodate NAD; being tested now, could also benefit from faster side chain optimization. Art testing induced fit results on this system without sequence optimization
• PBP/β-lactamase redesign (needed: covalent docking in Glide [XP grow] or MacroModel-Prime): helix breaking (now being done)/backbone reshaping + redocking + QM/MM sequence refinement
• Single mutation activation barrier predictions in β-lactamase: currently being tested
Experimental project: Sirtuin redesign for enhanced activity
Methods applied:
1) Active site backbone reshaping, multiobjective genetic and Monte Carlo sequence search
2) Selection via in vivo complementation
3) In vitro kinetics of engineered enzymes
Notes:
1) Accommodate NAD+
2) Reduce binding affinity to NAM (reaction product) to reduce product inhibition
Experimental project: Mutant activation barrier predictions in PBP β-lactamase
Methods applied:
1) Side chain structure prediction + QM/MM activation barrier calculation
2) In vitro kinetics of mutants: compare Kd and kcat to computed values
Notes:
1) To establish foundation for computational refinement of activity
2) Basis for future work on rapid algorithms for QM refinement of enzyme design
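For readers unfamiliar with Monte Carlo sequence search, the following minimal sketch shows a Metropolis search over amino acid sequences. It is a generic illustration, not the group's algorithm: the energy function is a toy stand-in (mismatches to a hypothetical target sequence) for a real binding or total energy calculation, and the temperature, step count and names are assumptions.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def metropolis_search(energy, start, steps=2000, temperature=0.5, seed=0):
    """Metropolis Monte Carlo over sequences; returns the best sequence seen."""
    rng = random.Random(seed)
    current = list(start)
    e_cur = energy(current)
    best, e_best = current[:], e_cur
    for _ in range(steps):
        trial = current[:]
        trial[rng.randrange(len(trial))] = rng.choice(AMINO_ACIDS)  # point mutation
        e_trial = energy(trial)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if e_trial <= e_cur or rng.random() < math.exp(-(e_trial - e_cur) / temperature):
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = current[:], e_cur
    return "".join(best), e_best

def toy_energy(seq, target="STYKNQ"):
    """Hypothetical energy: number of positions that differ from a target."""
    return float(sum(a != b for a, b in zip(seq, target)))

best_seq, best_e = metropolis_search(toy_energy, "AAAAAA")
print(best_seq, best_e)
```

In a real design run the toy energy would be replaced by the side chain optimization and scoring steps described above, which dominate the cost of each move.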
Discussion Points
• NEB combinatorial screening protocols
• NEB DNA enzyme engineering challenge problems
• Scope for interaction:
– Technology Platform to be used by both parties?
– Engineered DNA Enzyme Products? Cosolvent-resistant polymerases?
– IP: Software, Designability-Based Screening Protocols (compare Maxygen, Diversa), and Engineered Enzymes
Sirtuin – mutant production, selection, protein expression and enzyme assay
Main objective - Develop genetic and biochemical assay systems to screen the sirtuin mutant library and quantify enzymatic activities.
Main steps -
1) Model mutations in the active site residues of bacterial sir2Tm.
2) Generate a set of mutations using wild-type sirtuin as template, based on computation-guided structural modeling.
3) Transform the mutants into host strains with sirtuin deletion.
4) Assay growth of mutant transformants under carbon source limitation.
5) Select mutant constructs that complement the growth defects resulting from sirtuin deficiency, which are manifested under carbon limitation.
6) Purify the wild-type and active mutant enzymes and quantify their kinetic properties.
Sirtuin – mutant generation
Model mutations in the active sites of sirtuin genes by computational analysis.
Construct mutations in the wild-type sir2Tm plasmid (2 potential methods):
By synthetic gene method – Generate a sequence map for the proposed nucleotide changes in the wild-type template (sir2Tm). Work with gene synthesis groups to make synthetic constructs for the mutant collection, e.g. how to get efficient oligo assembly to cover all the mutations. Obtain suitable plasmid vectors and clone the mutant constructs into the vectors. The choice of vector depends on the host cells in which the mutant constructs will be expressed and selected; e.g. yeast and Salmonella require different vectors to allow high-level expression.
By multi-site directed mutagenesis method – Use reagents including cells, enzymes and mutagenic primers to generate mutations in the wild-type sir2Tm template. Verify mutations by DNA sequencing.
Both mutagenesis procedures depend on the actual mutations to be made and on how many constructs are needed to allow for effective functional screening.
Sirtuin – mutant library screening assay
When the mutant collection is generated, transform the constructs into host cells with sirtuin deletion:
1) Make competent cells for the host strain so they can take up DNA.
2) Transform the wild-type plasmid into the host as positive control.
3) Transform the mutant plasmids into host cells.
4) Assay whether the transformants can grow on carbon-limited media, such as with acetate or propionate as carbon sources.
5) If there is complementation, characterize the growth features of these cells.
6) Verify the specific mutations by DNA sequencing.
7) Transform the mutant construct into a protein-expression host, such as E. coli BL21.
8) Grow cultures and purify sufficient quantities of proteins.
9) Set up enzymatic assays to quantify kinetic properties of wild-type and selected mutants.
Beta-lactamase – mutant selection, protein expression and activity assay
Model mutations in the active site residues of P99 beta-lactamase.
Construct mutations in the wild-type P99 beta-lactamase gene.
Obtain bacterial host strains suitable for screening beta-lactam antibiotic resistance.
Transform bacterial host cells with wild-type and mutant constructs.
Select transformed cells in the presence of beta-lactam antibiotics.
Identify the mutant clones which can grow in beta-lactam and thus retain beta-lactamase activities.
Express and purify the wild-type and mutant beta-lactamases and quantify their kinetic properties.
Beta-lactamase – mutant generation
Model mutations in the active site of P99 beta-lactamase based on computation.
Construct mutations in the wild-type P99 beta-lactamase plasmid. The actual processes would depend on what the mutations are and how many mutants are to be made.
By synthetic gene method – Work with the gene synthesis group to build the synthetic constructs, especially on how to set up efficient oligonucleotide coverage for all the mutations. Clone all mutant constructs into a suitable bacterial expression vector.
By multi-site directed mutagenesis method – Obtain mutagenic reagents such as cells, enzymes and primers to generate a set of mutations. Verify mutant production by DNA sequencing of individual clones.
Beta-lactamase – mutant selection
With the bacterial host strains used for selection, make competent cells so that they can take up plasmid DNA.
Transform wild-type P99 beta-lactamase plasmid into host cells as positive control. Transform the mutant plasmids into host cells to select for active constructs.
Make agar plates containing different types of beta-lactam compounds at different concentrations.
Grow bacteria transformed with beta-lactamase plasmids on these plates and monitor colony formation.
Identify the clones with good growth characteristics; these are the candidates for hydrolytic activity on a variety of beta-lactam substrates.
Verify specific mutations by DNA sequencing.
Proceed to protein expression, purification and activity quantitation.
A General Framework for Computationally Directed Biocatalyst Design
Terms of the objective J(seq):
• Enzyme-substrate binding affinity: ΔG_bind(seq)
• Catalytic constraint: interatomic distances r_ij < hbond dist (with slack variable)
[Equation: J(seq) in terms of ΔG_bind(seq) and a double sum over i, j = 1…N of terms in r_ij(seq) and the hbond distance]
• Minimize J over sequence space
• Represent dynamical constraint with requirement that total energy of complex minimized for any sequence
• Omits selection pressure for product release
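The exact form of the constraint term on this slide is not legible in the extracted text. As a hedged illustration only, the sketch below assumes a quadratic penalty on violations of r_ij < hbond distance added to the binding free energy; the penalty form, the 3.5 Å cutoff, the weight, and the atom pair labels are all assumptions rather than the authors' definition.

```python
def objective_j(dg_bind, distances, hbond_cutoff=3.5, weight=10.0):
    """Assumed objective: binding free energy plus a distance-constraint penalty.

    dg_bind      -- binding free energy of the sequence, kcal/mol (lower is better)
    distances    -- dict mapping a catalytic atom pair to its distance r_ij in Å
    hbond_cutoff -- assumed hydrogen-bond distance threshold in Å
    weight       -- assumed penalty weight in kcal/(mol*Å^2)
    """
    # Only distances beyond the cutoff are penalized, so satisfied
    # constraints leave J equal to the binding free energy.
    penalty = sum(max(0.0, r - hbond_cutoff) ** 2 for r in distances.values())
    return dg_bind + weight * penalty

# Hypothetical catalytic contacts for two candidate sequences.
satisfied = {("Ser62:OG", "LIG:C7"): 3.0, ("Lys65:NZ", "Ser62:OG"): 2.9}
violated = {("Ser62:OG", "LIG:C7"): 4.5, ("Lys65:NZ", "Ser62:OG"): 2.9}
print(objective_j(-10.0, satisfied))
print(objective_j(-10.0, violated))
```

Minimizing such a J over sequence space favors sequences that bind tightly while keeping catalytic groups within hydrogen-bonding distance, which is the qualitative behavior the slide describes.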
Assessment of active site designability
Need to assess number of sequences that are structurally similar to native
Requires sampling over ligand conformations
[Equation: S in terms of ln Z, temperatures T_0 and T_1, ΔG_bind(seq) relative to ΔG_bind,opt, and a double sum over i, j up to N of terms in r_ij(seq) and the hbond distance]
Two approaches:
a) Marginal distributions (as shown): use the top m residues at each position (m constant), or set m_i according to exp(Shannon entropy). Choose T based on experimental tractability. Assumes independence, but is easier for an experimentalist to implement out of the box. Note that S in this case cannot be interpreted as a number of microstates, since the law of large numbers (LLN) does not hold.
b) Joint distribution: sample m sequences from the joint distribution for specified T's. S is computed from moments of the objectives. Compare D = exp(S) for several T's and look for a transition to a region where denser sampling is possible (heat capacity analogy). The LLN holds, allowing interpretation of designability as a relative number of microstates.
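The relation D = exp(S) can be illustrated with the textbook canonical-ensemble identity S = ln Z + ⟨E⟩/T. The sketch below is a simplification made for illustration: it assumes a single objective and a single temperature, treats the input as an explicit list of distinct sequence energies rather than samples drawn from the joint distribution, and expresses T in the same energy units as the objective.

```python
import math

def designability(energies, temperature):
    """Return (S, D) with S = ln Z + <E>/T and D = exp(S).

    energies    -- one energy per distinct sequence (e.g. kcal/mol)
    temperature -- in the same energy units (an assumption)
    """
    e_min = min(energies)
    # Shift by the minimum energy so the exponentials cannot overflow.
    weights = [math.exp(-(e - e_min) / temperature) for e in energies]
    z_shifted = sum(weights)
    mean_e = sum(w * e for w, e in zip(weights, energies)) / z_shifted
    ln_z = math.log(z_shifted) - e_min / temperature
    entropy = ln_z + mean_e / temperature
    return entropy, math.exp(entropy)

# Hypothetical energies: five degenerate optima and a few poorer sequences.
energies = [-9.0] * 5 + [-7.0, -6.5, -5.0]
for t in (0.1, 0.6, 2.0):
    s, d = designability(energies, t)
    print(t, round(d, 2))
```

At low T the effective number of sequences D approaches the number of degenerate optima, and it grows toward the total number of sequences as T increases, which is the behavior compared across temperatures in approach b).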
Computationally directed active site sequence library generation
[Plots: three amino acid frequency histograms by residue type, with frequency axes up to 0.35, 0.7 and 0.4]
Computed sequence entropies suggest equilibrium in sequence space
Shannon sequence entropy: $S_i = -\sum_{a=1}^{20} f_{ia} \ln f_{ia}$, where $f_{ia}$ is the frequency of amino acid a at position i
Penicillium sp. β-galactosidase
[Plot: Shannon site entropies by position; series: Computed; Observed; catalytic constraint marked]
Comparable Shannon site entropies suggest equilibrium for the same fitness measure and provide a concise comparison of distributions at all positions (rather than showing the pdf at each position)
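The Shannon site entropy can be computed directly from the amino acid frequencies at a position. The sketch below does this for a hypothetical frequency table; the function name and the numbers are illustrative assumptions.

```python
import math

def site_entropy(frequencies):
    """Shannon entropy S_i = -sum_a f_ia * ln f_ia for one position (in nats).

    frequencies -- dict mapping amino acid to its frequency at this position;
                   zero-frequency entries contribute nothing.
    """
    return -sum(f * math.log(f) for f in frequencies.values() if f > 0)

# Hypothetical positions: fully conserved, top-3 dominated, and uniform.
conserved = {"S": 1.0}
focused = {"D": 0.6, "E": 0.3, "N": 0.1}
uniform = {aa: 1.0 / 20 for aa in "ACDEFGHIKLMNPQRSTVWY"}

for label, freqs in [("conserved", conserved), ("focused", focused), ("uniform", uniform)]:
    s = site_entropy(freqs)
    print(label, round(s, 3), round(math.exp(s), 2))
```

The quantity exp(S_i) acts as an effective number of amino acids at the position, ranging from 1 for a conserved site to 20 for a uniform one.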
Marginal active site sequence distributions
Shannon site entropies: computed from the marginal distributions; unlike the joint distribution, these cannot be expressed in closed form in terms of the exponential function. Two approaches to estimating the distribution: a) in terms of marginal moments of functions of the f_i's; b) in terms of explicit f_i's (used here). Both are based on drawing m samples from the joint distribution.
Extensions/modifications to PNAS paper figures:
Better to display K-L relative entropies rather than site entropies for marginal distributions at each position
Instead compare K-L relative entropies (joint distribution) with respect to MSAs for models with different objectives, on the same plot;
alternatively, use the approach based on marginal distributions on the Shannon entropy slide
For such figures, compare K-L relative entropies (here marginal)
[Plot: amino acid frequency (0 to 0.6) by residue type (D A F R S Q E Y H I L K N G T W V M)]
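A per-position K-L relative entropy between two marginal amino acid distributions can be sketched as follows. Which distribution plays the role of reference (observed MSA versus computed), the pseudocount used to avoid division by zero, and the example frequencies are all assumptions made for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def kl_divergence(p, q, pseudocount=1e-6):
    """D_KL(p || q) in nats for one position.

    p, q -- dicts mapping amino acid to frequency; missing residues count as 0.
    A small pseudocount is added and each distribution renormalized so that
    zero frequencies do not produce infinities.
    """
    def smooth(dist):
        raw = [dist.get(aa, 0.0) + pseudocount for aa in AMINO_ACIDS]
        total = sum(raw)
        return [x / total for x in raw]

    ps, qs = smooth(p), smooth(q)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(ps, qs))

# Hypothetical frequencies at one active site position.
observed = {"D": 0.55, "E": 0.30, "N": 0.15}
computed_close = {"D": 0.50, "E": 0.35, "N": 0.15}
computed_far = {"W": 0.70, "F": 0.30}

print(kl_divergence(observed, computed_close))
print(kl_divergence(observed, computed_far))
```

A value near zero indicates that the computed distribution reproduces the observed one at that position; plotting this per position gives a single curve per model rather than a histogram per site.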
Plan for development of designability theory and experimental application (to be described in the conclusion of our early papers)
Apply designability theory to all major enzyme families from the PNAS papers; extend experimentally to the designability of modified sirtuins
Could identify the catalytic constraints and focus on the objective for reducing inhibition (NAM binding affinity); estimate the temperature for the latter.
Compare the designability of the NAD site to that of other enzyme classes studied, for the same T's. Check designability at lower T for NAM inhibition
The designability approach will help determine the viability of drug development efforts more effectively than combinatorial chemistry
http://tinyurl.com/63gt3lm
Components of energy function
Covalent bond potential
Non-bonding terms (Van Der Waals)
Torsional potential
H-bonding (sometimes)
Electrostatic potential
Surface-area term
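To show what several of these terms look like in practice, the sketch below writes out common textbook functional forms: a harmonic bond, a 12-6 Lennard-Jones term, a Coulomb term, and a three-term Fourier torsion. These are generic forms with hypothetical parameters, not the exact OPLS-AA parameterization or the implementation used in the software discussed here; H-bonding and surface-area terms are omitted.

```python
import math

COULOMB_KCAL = 332.06  # converts e^2/Å to kcal/mol

def bond_energy(r, r0, k):
    """Harmonic covalent bond potential, k in kcal/(mol*Å^2)."""
    return k * (r - r0) ** 2

def lennard_jones(r, epsilon, sigma):
    """12-6 van der Waals term; minimum of -epsilon at r = 2^(1/6) * sigma."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def coulomb(r, qi, qj, dielectric=1.0):
    """Electrostatic energy between point charges qi, qj (in units of e)."""
    return COULOMB_KCAL * qi * qj / (dielectric * r)

def torsion_energy(phi_deg, v1, v2, v3):
    """Three-term Fourier series torsional potential."""
    phi = math.radians(phi_deg)
    return 0.5 * (v1 * (1 + math.cos(phi))
                  + v2 * (1 - math.cos(2 * phi))
                  + v3 * (1 + math.cos(3 * phi)))

# Hypothetical parameters for a single pair of atoms and one torsion.
total = (bond_energy(1.54, 1.53, 300.0)
         + lennard_jones(3.8, 0.1, 3.5)
         + coulomb(3.8, 0.4, -0.4)
         + torsion_energy(60.0, 1.0, 0.2, 0.5))
print(round(total, 3))
```

A full force field sums terms like these over all bonds, angles, torsions and nonbonded pairs, and a continuum solvation model then adds the solvent contribution.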
The effect of water (a rude fellow!)
[Figure: field of H2O molecules]
Computational active site optimization is structurally accurate to near-crystallographic resolution
[Plot: RMSD to native (Å), 0 to 1.2, for residues Phe120, Asn161, Trp233, Arg285, Thr299, Ser326, Ser62, Lys65, Tyr159]
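The RMSD to native reported per residue can be computed as below. This minimal sketch assumes the predicted and native coordinates are already in the same reference frame (as for side chains placed on a fixed backbone), so no superposition is performed; the coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (Å) between two equally ordered atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((xa - xb) ** 2
                  for atom_a, atom_b in zip(coords_a, coords_b)
                  for xa, xb in zip(atom_a, atom_b))
    return math.sqrt(squared / len(coords_a))

# Hypothetical side chain heavy-atom coordinates (Å): native vs predicted.
native = [(1.0, 2.0, 3.0), (2.2, 2.9, 3.5), (3.1, 4.0, 3.9)]
predicted = [(1.1, 2.0, 3.0), (2.2, 3.2, 3.5), (3.1, 4.0, 4.4)]
print(round(rmsd(native, predicted), 3))
```

Values well below 1 Å, as on the plot, are comparable to the coordinate uncertainty of typical crystal structures.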
Future plans
• Understanding differences between PLOP/Glide/QSite energies for summing energy calculations to calculate Km, kcat
• Modeling the denatured state of proteins to estimate folding free energy for core sequence optimization
Integration with other current developments
• Induced fit + Backbone reshaping to start with globally-relaxed backbone shapes for unnatural ligands
• MD treatment of loops + Backbone reshaping + Classical affinity opt for antibody engineering