

Structure and folding of a designed knotted protein

Neil P. King^a, Alex W. Jacobitz^a, Michael R. Sawaya^b, Lukasz Goldschmidt^a, and Todd O. Yeates^a,c,d,e,1

^a Department of Chemistry and Biochemistry, University of California, Los Angeles, CA 90095-1569; ^b Howard Hughes Medical Institute, University of California, Los Angeles, CA 90095-1570; ^c UCLA-DOE Institute for Genomics and Proteomics, University of California, Los Angeles, CA 90095-1570; ^d Molecular Biology Institute, University of California, Los Angeles, CA 90095-1570; and ^e California Nanosystems Institute, University of California, Los Angeles, CA 90095-1570

Edited* by David Baker, University of Washington, Seattle, WA, and approved September 20, 2010 (received for review June 2, 2010)

A very small number of natural proteins have folded configurations in which the polypeptide backbone is knotted. Relatively little is known about the folding energy landscapes of such proteins, or how they have evolved. We explore those questions here by designing a unique knotted protein structure. Biophysical characterization and X-ray crystal structure determination show that the designed protein folds to the intended configuration, tying itself in a knot in the process, and that it folds reversibly. The protein folds to its native, knotted configuration approximately 20 times more slowly than a control protein, which was designed to have a similar tertiary structure but to be unknotted. Preliminary kinetic experiments suggest a complicated folding mechanism, providing opportunities for further characterization. The findings illustrate a situation where a protein is able to successfully traverse a complex folding energy landscape, though the amino acid sequence of the protein has not been subjected to evolutionary pressure for that ability. The success of the design strategy (connecting two monomers of an intertwined homodimer into a single protein chain) supports a model for evolution of knotted structures via gene duplication.

Anfinsen ∣ energy landscape ∣ folding kinetics ∣ protein folding ∣ topology

Thousands of distinct protein folds have been observed in nature, yet only a handful possess the property of having a knotted protein backbone (1). These rare cases present intriguing opportunities for studying the mechanisms of protein folding (2). In order to fold into a knot, complex contortions of the protein chain are required. For instance, one part of the protein might have to pass through a loop formed by another part at some point during the folding process, like a thread through the eye of a needle. Before the first deeply knotted protein structure was identified ten years ago (3), the apparent lack of knotted proteins was cited as evidence to suggest that this type of threading event might be impossible (4, 5). However, there are now roughly ten distinct knotted protein folds known (1), some of which are quite deep, proving that some proteins can and do spontaneously fold into knotted structures.

Knotted structures present challenges to current theories of protein folding, which have been developed mainly based on small proteins with simple folding kinetics (6). For such proteins, it has been proposed that the folding energy landscape resembles a funnel (7–9), implying that the native state can be reached by moving toward lower energy from any of a vast ensemble of denatured configurations. The topological constraint of having to thread a knotted protein, on the other hand, would appear to greatly restrict the conformational space available for productive folding, which raises the question of how the folding pathway through this restricted conformational space is encoded in the amino acid sequence of the protein. A general model of protein folding must be able to account for topologically complex proteins, such as those containing knots.

Much of the recent research on knotted proteins, both experimental and computational, has naturally turned to investigating exactly how and when threading occurs during folding. Jackson et al. have carried out a series of experiments to characterize the complex folding pathways of two structurally related, knotted methyltransferases (10–13). Work on the methyltransferase model system led to the proposal that threading can occur early in folding reactions (14), producing a knotted protein in a loose, denatured-like state, followed by normal folding to the native structure. A recent demonstration that the methyltransferases tend to remain knotted even under strongly denaturing conditions (15) further supports the view that threading occurs early for that particular knotted fold. Computational simulations, on the other hand, have suggested various scenarios for threading, including mechanisms where the knot is acquired in later stages of folding (16, 17).

Other important questions posed by the existence of knotted proteins have not yet been addressed experimentally. For instance, does a knotted topology have any effect on the stability, folding, or rigidity of a protein? The difficulty in attacking such a question lies in a lack of suitable controls. In order to specifically address the effects of a knot in a protein, a nearly identical, yet unknotted protein must be available for comparison. Because such pairs of knotted/unknotted proteins do not exist naturally, they must be designed or engineered (2).

Here we describe the design of a unique knotted protein and its structural and biophysical characterization. Despite having to navigate a presumably complex energy landscape, the protein folds reversibly in vitro to the target, knotted configuration. The engineering of a control protein that is unknotted but otherwise nearly identical in sequence and structure allows the effects of protein knotting to be examined directly.

Author contributions: N.P.K. and T.O.Y. designed research; N.P.K., A.W.J., and M.R.S. performed research; N.P.K., M.R.S., L.G., and T.O.Y. analyzed data; and N.P.K. and T.O.Y. wrote the paper.

The authors declare no conflict of interest.

*This Direct Submission article had a prearranged editor.

Freely available online through the PNAS open access option.

Data deposition: The atomic coordinates and structure factors reported in this paper have been deposited in the Protein Data Bank, www.pdb.org.

^1 To whom correspondence should be addressed. E-mail: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1007602107/-/DCSupplemental.

20732–20737 ∣ PNAS ∣ November 30, 2010 ∣ vol. 107 ∣ no. 48 ∣ www.pnas.org/cgi/doi/10.1073/pnas.1007602107

Results

Designing a Unique Protein Knot by Domain Duplication. We sought to design a unique knotted protein by minimally modifying a naturally existing, unknotted protein. Our design strategy was motivated by the prior observation that some knotted protein folds display internal pseudosymmetry (1, 3). This phenomenon is seen in five of the ten knotted folds that have been elucidated. The construction of these proteins from internally duplicated motifs or domains suggests that they evolved by gene duplication and fusion, potentially from ancestral proteins that were oligomeric in nature. In one illustrative example, the hypothetical ancestral homodimer and the knotted tandem-domain monomer can be found in extant proteins: the knotted Agrobacterium tumefaciens protein VirC2 is a fusion of two ribbon-helix-helix DNA-binding domains in a configuration resembling the Arc repressor dimer (Fig. 1A). Motivated by the evidence of domain duplication as a naturally occurring, evolutionary pathway for the creation of knotted folds, we engineered a unique knotted protein by genetically fusing a tandem repeat of the gene for the unknotted, dimeric protein HP0242 from Helicobacter pylori (PDB codes 2ouf and 2bo3; Fig. 1B). The two subunits of this protein of unknown function (referred to hereafter as 2ouf-wt) intertwine in such a manner that connecting the C terminus of the first subunit to the N terminus of the second by a nine-residue, glycine-rich linker yields a knotted, monomeric protein, which we refer to as 2ouf-knot (Fig. 1D). The knot in 2ouf-knot is a left-handed trefoil (three-crossing) knot, and is 28 residues deep on the N-terminal side and 57 residues deep on the C-terminal side. A third protein, 2ouf-ds, was constructed in which the two chains of the dimer are linked by an intermolecular disulfide bond that does not introduce a knot (Fig. 1C). Because 2ouf-knot and 2ouf-ds are both unimolecular and nearly identical other than their differing topologies, comparing the genetically fused, knotted protein to the disulfide-linked, unknotted protein allows for direct interrogation of the effects of knotted topologies on protein stability and folding.

Structural Characterization of 2ouf-ds and 2ouf-knot. (His)6-tagged constructs of 2ouf-wt, 2ouf-ds, and 2ouf-knot were expressed recombinantly in Escherichia coli and purified to homogeneity. Following oxidation of the designed disulfide in 2ouf-ds, the mutant proteins were crystallized (see SI Text). Crystal structures of 2ouf-ds and 2ouf-knot were determined by molecular replacement and refined to respective resolutions of 2.9 Å and 2.3 Å (Table S1). Although the protein molecules are packed differently in these two crystals and the crystals of the wild-type protein, the refined models of all three structures are effectively superimposable. Pairwise alignments of the three structures (the wild-type homodimer, 2ouf-knot, and 2ouf-ds) over backbone atoms covering residues 13–92 in the wild-type sequence yield rms differences that are less than or equal to 0.74 Å (Fig. 1 C and D). The designed 2ouf-knot protein is therefore folded as intended into a knotted configuration.

Unfortunately, although not unexpectedly, electron density was observed for only one residue of the flexible linker used to link the two domains of 2ouf-knot (sequence SGSGSGSSG). To verify that the linker was still intact in the crystallized protein, we subjected washed crystals of 2ouf-knot to nonreducing SDS-PAGE alongside solutions of the purified proteins (Fig. S1A). We did not observe cleavage products of the proteins in any lane of the gel, confirming that our crystal structure of 2ouf-knot is of the intact, full-length, tandem repeat protein.

Biophysical Characterization of 2ouf-wt, 2ouf-ds, and 2ouf-knot. Analytical size exclusion chromatography was used to confirm that the designed proteins exhibited the correct quaternary structures in solution (Fig. 2A). The wild-type protein was previously shown to be a dimer in solution (18). As expected, 2ouf-ds eluted at precisely the same volume as 2ouf-wt, while 2ouf-knot eluted very slightly later, consistent with its marginally smaller molecular weight (see SI Text). We next compared the spectroscopic properties of the folded and unfolded states of the proteins using CD and intrinsic fluorescence. The far-UV CD spectra of 2ouf-wt, 2ouf-ds, and 2ouf-knot overlap nearly perfectly (Fig. 2B), as do their fluorescence spectra (Fig. 2C), indicating that the molecules are in structurally equivalent states in solution. Unfolding the proteins in high concentrations of guanidinium chloride (GdmCl) resulted in the expected loss of helical signal in their CD spectra, although 2ouf-ds and 2ouf-knot retained slightly more signal than 2ouf-wt (Fig. 2B). Similarly, a large decrease in the fluorescence intensity and a red-shift of the emission maximum of each protein was observed in the presence of GdmCl owing to the single, buried tryptophan residue at position 18 (or two pseudosymmetrically related copies of that residue in 2ouf-knot; Fig. 1) becoming exposed to solvent upon unfolding (Fig. 2C). Importantly, proteins that were unfolded and subsequently refolded by dilution of the GdmCl exhibited CD and fluorescence spectra nearly identical to those obtained under native conditions (Fig. 2 B and C), suggesting that all three proteins, even the synthetically knotted 2ouf-knot, unfold reversibly in vitro in the absence of molecular chaperones.

Fig. 1. Experimental design and crystal structures of 2ouf-ds and 2ouf-knot. (A) A natural example where gene duplication and fusion of an intertwined dimer has led to a knot. The naturally knotted VirC2 protein (right) is a tandem repeat of ribbon-helix-helix domains, like those found in the dimeric, unknotted Arc repressor (left). A single ribbon-helix-helix domain in each protein is highlighted in cyan. (B) Crystal structure (left; PDB code 2ouf) and simplified schematic (right) of the intertwined, dimeric protein HP0242. The two chains of the dimer are colored green and blue. (C) Schematic (left) and crystal structure (right) of 2ouf-ds. A disulfide bond, engineered to produce a unimolecular but unknotted construct, is shown in orange. (D) Schematic (left) and crystal structure (right) of 2ouf-knot, a knotted protein created by linking the two intertwined domains in tandem. The eight linker residues not modeled in the crystal structure are represented by the dashed gray line. Alignments of one subunit of the wild-type protein (blue) to the crystal structures of (C) 2ouf-ds and (D) 2ouf-knot are shown. In all structures, tryptophans 18 and 18′ (18 and 107 in 2ouf-knot) are shown as sticks.

Equilibrium Denaturation of 2ouf-wt, 2ouf-ds, and 2ouf-knot. To confirm the reversibility of unfolding, and to compare the thermodynamic stability of a knotted protein to an unknotted control, we collected equilibrium denaturation curves of 2ouf-wt, 2ouf-ds, and 2ouf-knot. We used GdmCl as denaturant, and collected data at low (2.5 μM 2ouf-wt/ds, 1.25 μM 2ouf-knot) and high (100 μM 2ouf-wt/ds, 50 μM 2ouf-knot) protein concentration at 25 °C. The unfolding and refolding curves for each protein were superimposable whether fluorescence or CD was used as a probe, demonstrating that unfolding is reversible for all three proteins (Fig. 3). However, the two spectroscopic probes yielded substantially different curves for each protein. When monitoring fluorescence emission at 320 nm at low protein concentration, a single transition is observed for all three proteins, with midpoints of denaturation between 2–3 M GdmCl. The CD curves at low protein concentration, in contrast, contain two readily identifiable transitions, the first between 2–3 M GdmCl, and the second between 5–7 M GdmCl. For 2ouf-wt, the higher [GdmCl] transition became more pronounced at higher protein concentration, while the lower [GdmCl] transition was largely unchanged (Fig. 3A). The curves at low and high protein concentration were superimposable for 2ouf-ds and 2ouf-knot (Fig. 3 B and C), consistent with the unimolecular nature of these two proteins.

Together, the equilibrium data indicate that the higher [GdmCl] transition observed by CD for all three proteins is due to an equilibrium intermediate populated at moderate [GdmCl], and that the 2ouf-wt intermediate is dimeric, because it is more populated at higher protein concentration. The analogous 2ouf-ds and 2ouf-knot intermediates are highly populated even at low protein concentration; in these proteins the two domains forming the intermediates are covalently linked, so their stabilities are independent of protein concentration. Finally, the intermediates for all three proteins appear to be structurally similar, each containing a significant amount of secondary structure (moderate CD signal), but without the tryptophan residues in their buried, native states (diminished fluorescence signal). We also monitored the equilibrium denaturation of 2ouf-wt at 100 μM protein by fluorescence, and found it to be indistinguishable from the curve collected at 2.5 μM protein (Fig. 3A).
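The concentration dependence invoked above for the dimeric 2ouf-wt intermediate can be illustrated numerically. The sketch below is not the fitting procedure of the paper (which is described in SI Text); it assumes a linear extrapolation model, ΔG = ΔG°H2O − m[GdmCl], for an I2 ↔ 2U step with K = [U]²/[I2], and it assumes that the quoted protein concentrations are total chain (monomer) concentrations. The parameter values are the 2ouf-wt CD values for the I2 ↔ 2U step from Table 1; the function name and the 5.5 M evaluation point are illustrative choices.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal mol^-1 K^-1
T_KELVIN = 298.15   # 25 °C, the temperature of the denaturation experiments


def fraction_unfolded_dimer(dg_water, m_value, denaturant, total_chains_molar):
    """Fraction of chains in the unfolded state for I2 <-> 2U.

    Assumptions (not taken from the paper's SI): linear extrapolation,
    dG = dg_water - m_value * denaturant, and K = [U]^2 / [I2] in molar
    units, with total chain concentration Pt = 2[I2] + [U].
    """
    dg = dg_water - m_value * denaturant
    k_eq = math.exp(-dg / (R_KCAL * T_KELVIN))
    # Positive root of 2[U]^2 / K + [U] - Pt = 0.
    unfolded = k_eq * (math.sqrt(1.0 + 8.0 * total_chains_molar / k_eq) - 1.0) / 4.0
    return unfolded / total_chains_molar


# Table 1, 2ouf-wt, CD probe: dG (kcal/mol) and m (kcal/mol/M) for I2 <-> 2U.
DG_I2_2U, M_I2_2U = 13.4, 1.22

low = fraction_unfolded_dimer(DG_I2_2U, M_I2_2U, 5.5, 2.5e-6)
high = fraction_unfolded_dimer(DG_I2_2U, M_I2_2U, 5.5, 100e-6)
print(f"fraction unfolded at 5.5 M GdmCl: {low:.2f} at 2.5 uM, {high:.2f} at 100 uM")
```

Under these assumptions a smaller fraction of chains is unfolded at the higher protein concentration for the same [GdmCl], which is the behavior expected for a dimeric intermediate and absent for a unimolecular one.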

Fig. 2. Biophysical characterization of 2ouf-wt, 2ouf-ds, and 2ouf-knot. Data for 2ouf-wt are colored black; 2ouf-ds, red; and 2ouf-knot, blue. (A) Size exclusion chromatography elution profiles of the three proteins. The elution profiles for 2ouf-wt and 2ouf-ds are nearly indistinguishable. (B) Far-UV CD spectra of the three proteins. Spectra for the folded proteins (in buffer supplemented with 0.25 M GdmCl) are shown as solid lines, unfolded proteins (7.44 M GdmCl) as dash-dot lines, and proteins refolded from 6.14 M GdmCl (0.25 M GdmCl) as dashed lines. (C) Fluorescence emission spectra of the three proteins, using an excitation wavelength of 290 nm. Spectra are represented as in B, and have been normalized for comparison.

Fig. 3. Equilibrium denaturation of (A) 2ouf-wt, (B) 2ouf-ds, and (C) 2ouf-knot. CD data are represented by filled symbols, fluorescence data by open symbols. All data have been normalized to a native signal of 0 and a denatured signal of 1. Circles and squares represent unfolding and refolding data, respectively, at low protein concentration. Triangles represent unfolding data at high protein concentration. Lines represent fits of the CD (solid) and fluorescence (dashed) data to three-state and two-state models, respectively.

We fitted the CD equilibrium denaturation data at low protein concentration for 2ouf-ds and 2ouf-knot to a three-state N ↔ I ↔ U model (as described in SI Text) to extract thermodynamic parameters (Table 1). For 2ouf-wt, we fitted the low and high protein concentration CD data individually to a three-state N2 ↔ I2 ↔ 2U dimer denaturation model with a dimeric intermediate as described previously (19), with minor modification (as described in SI Text). We independently fitted the fluorescence data for all proteins to a two-state N ↔ I model (N2 ↔ I2 for 2ouf-wt) because the intermediates exhibit no detectable fluorescence signal. For each protein, the parameters extracted from the CD and fluorescence data for the native to intermediate transition were in close agreement. Unexpectedly, the 2ouf-knot intermediate is significantly (5.8 kcal mol−1) more stable than that of 2ouf-ds, whereas the native state of 2ouf-knot is more stable than that of 2ouf-ds by a lesser amount (2.7 kcal mol−1). Consequently, the stability of the native state of 2ouf-knot relative to its intermediate state (2.3 kcal mol−1) is diminished compared to that of 2ouf-ds (5.4 kcal mol−1). This observation could reflect the entropic cost of constraining the nine-residue, glycine-rich linker in a relatively low-entropy conformation (20). Alternatively, the intermediate state of 2ouf-knot may be more native-like than that of 2ouf-ds, as suggested by its significantly enhanced stability and much lower m-value for the native to intermediate transition.
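To make the three-state analysis concrete, the following sketch computes the equilibrium populations of N, I, and U for the two unimolecular proteins from the CD-derived parameters in Table 1. It assumes the standard linear extrapolation model, ΔG = ΔG°H2O − m[GdmCl], for each step; the exact equations used for the fits are those in SI Text and may differ in detail. It also assumes that Y_I is expressed on the normalized scale of Fig. 3 (native 0, denatured 1). The function names are illustrative.

```python
import math

RT = 1.987e-3 * 298.15  # kcal/mol at 25 °C

# Table 1, CD probe: (dG_NI, m_NI, dG_IU, m_IU, Y_I); dG in kcal/mol, m in kcal/mol/M.
PARAMS = {
    "2ouf-ds": (5.4, 1.80, 7.1, 1.14, 0.59),
    "2ouf-knot": (2.3, 0.96, 12.9, 1.99, 0.52),
}


def populations(protein, denaturant):
    """Return (f_N, f_I, f_U) for N <-> I <-> U at the given [GdmCl] in M."""
    dg_ni, m_ni, dg_iu, m_iu, _ = PARAMS[protein]
    k_ni = math.exp(-(dg_ni - m_ni * denaturant) / RT)  # [I]/[N]
    k_iu = math.exp(-(dg_iu - m_iu * denaturant) / RT)  # [U]/[I]
    total = 1.0 + k_ni + k_ni * k_iu
    return 1.0 / total, k_ni / total, k_ni * k_iu / total


def normalized_cd(protein, denaturant):
    """Population-weighted CD signal with native = 0, intermediate = Y_I, unfolded = 1."""
    y_i = PARAMS[protein][4]
    _, f_i, f_u = populations(protein, denaturant)
    return y_i * f_i + f_u


for name, (dg_ni, _, dg_iu, _, _) in PARAMS.items():
    f_n, f_i, f_u = populations(name, 4.0)
    print(f"{name}: total dG(N<->U) = {dg_ni + dg_iu:.1f} kcal/mol; "
          f"at 4 M GdmCl f_N={f_n:.2f}, f_I={f_i:.2f}, f_U={f_u:.2f}")
```

With these parameters the two step free energies sum to the N ↔ U values listed in Table 1, and the intermediate is the dominant species for both proteins at intermediate denaturant concentrations such as 4 M GdmCl.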

Folding Kinetics of 2ouf-wt, 2ouf-ds, and 2ouf-knot. We collected and analyzed stopped-flow kinetic traces of 2ouf-wt, 2ouf-ds, and 2ouf-knot refolding to examine the effects of knotted topologies on protein folding rates. Monitoring single-jump refolding from the unfolded state (7.29 M GdmCl) by fluorescence emission at 320 nm yielded kinetic traces that reached completion on the time scale of several seconds (Fig. 4). However, a significant fraction of the total signal change between the unfolded and folded states was complete in less than the dead time of the instrument (∼15 ms) for all three proteins, suggesting the presence of a burst phase intermediate during folding. Nevertheless, it is clear from visual inspection of the kinetic traces that the knotted 2ouf-knot folds much more slowly than the unknotted 2ouf-ds. Numerical analysis of the refolding traces revealed that the observable signal change for each protein could be adequately described by a sum of two exponentials (Fig. 4, residuals). However, refolding to several final concentrations of GdmCl (Fig. S2) and plotting the natural logs of rate constants extracted from either single or double exponential fits as a function of [GdmCl] (chevron analysis) did not result in linear folding limbs as typically observed (Fig. S3); severe rollover is observed at low [GdmCl]. This anomalous behavior indicates that the refolding reactions are too complex to be explained by a simple biexponential function. Given the stable intermediates observed at equilibrium, the presence of an apparent burst phase in the kinetic refolding data, and the rollover observed at low [GdmCl], it seems probable that the proteins are folding through one or more intermediate species or pathways.
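For readers unfamiliar with chevron analysis, the sketch below shows the reference behavior against which rollover is judged. It assumes an idealized two-state chevron in which ln k_obs = ln(k_f + k_u), with both microscopic rate constants depending exponentially on [GdmCl]. All four parameter values are hypothetical and chosen only for illustration; they are not fitted to, or taken from, the data in this paper.

```python
import math

RT = 1.987e-3 * 298.15  # kcal/mol at 25 °C


def ln_k_obs(denaturant, kf_water, m_f, ku_water, m_u):
    """Ideal two-state chevron: ln(k_f + k_u) at the given [GdmCl] in M.

    k_f decreases and k_u increases exponentially with denaturant, which
    gives two linear limbs on a plot of ln k_obs against [GdmCl].
    """
    k_f = kf_water * math.exp(-m_f * denaturant / RT)
    k_u = ku_water * math.exp(m_u * denaturant / RT)
    return math.log(k_f + k_u)


# HYPOTHETICAL parameters for illustration only (s^-1 and kcal/mol/M).
KF_WATER, M_F, KU_WATER, M_U = 20.0, 1.0, 1e-4, 0.5

# Slope of the folding limb between 0 and 0.5 M denaturant.
slope = (ln_k_obs(0.5, KF_WATER, M_F, KU_WATER, M_U)
         - ln_k_obs(0.0, KF_WATER, M_F, KU_WATER, M_U)) / 0.5
print(f"folding-limb slope: {slope:.3f} per M; two-state expectation -m_f/RT: {-M_F / RT:.3f} per M")
```

In a two-state system the folding limb stays linear down to zero denaturant with slope −m_f/RT. Rollover, as reported here, is a flattening of that limb at low [GdmCl], which a two-state description cannot produce.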

Nevertheless, the appearance of fluorescence signal in the refolding traces clearly monitors attainment of native structure during refolding of the proteins, and therefore allows a quantitative comparison of their folding rates. Fitting the kinetic data to a single exponential function, while not capturing fine details of the kinetic traces, provides a reasonable assessment of the overall folding efficiencies (Fig. S2). Comparing these estimated rates for the three proteins reveals that, at a final [GdmCl] of 0.66 M, 2ouf-knot folds approximately 20 times more slowly (0.9 s−1) than the unknotted 2ouf-ds designed as a control (17.7 s−1) and twice as slowly as the dimeric 2ouf-wt (1.9 s−1; the units here reflect that the observed phenomenon monitors the appearance of N2, which is first-order). The folding rates of 2ouf-ds and 2ouf-knot were found to be independent of protein concentration, while 2ouf-wt refolds faster as protein concentration increases, as expected for a bimolecular reaction (Fig. 4, insets).
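The rate comparison can be restated as half-times and, under a further assumption, as an apparent barrier difference. The sketch below uses only the three single-exponential rate estimates quoted above. Converting the rate ratio to a free energy via ΔΔG‡ = RT ln(k_ds/k_knot) assumes that both proteins share the same pre-exponential factor; that is an illustrative simplification introduced here, not a claim made in the paper.

```python
import math

RT = 1.987e-3 * 298.15  # kcal/mol at 25 °C

# Apparent single-exponential folding rates (s^-1) at a final [GdmCl] of 0.66 M.
RATES = {"2ouf-knot": 0.9, "2ouf-ds": 17.7, "2ouf-wt": 1.9}


def half_time(rate_per_s):
    """Half-time in seconds of a first-order process with the given rate constant."""
    return math.log(2.0) / rate_per_s


ratio_ds_knot = RATES["2ouf-ds"] / RATES["2ouf-knot"]
ratio_wt_knot = RATES["2ouf-wt"] / RATES["2ouf-knot"]

# ASSUMPTION: equal pre-exponential factors for 2ouf-ds and 2ouf-knot.
apparent_ddg = RT * math.log(ratio_ds_knot)

for name, rate in RATES.items():
    print(f"{name}: k = {rate} s^-1, half-time = {half_time(rate) * 1000:.0f} ms")
print(f"2ouf-ds / 2ouf-knot = {ratio_ds_knot:.1f}; 2ouf-wt / 2ouf-knot = {ratio_wt_knot:.1f}")
print(f"apparent barrier difference = {apparent_ddg:.2f} kcal/mol")
```

The ratio of roughly 20 corresponds to an apparent barrier difference of less than 2 kcal mol−1 under the stated assumption, which gives a sense of scale for the kinetic cost associated with the knotted topology.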

Refolding kinetics were also analyzed starting with proteins equilibrated in 4 M GdmCl, conditions under which the equilibrium intermediate of each protein is well populated. 2ouf-knot was found to refold rapidly from this state, yielding estimated folding rates comparable to those of 2ouf-ds under the same conditions (Figs. S3 and S4B), in sharp contrast to the much slower rate of overall folding for 2ouf-knot beginning from the fully denatured state (Figs. S3 and S4A). This result suggests that the intermediate observed at equilibrium may be populated as a kinetic intermediate on the folding pathway of 2ouf-knot, and that the slower overall folding rate of 2ouf-knot when refolding is initiated from the unfolded state can be attributed to the transition from the unfolded to the intermediate state.

We also monitored unfolding reactions starting with the proteins in their native states, and observed that at high [GdmCl] the fluorescence signal was lost very rapidly (<100 ms) for all three proteins (Fig. S2). Unfolding traces for 2ouf-wt and 2ouf-ds were adequately described by a single exponential function, while those for 2ouf-knot required two exponential terms, except at the highest final [GdmCl] tested (6 M). As with the folding data, it was found that single exponential fits could reasonably approximate the apparent unfolding rate of 2ouf-knot. The natural logs of estimated unfolding rates extracted from the single exponential fits for each protein were found to depend linearly on the final [GdmCl] of the unfolding reaction (Fig. S3). This analysis revealed that although the apparent unfolding rates observed for 2ouf-knot are similar to the rates of refolding from the intermediate state, they are much higher than the overall rates of refolding from the fully unfolded state. Therefore, fluorescence appears to monitor a reversible, relatively rapid transition between the native and intermediate states of 2ouf-knot. Furthermore, although unfolding from the intermediate to the fully unfolded state is invisible when monitoring by fluorescence, it appears to be the rate-limiting step of 2ouf-knot unfolding based on the above kinetic and equilibrium data.

The folding and unfolding data are most consistent with a model in which 2ouf-knot folds through a kinetic intermediate, similar in nature to the stable intermediate observed at equilibrium, that is also populated on the unfolding pathway. Furthermore, the data suggest that a higher activation energy barrier exists between the intermediate and the fully unfolded states of 2ouf-knot compared to 2ouf-ds during folding and, most likely, also during unfolding. We propose that this higher barrier is related to knot formation and unthreading, respectively, because the key difference between 2ouf-knot and 2ouf-ds is the presence or absence of the knot, although, as discussed below, other factors arising from differences in the way the two domains are connected in each protein may also be partially responsible.
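The equilibrium parameters in Table 1 that underpin this three-state picture can be checked with simple arithmetic. The sketch below takes the CD-derived values for the two monomeric proteins from Table 1, confirms that the overall ΔG° and m values are the sums of the two steps (as stated in the table footnotes), and estimates the denaturation midpoint of each step. The midpoint formula assumes the standard linear extrapolation model, ΔG = ΔG°H2O − m[GdmCl], which is an assumption here rather than something stated in this section. The dimeric 2ouf-wt is left out because its midpoints also depend on protein concentration.

```python
# (dG_NI, m_NI, dG_IU, m_IU, dG_NU, m_NU) from the CD rows of Table 1.
# dG values in kcal/mol, m values in kcal/mol/M.
table1_cd = {
    "2ouf-ds":   (5.4, 1.80, 7.1, 1.14, 12.5, 2.94),
    "2ouf-knot": (2.3, 0.96, 12.9, 1.99, 15.2, 2.95),
}

def midpoint(dg_water, m_value):
    """[GdmCl] (M) at which dG = dG_water - m*[GdmCl] reaches zero.

    Assumes the linear extrapolation model for a monomeric transition.
    """
    return dg_water / m_value

summary = {}
for name, (dg_ni, m_ni, dg_iu, m_iu, dg_nu, m_nu) in table1_cd.items():
    summary[name] = {
        "dG_sum": round(dg_ni + dg_iu, 1),   # should match reported dG_NU
        "m_sum": round(m_ni + m_iu, 2),      # should match reported m_NU
        "dG_reported": dg_nu,
        "m_reported": m_nu,
        "Cm_NI": round(midpoint(dg_ni, m_ni), 2),
        "Cm_IU": round(midpoint(dg_iu, m_iu), 2),
    }
```

Under these assumptions the native-to-intermediate transition of 2ouf-knot has a midpoint near 2.4 M GdmCl and the intermediate-to-unfolded transition near 6.5 M, compared with about 3.0 M and 6.2 M for 2ouf-ds.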

Table 1. Thermodynamic parameters for 2ouf-wt, 2ouf-ds, and 2ouf-knot, extracted from fits of equilibrium denaturation data

| Protein | Probe | Y_I | ΔG°H2O(N2↔I2) | m(N2↔I2) | ΔG°H2O(I2↔2U) | m(I2↔2U) | ΔG°H2O(N2↔2U)* | m(N2↔2U)† |
|---|---|---|---|---|---|---|---|---|
| 2ouf-wt | CD | 0.5 | 4.6 ± 0.3 | 1.71 ± 0.11 | 13.4 ± 0.3 | 1.22 ± 0.06 | 18.0 ± 0.4 | 2.93 ± 0.13 |
| | CD‡ | 0.5 | 4.9 ± 0.6 | 1.83 ± 0.21 | 13.5 ± 0.7 | 1.45 ± 0.13 | 18.4 ± 0.9 | 3.29 ± 0.24 |
| | Fluor. | - | 4.3 ± 0.2 | 1.74 ± 0.09 | - | - | - | - |
| | | | ΔG°H2O(N↔I) | m(N↔I) | ΔG°H2O(I↔U) | m(I↔U) | ΔG°H2O(N↔U)* | m(N↔U)† |
| 2ouf-ds | CD | 0.59 ± 0.02 | 5.4 ± 0.4 | 1.80 ± 0.14 | 7.1 ± 0.8 | 1.14 ± 0.12 | 12.5 ± 0.9 | 2.94 ± 0.18 |
| | Fluor. | - | 5.3 ± 0.4 | 1.90 ± 0.13 | - | - | - | - |
| 2ouf-knot | CD | 0.52 ± 0.01 | 2.3 ± 0.1 | 0.96 ± 0.04 | 12.9 ± 0.6 | 1.99 ± 0.10 | 15.2 ± 0.6 | 2.95 ± 0.10 |
| | Fluor. | - | 2.1 ± 0.1 | 1.05 ± 0.03 | - | - | - | - |

The parameter Y_I refers to the CD signal amplitude of the intermediate state of each protein. All ΔG° values are in units of kcal mol−1; the standard state concentration (which enters into the wild-type analysis) is 1 M. All m values are in units of kcal mol−1 M−1; errors quoted are the standard errors for the fits calculated by the program R.
*ΔG°H2O(N2↔2U) = ΔG°H2O(N2↔I2) + ΔG°H2O(I2↔2U); ΔG°H2O(N↔U) = ΔG°H2O(N↔I) + ΔG°H2O(I↔U).
†m(N2↔2U) = m(N2↔I2) + m(I2↔2U); m(N↔U) = m(N↔I) + m(I↔U).
‡100 μM protein. All other parameters are from fits to low protein concentration curves.

King et al. PNAS ∣ November 30, 2010 ∣ vol. 107 ∣ no. 48 ∣ 20735

Computational Search for Other Potential Domain Fusion Knots. To our knowledge, there has not been a systematic search for dimeric proteins that would become knotted upon fusion of the two subunits. We used the PISA server (21) to download a set of PDB files containing predicted homodimeric assemblies of small proteins (<200 residues), computationally connected their termini, and evaluated them for knottedness. Out of 4,192 homodimers searched, we found only five distinct, globular, dimeric protein folds that could be knotted by fusion of the two subunits (Table S2 and Fig. S5). In addition to HP0242, the subject of the present study, our search also identified several dimeric ribbon-helix-helix proteins, as expected given the topology of the naturally knotted VirC2, which is a tandem duplication of ribbon-helix-helix domains as discussed above. The final three dimeric folds revealed by our search were the YejL-like domain, the OsmC-like domain, and a lambda-repressor-like DNA-binding domain with intertwined C-terminal extensions. These proteins could provide additional targets for designing unique knotted proteins by tandem duplication. We considered whether proteins comprising tandem duplications of these domains (which could be knotted) might already exist in nature, but initial sequence searches did not identify any such cases (see SI Text).
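As a rough illustration of the terminus-connection step in such a search, the sketch below joins two Cα traces into a single path and estimates the fewest linker residues that could bridge the gap between them. This is not the procedure used in this work, whose details are in the SI Text. The function names, the toy coordinates, and the use of roughly 3.8 Å between consecutive Cα atoms as a maximum span per residue are all assumptions made for illustration, and the knot evaluation itself is not shown.

```python
import math

CA_CA_SPACING = 3.8  # approximate distance (angstroms) between consecutive C-alpha atoms

def min_linker_residues(c_term_a, n_term_b):
    """Lower bound on linker residues needed to bridge two terminal C-alpha atoms.

    A linker of k residues adds k + 1 virtual C-alpha bonds, each spanning
    at most about CA_CA_SPACING, so k + 1 must be at least gap / spacing.
    """
    gap = math.dist(c_term_a, n_term_b)
    bonds_needed = math.ceil(gap / CA_CA_SPACING)
    return max(bonds_needed - 1, 0)

def fuse_chains(chain_a, chain_b):
    """Concatenate two C-alpha traces (A followed by B) into one path.

    The fused path is what a downstream knot-detection routine would analyze.
    """
    return list(chain_a) + list(chain_b)

# Hypothetical toy traces: two short chains whose joined termini are 10 angstroms apart.
chain_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
chain_b = [(13.8, 0.0, 0.0), (17.6, 0.0, 0.0)]

fused = fuse_chains(chain_a, chain_b)
linker = min_linker_residues(chain_a[-1], chain_b[0])
```

A real pipeline would read the coordinates from the downloaded assembly files and would need to consider both possible fusion orders (A then B, and B then A), since only one may yield a knot.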

Discussion

We have found that a designed knotted protein, 2ouf-knot, successfully folds to the target knotted configuration, demonstrating that a protein sequence can overcome substantial topological barriers on the way to reaching its minimum free energy structure even when it has not evolved to do so. However, consistent with the topological problems associated with knotting, we find that our designed protein knot has a more complex folding energy landscape than an unknotted control protein. This conclusion is supported by a recent study in which dimeric variants of the p53 tetramerization domain were engineered such that threading of one linear chain of the dimer through a cyclized second chain could be specifically monitored. It was found that the threaded dimer folded about an order of magnitude more slowly than the wild-type protein, in which threading is not required during folding since both chains of the dimer are linear (22). Our observation of a complex energy landscape for 2ouf-knot is also consistent with recent experimental investigations of the folding pathways of naturally knotted proteins (11, 23); all three of the proteins studied have been found to fold very slowly, with complex kinetic behaviors involving multiple intermediates. The lack of unknotted controls for those proteins has prevented direct assessment of the role played by their knotted topologies. Further kinetic analysis of model systems of the type we present here should allow additional insights into the roles of knots in proteins.

Interestingly, 2ouf-knot may not be the first case where the design of a tandem repeat protein led to a knotted structure. Nearly 15 years ago, to study the effects of linking dimeric proteins into single chains, Robinson and Sauer genetically fused the two chains of the Arc repressor dimer (Fig. 1A), resulting in a protein they dubbed Arc-L1-Arc (24). Although we cannot be certain because the crystal structure of Arc-L1-Arc was never determined, it is likely that the protein was in fact knotted, though this was apparently not recognized. A retrospective evaluation of the biophysical characterization of Arc-L1-Arc suggests that, in contrast to the present study, the putative knot, if present, had little effect on the complexity of the folding pathway of the protein; Arc-L1-Arc exhibited two-state behavior and folded and unfolded more quickly than both wild-type Arc (24) and an unknotted, disulfide-linked Arc dimer (25) at all concentrations of denaturant. Those results suggest that the effects of knots on the folding energy landscapes of proteins could depend on the particular protein fold.

We cannot conclusively identify the specific mechanistic events that lead to complexities in the folding energy landscape for 2ouf-knot. However, folding simulations of knotted proteins have given rise to various proposed mechanisms for threading during protein folding, each of which involves complex movements that would be expected to result in entropic barriers to folding and constricted energy landscapes. For instance, simulations from two groups have revealed that folding can proceed through “slipknotted” intermediates, in which the portion of the protein chain being threaded is initially in a hairpin-like conformation that goes through and then comes back out of the threading loop (16, 17, 26). A somewhat different mechanism involving the “flipping” of large segments of the protein over a semifolded core has also been proposed (1).

Fig. 4. Single-jump refolding kinetic traces of (A) 2ouf-wt, (B) 2ouf-ds, and (C) 2ouf-knot. For each protein, the data are normalized to a native signal of 1 and a denatured signal of 0. Residuals are for fits to first-order equations with two (top) or one (bottom) exponential terms. The double exponential fits are shown as red lines. Insets show the protein concentration dependence of the rate constants extracted from single (filled circles) and double (open squares and triangles) exponential fits.

20736 ∣ www.pnas.org/cgi/doi/10.1073/pnas.1007602107 King et al.

Alternatively, slow migration of the knot along the protein chain after an initial collapse of the protein during folding may be responsible for the slower folding rate of 2ouf-knot. Collapse of a “prethreaded” denatured state could give rise to a conformation bearing a knot in a nonnative location. The ensuing steps in folding would involve migration of the knot along the threaded portion of the polypeptide chain. Interactions between the threaded portion of the chain and the collapsed loop surrounding it, amounting to internal friction in the folding molecule, could result in a rugged energy landscape for 2ouf-knot. Simulations on slipknotted structures support the potential significance of chain friction (27), and the recent experimental demonstration of a rough energy landscape arising from internal friction (28), in that case due to helices pairing in nonnative registers on the way to the native state, shows that such effects can be important in protein folding reactions. This mechanism may account for the slow, complex folding of the knotted methyltransferases studied by Jackson et al., for which initial threading has been suggested to not be a kinetically limiting step (12, 15).

In addition, when considering explanations for slow folding, we note that a significant correlation has been observed between contact order and folding rate for a number of small proteins with simple folding kinetics, providing a link between the native structures of proteins and their mechanisms of folding (9, 29). We calculated the absolute contact order of 2ouf-ds, counting the sequence separation of intersubunit contacts through the disulfide bond (30), to be 10.8, while the absolute contact order of 2ouf-knot was calculated to be 20.9. These values indicate that contacting residues in 2ouf-ds are separated by approximately 11 residues on average, while the same contacts in 2ouf-knot are separated by about 21 residues on average. This large difference is consistent with a natural tendency for proteins that have complex topologies to have high contact order. It is probable that the higher fraction of nonlocal contacts in 2ouf-knot compared to 2ouf-ds can explain, at least in part, the slower folding of the knotted protein. To the extent that knotted proteins must have high contact order, this presents a potentially important complication for knotted proteins in general. However, it is important to note that 2ouf-knot folds to its native state even more slowly at low [GdmCl] than 2ouf-wt, which, on account of being dimeric, has an effectively infinite contact order. Evidently the nonlocality of contacts alone cannot fully explain the slower folding of 2ouf-knot.
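To make the quantity discussed above concrete, absolute contact order can be taken as the mean sequence separation of contacting residue pairs. The sketch below is a simplified version that assumes Cα coordinates, a distance cutoff, and a minimum sequence separation, all of which are illustrative choices. It does not reproduce the contact definition used for the values quoted here, nor the counting of intersubunit contacts through the disulfide bond (30), and the toy hairpin coordinates are hypothetical.

```python
import math

def absolute_contact_order(ca_coords, cutoff=8.0, min_separation=2):
    """Mean sequence separation |i - j| over residue pairs in contact.

    Two residues count as a contact when their C-alpha atoms lie within
    `cutoff` angstroms and they are at least `min_separation` apart in sequence.
    """
    separations = []
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                separations.append(j - i)
    if not separations:
        raise ValueError("no contacts found with the given cutoff")
    return sum(separations) / len(separations)

# Hypothetical six-residue hairpin: two antiparallel strands 3.8 angstroms apart.
hairpin = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0),
]

# With a tight 4.0 angstrom cutoff only the cross-strand pairs (0, 5) and (1, 4)
# qualify, giving separations of 5 and 3.
toy_aco = absolute_contact_order(hairpin, cutoff=4.0)
```

Dividing the result by the chain length would give the relative contact order; for a fused or disulfide-linked dimer the residue numbering would first have to be made continuous along the chosen connection.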

In summary, our results suggest that although there is no insurmountable barrier to threading during protein folding, knotted proteins have more complex or constricted folding energy landscapes than unknotted proteins with similar tertiary structures. A common view in the protein folding community is that most proteins are subject to selective pressure to fold cooperatively, without highly populated intermediate states (6). This selective pressure is thought to arise from highly populated nonnative species having an increased risk of misfolding or aggregating, which could deleteriously affect the health of the cell (31). Our results imply that knotted proteins possess complex folding landscapes, leading to increased folding times and populated intermediates that could be selected against during evolution. It is interesting, however, that although 2ouf-knot folds 20 times more slowly than the unknotted control, it still folds within a few seconds, which is faster than some small proteins with simple folding kinetics (6). Apparently, despite the topological complications, some knotted structures are able to fold quickly and cooperatively enough to minimize deleterious misfolding and aggregation events. Nonetheless, it is likely that for many potentially knotted tertiary structures the landscape is sufficiently complex or constricted to provide a strong disadvantage, which could explain the apparent rarity of knotted folds in nature.
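For a sense of scale, the roughly 20-fold difference in folding rate can be converted into an apparent difference in activation free energy. The sketch below assumes that both proteins share the same pre-exponential factor, so that the rate ratio reflects only the barrier height, and it assumes a temperature of 25 °C, which is not stated in this section. Both are simplifying assumptions, and the function name is illustrative.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1

def barrier_difference(rate_ratio, temperature_k=298.15):
    """Apparent difference in activation free energy (kcal/mol) for a rate ratio.

    Assumes k is proportional to exp(-dG_act / RT) with an identical prefactor
    for both proteins, so ddG_act = RT * ln(k_fast / k_slow).
    """
    return R_KCAL * temperature_k * math.log(rate_ratio)

# 2ouf-knot folds about 20 times more slowly than the unknotted control.
ddg_act = barrier_difference(20.0)
```

Under these assumptions the 20-fold slowdown corresponds to a barrier that is higher by roughly 1.8 kcal/mol, a modest value compared with the overall stabilities listed in Table 1.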

Materials and Methods

Proteins were expressed recombinantly in E. coli and purified by metal affinity chromatography and size exclusion chromatography. Crystals of 2ouf-ds and 2ouf-knot were obtained using the hanging drop vapor diffusion method. Crystal structures of the proteins were determined by molecular replacement using X-ray diffraction data collected in-house and at the Advanced Photon Source beamline 24-ID-C. An analysis of pseudosymmetry from the diffraction intensities is illustrated in Fig. S6. CD and fluorescence spectroscopy were performed in buffers containing varying amounts of GdmCl to perturb the energy landscapes of the proteins. The kinetics of folding and unfolding reactions were monitored using a stopped-flow device equipped with excitation and emission monochromators set to measure native tryptophan fluorescence. An endpoint amplitude analysis is shown in Fig. S7. Detailed methods can be found in the SI Text.

ACKNOWLEDGMENTS. The authors thank Katelyn Connell and Susan Marqusee for assistance with folding experiments, Martin Phillips for assistance with stopped-flow fluorimetry, Inna Pashkov for technical assistance, and Sophie Jackson for helpful comments on the manuscript. This work was supported by award R01GM081652 from the National Institutes of Health.

1. Bölinger D, et al. (2010) A Stevedore’s protein knot. PLoS Comput Biol 6:e1000731.
2. Yeates TO, Norcross TS, King NP (2007) Knotted and topologically complex proteins as models for studying folding and stability. Curr Opin Chem Biol 11:595–603.
3. Taylor WR (2000) A deeply knotted protein structure and how it might fold. Nature 406:916–919.
4. Mansfield ML (1994) Are there knots in proteins? Nat Struct Biol 1:213–214.
5. Mansfield ML (1997) Fit to be tied. Nat Struct Biol 4:166–167.
6. Jackson SE (1998) How do small single-domain proteins fold? Fold Des 3:R81–91.
7. Leopold PE, Montal M, Onuchic JN (1992) Protein folding funnels: a kinetic approach to the sequence-structure relationship. Proc Natl Acad Sci USA 89:8721–8725.
8. Dill KA, Chan HS (1997) From Levinthal to pathways to funnels. Nat Struct Biol 4:10–19.
9. Wolynes PG (2005) Recent successes of the energy landscape theory of protein folding and function. Q Rev Biophys 38:405–410.
10. Mallam AL, Jackson SE (2006) Probing nature’s knots: the folding pathway of a knotted homodimeric protein. J Mol Biol 359:1420–1436.
11. Mallam AL, Jackson SE (2007) A comparison of the folding of two knotted proteins: YbeA and YibK. J Mol Biol 366:650–665.
12. Mallam AL, Onuoha SC, Grossmann JG, Jackson SE (2008) Knotted fusion proteins reveal unexpected possibilities in protein folding. Mol Cell 30:642–648.
13. Mallam AL, Morris ER, Jackson SE (2008) Exploring knotting mechanisms in protein folding. Proc Natl Acad Sci USA 105:18740–18745.
14. Mallam AL (2009) How does a knotted protein fold? FEBS J 276:365–375.
15. Mallam AL, Rogers JM, Jackson SE (2010) Experimental detection of knotted conformations in denatured proteins. Proc Natl Acad Sci USA 107:8189–8194.
16. Wallin S, Zeldovich KB, Shakhnovich EI (2007) The folding mechanics of a knotted protein. J Mol Biol 368:884–893.
17. Sułkowska JI, Sułkowski P, Onuchic J (2009) Dodging the crisis of folding proteins with knots. Proc Natl Acad Sci USA 106:3119–3124.
18. Tsai JY, et al. (2006) Crystal structure of HP0242, a hypothetical protein from Helicobacter pylori with a novel fold. Proteins 62:1138–1143.
19. Mallam AL, Jackson SE (2005) Folding studies on a knotted protein. J Mol Biol 346:1409–1421.
20. Robinson CR, Sauer RT (1998) Optimizing the stability of single-chain proteins by linker length and composition mutagenesis. Proc Natl Acad Sci USA 95:5929–5934.
21. Krissinel E, Henrick K (2007) Inference of macromolecular assemblies from crystalline state. J Mol Biol 372:774–797.
22. Blankenship JW, Dawson PE (2007) Threading a peptide through a peptide: protein loops, rotaxanes, and knots. Protein Sci 16:1249–1256.
23. Andersson FI, Pina DG, Mallam AL, Blaser G, Jackson SE (2009) Untangling the folding mechanism of the 5(2)-knotted protein UCH-L3. FEBS J 276:2625–2635.
24. Robinson CR, Sauer RT (1996) Equilibrium stability and sub-millisecond refolding of a designed single-chain Arc repressor. Biochemistry 35:13878–13884.
25. Robinson CR, Sauer RT (2000) Striking stabilization of Arc repressor by an engineered disulfide bond. Biochemistry 39:12494–12502.
26. King NP, Yeates EO, Yeates TO (2007) Identification of rare slipknots in proteins and their implications for stability and folding. J Mol Biol 373:153–166.
27. Sułkowska JI, Sułkowski P, Onuchic JN (2009) Jamming proteins with slipknots and their free energy landscape. Phys Rev Lett 103:268103.
28. Wensley BG, et al. (2010) Experimental evidence for a frustrated energy landscape in a three-helix-bundle protein family. Nature 463:685–688.
29. Plaxco KW, Simons KT, Baker D (1998) Contact order, transition state placement and the refolding rates of single domain proteins. J Mol Biol 277:985–994.
30. Parrini C, et al. (2008) The folding process of acylphosphatase from Escherichia coli is remarkably accelerated by the presence of a disulfide bond. J Mol Biol 379:1107–1118.
31. Dobson CM (2003) Protein folding and misfolding. Nature 426:884–890.
