

BIOPHYSICS AND COMPUTATIONAL BIOLOGY | APPLIED MATHEMATICS

Topological knots and links in proteins

Pawel Dabrowski-Tumanski a,b and Joanna I. Sulkowska a,b,1

a Faculty of Chemistry, University of Warsaw, 02-093, Warsaw, Poland; and b Centre of New Technologies, University of Warsaw, 02-097, Warsaw, Poland

Edited by George H. Lorimer, University of Maryland, College Park, MD, and approved February 1, 2017 (received for review September 23, 2016)

Twenty years after their discovery, knots in proteins are now quite well understood. They are believed to be functionally advantageous and to provide extra stability to protein chains. In this work, we go one step further and search for links: entangled structures, more complex than knots, which consist of several components. We derive conditions that proteins need to meet to be able to form links. We search through the entire Protein Data Bank and identify several sequentially nonhomologous chains that form a Hopf link and a Solomon link. We relate topological properties of these proteins to their function and stability and show that the link topology is characteristic of eukaryotes only. We also explain how the presence of links affects the folding pathways of proteins. Finally, we define necessary conditions to form Borromean rings in proteins and show that no structure in the Protein Data Bank forms a link of this type.

folding | catenanes | slipknot | lasso | disulphide bridge

Knotted proteins have been identified in all kingdoms of life, in organisms separated even by 1 billion years of evolution (1–5). High conservation (5) of knotted motifs and their location (usually) in enzymatic active sites indicates that knots are crucial for protein function. Over 1,300 knotted or slipknotted (shoelace-type) structures, including the trefoil (3₁), figure-eight (4₁), three-twist (5₂), and Stevedore’s (6₁) knots (4, 6, 7), have been deposited in the Protein Data Bank (PDB) to date according to KnotProt (8).

Mathematically, a knot is defined as an embedding of a circle into 3D space. A link is a generalization of a knot, defined as an embedding of a finite set of circles. The simplest examples of links are the Hopf link and the Solomon link (Fig. 1, Center). Links have been found in DNA (9, 10) and have been produced by template synthesis (11, 12). In proteins, the first attempts to identify links were made by Mislow (13, 14). In his approach, however, the link-forming loops were defined either by including interaction with a metal ion (noncovalent loop) or by at least two disulfide bonds for each (covalent) loop. Covalent loops each closed by only one disulfide bridge were considered by Mislow “unlikely to lead to knots or links” (ref. 14, p. 4,202) and were therefore hardly examined. Moreover, all of the structures were scanned only by “visual examination of their 3D structures” (ref. 14, p. 4,202). To date, the only known simple protein links are designed p53 protein catenanes (15) (with the backbones of both chains artificially closed, forming linked loops) and a thermophilic two-chain complex (16) (with linked loops formed by the backbones closed via disulfide bridges). However, the discovery of a wide class of complex lasso proteins (17, 18), in which a chain pierces a covalently closed loop (Fig. 1), opens a unique possibility of defining and identifying links. Such links (or more formally, pretzelanes) are defined using covalent loops closed by disulfide bridges in a single protein chain (compare Fig. 1). Therefore, 20 years after the discovery of knotted proteins, it is time to reformulate Mansfield’s (1) question and ask, Are there links in proteins?

In this paper, we propose a general method to identify and classify links in proteins and discuss their biological role. The existence of proteins with stable links changes our view on the complexity of proteins and leads to many intriguing questions never asked before: Are links conserved evolutionarily to provide unique features of proteins? Do they exist in all kingdoms? How do they fold? In this paper, we answer these questions, and, in addition, we find relations between proteins with links based on comparing their evolutionary, sequential, and functional properties.

Search for Links

To identify stable links in proteins, we analyzed their structure and used the method of spanning the (triangulated) minimal surface (17, 18). A segment of a protein chain forms a covalent loop if the ends of the segment are connected by a covalent bond (e.g., a disulfide bridge). Such a covalent loop can be pierced by a protein tail, thereby forming a complex lasso structure (17) (Fig. 2). A link is formed when the piercing tail is itself a part of another covalent loop. To identify links and their types, we analyze sequential numbers (indexes) of loop-forming cysteines and piercing residues (Fig. 2). The indexes of the cysteines are known from the protein structure, whereas indexes of loop-piercing residues can be determined from the minimal surface analysis used in the classification of lasso proteins (17, 18). This method is general and can be applied to various intramolecular contacts.
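The index-based test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `CovalentLoop` container and the function names are hypothetical, and the signed piercing indexes are assumed to have been computed beforehand by the minimal surface analysis. The example values are those given for 2LFK (loops Cys-24–Cys-51 and Cys-52–Cys-69, piercings −57 and −45 in Table 1). The check implements only the necessary condition that each loop is pierced by a residue of the other loop; it does not determine the link type.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CovalentLoop:
    """A chain segment closed by a covalent bridge (hypothetical container)."""
    first: int                  # index of the first loop-closing residue
    last: int                   # index of the second loop-closing residue
    piercings: Tuple[int, ...]  # signed indexes of residues crossing the loop's surface


def piercings_from(loop: CovalentLoop, other: CovalentLoop) -> List[int]:
    """Signed piercings of `loop` made by residues that belong to `other`."""
    return [p for p in loop.piercings if other.first <= abs(p) <= other.last]


def may_be_linked(a: CovalentLoop, b: CovalentLoop) -> bool:
    """Necessary condition for a link: each loop is pierced by the other one."""
    return bool(piercings_from(a, b)) and bool(piercings_from(b, a))


# 2LFK: loop sizes 27 and 17, piercing residues -57 and -45 (values from Table 1).
red = CovalentLoop(first=24, last=51, piercings=(-57,))
blue = CovalentLoop(first=52, last=69, piercings=(-45,))
print(may_be_linked(red, blue))
```

A loop pierced only by a free tail (a piercing index outside every other covalent loop) fails the test, which corresponds to a complex lasso rather than a link.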

Using the above method, we performed a comprehensive survey of the more than 115,000 chains deposited in the PDB as of May 2016, taking into account all known covalent interactions (e.g., cysteine, amide, ester, thioester, or carbon–carbon bonds). We found that links are formed in as many as 159 structures, of which 129 form the Hopf link and 35 form the Solomon link. The classification of these proteins according to their topological complexity, sequence similarity, and biological function is shown in Datasets S1 and S2, and exemplary linked proteins are shown in Fig. 1. Below we discuss the conclusions drawn from this survey.

Conservation of Links and Artificial Structures

To investigate the structural importance of links in proteins, we analyzed their conservation in clusters of 30% sequential homology. We found that links are strictly conserved for all homologs (representative structures are presented in SI Appendix, Figs. S1–S3). The nonconservation of topology in a homology cluster can therefore be viewed as a trace of a structure failure. Indeed,

Significance

Twenty years after the discovery of knotted proteins, we found that some single protein chains can form links, which have even more complex structures than knots. We derive conditions that proteins need to meet to form links. We search through the entire Protein Data Bank and identify several chains that form a Hopf link and a Solomon link. The link motif has not been recognized before; however, it is clearly of important functional significance in proteins. In this article, we relate topological properties of proteins with links to their function and stability and show that the link topology is characteristic of eukaryotes only.

Author contributions: P.D.-T. and J.I.S. designed research; P.D.-T. performed research; P.D.-T. and J.I.S. analyzed data; and P.D.-T. and J.I.S. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Freely available online through the PNAS open access option.

1To whom correspondence should be addressed. Email: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1615862114/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1615862114 PNAS | March 28, 2017 | vol. 114 | no. 13 | 3415–3420


Fig. 1. Exemplary structures of proteins with links: negative Hopf link, positive Hopf link, and positive Solomon link [Protein Data Bank (PDB) codes 2LFK, 2KQA, and 4ASL, Top to Bottom]. Middle row shows the most general link type (without orientation). The orange stripes denote disulfide bridges. The N and C letters denote the N and C termini of the protein. The arrows denote the orientation from the N to the C terminus. For details on link orientation see SI Appendix. In each panel, colors in the structure match the colors in the scheme at Right; the protein topology is presented as a solid (black or colored) line.

proteins with nonconserved topologies either possess a large gap in the structure (nine cases; SI Appendix, section S9) or contain linked loops that are probably an artifact of the EM experiment (highly mobile loops in Envelope glycoprotein gp120, PDB code 3J70). Those structures were excluded from further analysis. On the other hand, the nonconservation of topology can stem from artificially introduced mutations, as for example in the glutamate receptor 2 (PDB code 3T93), in which two loop-closing cysteines were introduced, resulting in the formation of a Hopf link (SI Appendix, Fig. S6). This is the only nonnatural example of a protein with a link seen in our survey.

Proteins with the Hopf Link

Most of the protein structures with links form Hopf links. After sequential clustering, we found 14 representative Hopf-link structures, presented in Table 1 with their structural details. The linear orientation of a protein chain introduces a chirality to the Hopf link, resulting in two topological types denoted ±, which differ by the piercing direction defined in refs. 17 and 18 (Fig. 1, Top row). This sign is also indicated in Table 1 (column “Orientation”). Moreover, in most cases proteins possess more than two covalent loops. The full set of representative protein structures with schemes of their disulfide bond arrangements is presented in SI Appendix, section S8. In particular, in two cases (PDB codes 1HD5 and 1WC2) one of the loops forming a Hopf link is contained in the larger one; in these cases, details for the larger covalent loop are given in Table 1.

Fig. 2. The method of identification of links. On each covalent loop a triangulated minimal surface is spanned (Left and Right). The necessary condition for the existence of a link is that the surfaces pierce one another (Center) or, equivalently, each surface is pierced by the border of the second surface.

As can be seen from Table 1, proteins with a Hopf link vary in size from 57 aa to 820 aa. The size of their loops also varies greatly (17–387 aa). However, in most cases the covalent loop has fewer than 100 residues. Strikingly, in most cases the covalent loops are separated by only a few residues (in 9 of 14 cases the loops are separated by 2 or fewer residues). Thus, large loops with a large separation between them may imply that in such structures the link topology is not functional and is an accidental effect of the disulfide loop and chain arrangement (e.g., structures with PDB codes 1H30 and 4B56). Furthermore, the closed loops seem to be inequivalent: for proteins with a positive Hopf link, the sequentially later loop is always larger, whereas the sequentially first loop is usually (but not always) larger in structures with a negative Hopf link. There are similar numbers of positive (eight) and negative (six) Hopf links, indicating that there is no obvious preference for either chirality.

Note that the separation between covalent loops is, in most cases, comparable to the average persistence length of a protein chain. This leads to a hypothesis that a small separation is actually the reason for, not the effect of, a link topology. In fact, in the PDB there are 104 representative structures with cysteine loops separated by exactly two residues; thus, almost 7% of them are linked. A small separation between the covalent loops can influence the structure in two ways. First, because of the persistence length and steric effects, cysteines located in the nearest vicinity are inhibited from forming nonnative disulfide bridges (at least as long as they are separated by an even number of residues), which reduces the chance of misbridging. Second, the persistence length locally constrains the chain arrangement, making it possible to control the mutual orientation of the cysteines and facilitating the formation of the correct (link) topology.
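The counts quoted in the two preceding paragraphs can be rechecked directly from Table 1. The short Python sketch below does this with the table rows transcribed by hand. Two points are assumptions rather than statements from the text: the L2 column is read as the sequentially later loop, and the "almost 7%" figure is read as the seven linked structures with a separation of exactly two residues out of the 104 representative structures mentioned above.

```python
# Rows transcribed from Table 1:
# (PDB code, L1, L2, chain size, loop separation, orientation)
TABLE_1 = [
    ("1BW3A", 22, 57, 125, 2, "+"),
    ("2KQAA", 23, 55, 129, 2, "+"),
    ("2HCZX", 28, 67, 245, 2, "+"),
    ("3X2GA", 28, 85, 180, 2, "+"),
    ("3SUKA", 37, 59, 125, 2, "+"),
    ("3SUMA", 37, 62, 136, 2, "+"),
    ("1WC2A", 39, 85, 181, 2, "+"),
    ("1HD5A", 70, 113, 213, 0, "+"),
    ("2LFKA", 27, 17, 57, 0, "-"),
    ("2E26A", 39, 166, 725, 5, "-"),
    ("3R4DB", 137, 79, 288, 6, "-"),
    ("4B56A", 212, 387, 820, 64, "-"),
    ("1H30A", 287, 27, 422, 72, "-"),
    ("3T93B", 77, 55, 258, 65, "-"),  # engineered glutamate receptor 2
]

chain_sizes = [row[3] for row in TABLE_1]
loop_sizes = [size for row in TABLE_1 for size in row[1:3]]
close_loops = sum(1 for row in TABLE_1 if row[4] <= 2)   # separation of 2 or fewer
n_pos = sum(1 for row in TABLE_1 if row[5] == "+")
n_neg = len(TABLE_1) - n_pos
sep_two = sum(1 for row in TABLE_1 if row[4] == 2)        # separation of exactly 2
# Assumption: L2 is the sequentially later loop.
second_loop_larger = all(row[2] > row[1] for row in TABLE_1 if row[5] == "+")

print("chain size range:", min(chain_sizes), "-", max(chain_sizes))
print("loop size range:", min(loop_sizes), "-", max(loop_sizes))
print("loops separated by <= 2 residues:", close_loops, "of", len(TABLE_1))
print("positive / negative:", n_pos, "/", n_neg)
print("linked fraction at separation 2: %.1f%%" % (100 * sep_two / 104))
print("later loop larger in all positive links:", second_loop_larger)
```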

Despite a low sequence similarity, structures with the positive Hopf link seem to be collectively more similar to one another, as follows, e.g., from the structural data in Table 1. To formalize this observation, we calculated the sequence identity and structural P value for each pair of structures (SI Appendix, Table S2) and found that the average sequence identity for proteins with the positive Hopf link (13%) is higher than for the negative Hopf link (4.5%). Similarly, the respective P value is lower (3.43E-02 and 8.76E-0, respectively, for positive and negative structures). This result suggests that proteins with the positive Hopf link are distantly related, in contrast to those with the negative one.

Function and Origin of Proteins with a Hopf Link. The next intriguing question is why proteins conserve a complicated link topology despite naturally expected problems during folding. One possible answer is a functional importance of links. The function and origin of proteins with the Hopf link are shown in Table 2. The mutated glutamate receptor 2, as a nonnatural protein, is excluded from this analysis.

As follows from Table 2, linked proteins fulfill different functions. However, all of the structures with the positive Hopf-link topology are classified into the same Barwin-like endoglucanases superfamily in the CATH (class, architecture, topology/fold homologous superfamily) and SCOP (structural classification of proteins) databases (19, 20) (3X2G was absent in both databases) and the Double Psi beta barrel glucanase (DPBB) family in the PFAM (protein families) database (21) (1WC2 was not classified in PFAM). Taking into account the different kingdoms of origin (animals, plants, fungi), this observation indicates that all proteins with the positive Hopf link originate from one ancestral, early eukaryotic, sugar-binding protein. Indeed, the affinity for sugars is conserved, although not always evident, in the positive


Table 1. Structural data for representative protein chains with the Hopf link, along with proteins’ function

PDB code  L1    P1      L2    P2      No. hom.  No. loops  Size  Loop sep.  Orientation
1BW3A     22    +82     57    +49     4         3          125   2          Positive
2KQAA     23    +75     55    +45     1         2          129   2          Positive
2HCZX     28    +94     67    +58     2         3          245   2          Positive
3X2GA     28    +84     85    +36     11        5          180   2          Positive
3SUKA     37    +97     59    +64     4         2          125   2          Positive
3SUMA     37    +97     62    +72     1         3          136   2          Positive
1WC2A     39    +100    85    +51     1         6          181   2          Positive
1HD5A     70    +109    113   +74     7         7          213   0          Positive

2LFKA     27    −57     17    −45     4         4          57    0          Negative
2E26A     39    −2473   166   −2380   2         12         725   5          Negative
3R4DB     137   −240    79    −126    2         3          288   6          Negative
4B56A     212   −529    387   −230    25        16         820   64         Negative
1H30A     287   −668    27    −477    3         4          422   72         Negative

3T93B     77    −219    55    −109    243       2          258   65         Negative

Ln denotes the size of loop n; Pn is the signed index of a residue piercing through loop n. Loop sep. is the sequential distance between the loops. No. hom. is the number of homologs for a given structure; No. loops is the number of disulfide-based covalent loops in the structure (e.g., if four, there are two covalent loops forming a Hopf link and two trivial covalent loops). The first open row separates proteins with different orientations of the Hopf link. The second open row separates the humanly modified protein with PDB code 3T93. Proteins are ordered according to the size of the first pierced loop.

Hopf-link proteins. For example, sugar-composed cellulose is a target for endoglucanases, whereas cerato-platanins and barwin domains are known to be sugar binders. The beta-expansin (PDB code 2HCZ) is a fertilization factor, selective for cell walls rich in glucuronoarabinoxylans and β-D-glucans. On the other hand, most of the negative Hopf-link structures are not classified in the common databases, and their functions are more varied. Nevertheless, it is worth noting that also in this case four of five such structures are animal proteins.

A hallmark of linked proteins is that all of them are secreted or transmembrane proteins. Disulfide bridges are known to introduce additional stability, and in this case the stability is most evident. Cerato-platanins have been shown to be stable up to 76 °C (22). Fungal endoglucanase (1HD5) is stable after heating to 60 °C and at pH between 3.0 and 11.0 (23), retaining 45% of its activity after incubating for 5 min at 95 °C (23). The animal endoglucanase (PDB code 1WC2) withstands heating for 10 min at 100 °C without irreversible loss of activity (24). Possibly the reason for the exceptional stability lies in the topology of the proteins. In the case of proteins with the Hopf link, covalent loops cannot be separated, even at high temperatures, without breaking disulfide bridges. Moreover, such bridges are often buried deep inside the protein, additionally stabilizing the structure (Fig. 3).

Table 2. Function, orientation, molecule type, kingdom of origin organism, and cellular location for representative proteins with the Hopf link topology

PDB code  Function       Molecule                Kingdom  Cellular location  Orientation
1BW3A     Lectin         Barwin                  Plants   Secreted           Positive
2HCZX     Allergen       Beta-expansin 1a        Plants   Secreted           Positive
3SUKA     Toxin          Cerato-platanin         Fungi    Secreted           Positive
3SUMA     Toxin          Cerato-platanin         Fungi    Secreted           Positive
2KQAA     Toxin          Cerato-platanin         Fungi    Secreted           Positive
3X2GA     Hydrolase      Endoglucanase           Fungi    Secreted           Positive
1HD5A     Hydrolase      Endoglucanase           Fungi    Secreted           Positive
1WC2A     Hydrolase      Endoglucanase           Animals  Secreted           Positive
4B56A     Hydrolase      Pyrophosphatase         Animals  Transmembrane      Negative
2LFKA     Inhibitor      Tryptase inhibitor      Animals  Secreted           Negative
2E26A     Signaling      Reelin                  Animals  Secreted           Negative
1H30A     Laminin        Growth-arrest specific  Animals  Secreted           Negative
3R4DB     Viral protein  Spike glycoprotein      Viruses  Transmembrane      Negative

To analyze the influence of links, we determined the energy barrier to unfolding in five models of 2LFK, the smallest protein with the Hopf link, differing only in their topology. The first model is the native structure with the Hopf link. In the second model the covalent loops are unlinked; the model is obtained by interchanging disulfide bonds (Cys-24 is paired with Cys-52 instead of Cys-51) without destroying surrounding contacts. Two additional models involve only one covalent loop (respectively red or blue in Fig. 4; i.e., with only one disulfide bridge), and the last model involves no closed loops (no disulfide bridge). During simulations the disulfide bridges are not allowed to break (which models oxidative conditions). As an unfolding measure we define the time needed for the structure to reach an rmsd of 10 Å from the native structure. To focus on topological properties and to gather good statistics, we used a structure-based model (SBM). For each model, based on an unfolding rate constant calculated at seven different temperatures, we determined an approximate unfolding energy barrier (Fig. 4). As expected, the model with two native closed loops has an energy barrier an order of magnitude higher than models with one or no closed loops; the presence of two loops stabilizes the structure to a much higher degree than a single closed loop. However, the Hopf-link structure has an energy barrier over 20% higher than the topologically trivial structure. This


Fig. 3. Crystal structure of cerato-platanin (PDB code 3SUK). (A) Covalent loops forming the Hopf link are depicted in red and blue. Cysteines closing the loops and cysteine bridges are marked in orange. (B) Solvent-exposed surface. Colors correspond to the crystal structure. Only one cysteine bridge (marked with a black circle) is partially exposed to the solvent.

purely topological effect may explain the biological advantage of links in proteins.
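As an illustration of how an energy barrier can be extracted from rate constants measured at several temperatures, the sketch below fits an Arrhenius-type law, k = A exp(−E_A/(RT)), by linear regression of ln k against 1/T. The Arrhenius form is an assumption made here for illustration (the fitted function actually used is shown only in Fig. 4), the function name `fit_arrhenius` is hypothetical, and the seven temperatures and rate constants are synthetic values in reduced units, not data from the simulations.

```python
import math
from typing import Sequence, Tuple


def fit_arrhenius(temps: Sequence[float], rates: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of ln k = ln A - (E_A/R) / T; returns (E_A/R, ln A)."""
    x = [1.0 / t for t in temps]
    y = [math.log(k) for k in rates]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # slope equals -(E_A/R)
    return -slope, mean_y - slope * mean_x


# Synthetic rate constants at seven temperatures (made-up values, reduced units).
temps = [0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5]
true_barrier, true_ln_a = 12.0, 3.0
rates = [math.exp(true_ln_a - true_barrier / t) for t in temps]

barrier, ln_a = fit_arrhenius(temps, rates)
print(f"E_A/R = {barrier:.3f}, ln A = {ln_a:.3f}")
```

Comparing the fitted E_A/R values of two models (for example, linked and unlinked) then gives the relative change in the barrier, which is the kind of comparison reported in the text.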

Folding of Proteins with the Hopf Link. Proteins with the Hopf-link structure may fold via three different pathways (Fig. 5).In general, the probability of each pathway depends on con-ditions. Oxidative conditions facilitate the formation of disul-fide bonds and, therefore, of covalent loops (Upper and Lowerpathways in Fig. 5). Once the covalent loop is formed, thread-ing is required to complete folding the chain. In general, thechain threading was shown to be a major topological bar-rier in the free-energy landscape (at least for knotted pro-teins) (25, 26). To investigate this issue, we performed aseries of coarse-grained simulations for the smallest proteinwith the Hopf link (tick-derived protease inhibitor TdPI withPDB code 2LFK). This protein has two covalent loops of length17 residues and 27 residues. We performed folding and unfoldingsimulations in two different models. In each model, one loop-forming bridge was made persistent. It turned out that if thelarger (red in Fig. 5) covalent loop is formed, the protein canboth fold and unfold. The loop threading occurs via bendingthe chain in the vicinity of the bridge, which resembles the slip-knotting mechanism known for knotted proteins (26–28) (Fig.5, Lower pathway). Such a mechanism is possible because thepiercing residue is located in the nearest vicinity of the bridge. Inthe case of the smaller loop (blue in Fig. 5), in which the pierc-ing residue is located farther in the sequence, neither thread-

Fig. 4. The temperature dependence of unfolding rate constant for fivemodels of TdPI protein, differing in topology. Inset shows the protein struc-ture with native “red” and “blue” loop indicated. The stripes denote theloop closing pattern: orange and green for the (native) Hopf link and thetrivial (nonnative) model. The fitted function is presented in the top rightcorner. On the right side the fitted values of EA/R are given. In the bot-tom right corner, the schemes of the Hopf link and the trivial model arepresented.

ing nor unthreading occurs. As a result, if the smaller cova-lent loop is formed, the protein can neither fold nor completelyunfold. To approximate the probability for each pathway, weconducted a series of CG folding simulations without any biasfrom disulfide bonds and measured how often the pair Cys-24–Cys-51 (forming the larger, red covalent loop) will be located inthe native distance before the Cys-52–Cys-69 pair (forming thesmaller, blue covalent loop). It turned out that the Cys-52–Cys-69bond (smaller loop) is created approximately three times moreoften than the Cys-24–Cys-51 bond (larger loop), independentof the temperature (SI Appendix, Table S3), definitely hinderingfolding. There is, however, the third pathway in which first the“interior” of the link is “twisted” properly and the loops are closedin the last step, which can be a way to overcome the problem ofloop threading (such a mechanism is impossible in the case ofknotted proteins). In the case of the TdPI protein, this mechanismis facilitated by the formation of a β strand and, in fact, it shouldbe the most common pathway (SI Appendix, Table S3). Neverthe-less, even in this pathway, the formation of native disulfide bridgescan lead to a nonnative, trivial topology (topological trap). In thecase of the TdPI protein such a misfolded structure can be verysimilar to the native one (compared, e.g., by the rmsd value), butcovalent loops should be definitely more labile. In fact, the oxida-tive and reductive folding of this protein was studied previously(29). In particular, it was shown that in oxidative conditions theprotein rarely achieves its native state, forming most often a non-native structure with three (of four) cysteine bonds. The missingbond is Cys-38–Cys-58 joining the loops, which were very labile.The concentration of the misfolded product was reduced upon theaddition of reducing agent (glutathione), which enabled the bondreshuffling. 
The bonds formed in the nonnative product can be nonnative as well. Thus, we used the simple CG model enriched with the Cys–Cys interaction, allowing for the creation of both native and nonnative pairs. When the Cys–Cys bonds were allowed to break during the simulation (reflecting reducing conditions), ∼70% of the folding simulations finished in the native state, independent of temperature. If, however, the bonds were not allowed to break (oxidative conditions), ∼70% of the folding simulations finished in the nonnative, topologically trivial structure (SI Appendix, Table S4) with rmsd lower than 5 Å from the native structure. Moreover, this structure is not an artifact of the CG model, because an all-atom representation can be reconstructed from the CA trace [using the Modeller software (30)] (SI Appendix, Fig. S1). Therefore, it is highly probable that the folding trap observed in ref. 29 was the topologically trivial structure.
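The pathway statistics above rest on asking, for each unbiased trajectory, which cysteine pair first reaches its native distance. A minimal sketch of that bookkeeping is given below. The function names, the tolerance factor of 1.2, and the toy distance series are illustrative assumptions and are not taken from the authors' analysis code.

```python
def first_contact_frame(distances, native_distance, tolerance=1.2):
    """Return the index of the first frame in which a Cys-Cys distance is
    within `tolerance` times its native value, or None if it never is.
    The tolerance factor is an assumed, hypothetical criterion."""
    cutoff = tolerance * native_distance
    for frame, distance in enumerate(distances):
        if distance <= cutoff:
            return frame
    return None


def first_loop_closed(large_loop_distances, small_loop_distances,
                      native_large, native_small, tolerance=1.2):
    """Classify one trajectory by which loop-closing pair reaches its
    native distance first: 'large', 'small', 'tie', or 'neither'."""
    t_large = first_contact_frame(large_loop_distances, native_large, tolerance)
    t_small = first_contact_frame(small_loop_distances, native_small, tolerance)
    if t_large is None and t_small is None:
        return "neither"
    if t_small is None:
        return "large"
    if t_large is None:
        return "small"
    if t_large == t_small:
        return "tie"
    return "large" if t_large < t_small else "small"


# Toy per-frame distances (arbitrary units) standing in for one trajectory.
large_pair = [10.0, 8.0, 5.5, 5.0]   # e.g. the pair closing the larger loop
small_pair = [9.0, 5.2, 7.0, 5.0]    # e.g. the pair closing the smaller loop
outcome = first_loop_closed(large_pair, small_pair, 5.0, 5.0)
```

Tallying `outcome` over many independent trajectories would give the relative frequency of each pathway at a given temperature.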

Other Links
Although the majority of links found are Hopf links, we also detected other topologies, which are presented in Dataset S2.

Fig. 5. Possible ways of folding of TdPI. Folding can follow three different pathways, but formation of the small covalent loop as the first event blocks folding. Moreover, in the last folding step the protein can collapse to a topological trap (in red oval), characterized by trivial topology. Green oval denotes the native, Hopf-link structure.


Table 3. Structural data for representative protein chains with the Solomon link, along with proteins’ function

PDB code   L1   P1   L2   P2   No. hom.   No. loops   Size   Loop sep.
2XJPA 146−218 87−142 19 4 258 0−252 −165
4ASLA 129−221 82−147 15 3 229 0−253 −170

Ln denotes the size of loop n, Pn is the signed index of residue piercing through the loop n. Loop sep. is the sequential distance between the loops. No. hom. is the number of homologs for given structure, and No. loops is the number of disulfide-based covalent loops in the structure (e.g., if four, there are two covalent loops forming a Solomon link and two trivial covalent loops). Size is the total number of residues in the structure.

After clustering homologous sequences, we extracted two representative chains (Table 3) that form a Solomon link (Fig. 6). Both of them are fungal adhesive proteins, sharing only 21% sequence identity. Nevertheless, they represent two closely related families of flocculation (2XJP) and epithelial adhesion (4ASL) proteins (31) and have highly similar structures, with a mutual structural P value of 3.96e-14. This result shows that the structure and the topology are much more conserved than the sequence. Possibly, aside from stability (both proteins are secreted), this is the second manifestation of the role of topology: The presence of a link holds the chain together, allowing for only minor changes in structure, despite even a large number of point mutations. As the topology of the Solomon link cannot be detected based on piercings only, we conducted a simulation of thermal unfolding, extracting only the link-forming disulfide loops. As a result, the loops moved apart, which enables one to identify the topology simply “by eye” (Fig. 6).
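As a numerical complement to identifying the topology “by eye”, one can estimate the Gauss linking number of two closed loops; its absolute value is 1 for a Hopf link and 2 for a Solomon link. The sketch below is not the authors' procedure: it is a plain midpoint-rule approximation of the Gauss double integral, applied to idealized test curves rather than to protein coordinates. All function and variable names are hypothetical. The linking number alone does not fix the link type, so it can only support, not replace, a full topological identification.

```python
import math


def closed_curve(point_at, n_points):
    """Sample a closed parametric curve point_at(t), t in [0, 2*pi), as a polygon."""
    return [point_at(2.0 * math.pi * k / n_points) for k in range(n_points)]


def _segments(polygon):
    """Return (midpoint, edge vector) for every edge of a closed polygon."""
    n = len(polygon)
    result = []
    for i in range(n):
        p, q = polygon[i], polygon[(i + 1) % n]  # last vertex joins the first
        midpoint = tuple((p[k] + q[k]) / 2.0 for k in range(3))
        edge = tuple(q[k] - p[k] for k in range(3))
        result.append((midpoint, edge))
    return result


def gauss_linking_number(loop_a, loop_b):
    """Approximate Lk = (1/4pi) * sum over edge pairs of
    (r_a - r_b) . (dr_a x dr_b) / |r_a - r_b|^3, evaluated at edge midpoints."""
    total = 0.0
    segments_b = _segments(loop_b)
    for mid_a, da in _segments(loop_a):
        for mid_b, db in segments_b:
            r = (mid_a[0] - mid_b[0], mid_a[1] - mid_b[1], mid_a[2] - mid_b[2])
            dist = math.sqrt(r[0] ** 2 + r[1] ** 2 + r[2] ** 2)
            cross = (da[1] * db[2] - da[2] * db[1],
                     da[2] * db[0] - da[0] * db[2],
                     da[0] * db[1] - da[1] * db[0])
            total += (r[0] * cross[0] + r[1] * cross[1] + r[2] * cross[2]) / dist ** 3
    return total / (4.0 * math.pi)


# Idealized test curves (assumptions for illustration, not protein loops).
core = closed_curve(lambda t: (math.cos(t), math.sin(t), 0.0), 200)
# A circle threading the core once: a Hopf link.
hopf_partner = closed_curve(lambda t: (1.0 + math.cos(t), 0.0, math.sin(t)), 200)
# A curve winding twice around the core: a Solomon link.
solomon_partner = closed_curve(
    lambda t: ((1.0 + 0.4 * math.cos(2 * t)) * math.cos(t),
               (1.0 + 0.4 * math.cos(2 * t)) * math.sin(t),
               0.4 * math.sin(2 * t)), 200)
# A circle far from the core: unlinked.
distant = closed_curve(lambda t: (5.0 + math.cos(t), 0.0, math.sin(t)), 200)

lk_hopf = gauss_linking_number(core, hopf_partner)
lk_solomon = gauss_linking_number(core, solomon_partner)
lk_unlinked = gauss_linking_number(core, distant)
```

The sign of the result depends on the orientation chosen for each loop, which is why only the rounded absolute value is compared with 1 or 2. For protein loops, the polygons would be the CA traces of the two covalent loops, each closed through its disulfide bridge.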

It is interesting to note that proteins with the Solomon link create cell assemblies by recognizing and binding carbohydrates, similar to proteins with the positive Hopf link. However, although Solomon-link proteins are approximately twice as large as Hopf-link proteins, a sequential and structural comparison of the domains constituting the Solomon-link and the positive Hopf-link proteins does not show any resemblance or clear sign of possible gene duplication.

Brunnian Links
The procedure for link identification described above is general; however, it cannot identify so-called Brunnian links, i.e., links that cannot be decomposed, but become trivial (i.e., form a set of unlinked circles) upon removing any component. The simplest example of such a structure is the Borromean rings (Fig. 6, Bottom Left). The overall complexity of such a link stems from the fact that each component is pierced by another component at least twice, so that an arc is formed between piercings. Through this arc, a part of the next component has to be threaded, as in Fig. 6, Bottom Right. In the case of the Borromean rings, all three components are mutually entangled in this manner. Such configurations also can be formed in complex lasso proteins. To identify them, one has to consider every covalent loop that is pierced more than once [e.g., twice in proteins with a double lasso, denoted L2 (17)] and identify piercings through arcs formed by such a structure (Fig. 6, Bottom Right). With this approach, we scanned the PDB database and identified such a complex structure in the goat lactoperoxidase (e.g., PDB code 2E9E) shown in Fig. 6, Middle. Here one ring is closed by the cysteine bridge Cys-6–Cys-167. The surface spanning that ring is pierced three times by the chain (triple-lasso topology, L3 type), via residues Gln-179, Met-352, and Ile-436 (Fig. 1, Bottom). The surface spanning the arc closed by Gln-179 and Met-352 is subsequently pierced by the next portion of the chain, via Asp-525 (Fig. 6, Middle). In the Borromean rings, however, this complex arrangement of the chains has to occur three times with three mutually piercing components. In the goat lactoperoxidase, this condition is fulfilled only once, and this protein (along with its homologs up to 30%) is the only protein structure with such an arrangement. We conclude that no protein Brunnian link exists in proteins deposited in the PDB.
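The first step of the search described above, selecting covalent loops pierced more than once and listing the arcs between their piercings, can be sketched as follows. This is only an illustration under stated assumptions: the function names and the input layout are hypothetical, arcs are taken between consecutive piercings (an assumption), piercing signs are ignored, and the second entry in the example input is invented. Only the Cys-6–Cys-167 loop with piercings at residues 179, 352, and 436 comes from the text.

```python
def arcs_between_piercings(piercings):
    """Return the arcs delimited by consecutive piercing residues.
    Piercings are plain residue indices here; signs are ignored (assumption)."""
    ordered = sorted(piercings)
    return list(zip(ordered, ordered[1:]))


def multiply_pierced_loops(loops):
    """Keep only covalent loops pierced at least twice and list their arcs.

    `loops` maps a disulfide bridge (cys_i, cys_j) to the residues that
    pierce the surface spanned on the corresponding covalent loop.
    This input layout is a hypothetical convention for the sketch."""
    return {
        bridge: arcs_between_piercings(piercings)
        for bridge, piercings in loops.items()
        if len(piercings) >= 2
    }


example_loops = {
    (6, 167): [179, 352, 436],  # goat lactoperoxidase loop described in the text
    (10, 40): [55],             # invented singly pierced loop, for contrast
}
candidates = multiply_pierced_loops(example_loops)
```

Each arc returned this way would then have to be tested for piercing by a further portion of the chain, as is the case for the arc between Gln-179 and Met-352, which is pierced via Asp-525. That geometric test is not part of this sketch.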

Nevertheless, the influence of the complex arrangement of the chain on the goat lactoperoxidase’s properties is intriguing. We can, in fact, distinguish two levels of complexity. One level is the piercing through the surface spanned on the covalent loop, with the formation of an arc. The second level is the piercing of the surface spanned on this particular arc. Interestingly, it was shown that goat lactoperoxidase is highly stable in a wide range of pH (4–11) and that it unfolds in a two-step process (32). In the first step the peripheral fragment is unfolded, leaving the core intact. This requires unthreading of the piece of the chain piercing the surface spanned on the arc (green part in Fig. 6, Bottom). In the second step, the core is unfolded, which requires unthreading of the chain from the surface spanned by the covalent loop (unthreading of the red piece of the chain through the blue surface). Possibly, the arc piercing fulfills a role in biology analogous to that in mathematics. In the Borromean rings it prevents the otherwise unlinked rings from falling apart. Perhaps evolution “used” the delicate topological properties to invent another mechanism for the stabilization of protein structures long before humans discovered Brunnian links.

Fig. 6. Exemplary structures of proteins with the Solomon link and the core of Borromean rings. (Top, Left to Right) An exemplary protein with the Solomon link (PDB code 4ASL), the covalent loops after unfolding, and the scheme of the corresponding Solomon link. (Middle, Left to Right) Structure of goat lactoperoxidase (PDB code 2E9E) forming the core of Borromean rings; the structure after smoothing: the blue surface is spanned on the main chain and on the disulfide bridge, and the red surface is spanned by the chain, is delimited by the blue surface (shown in cyan), and is pierced by the green part of the chain; the schematic view of the protein: the solid color lines form the core of Borromean rings, shown in Bottom, Left to Right with retention of colors. Bottom Left presents the Borromean rings.

Discussion
It is known that links are crucial in many biological processes, e.g., in DNA replication, and their proper description requires tools from knot theory. The discovery of knots, slipknots, and lassos in proteins already has changed our view on proteins’ complexity. In this paper, we showed that this complexity is even more involved, and apart from knots, biology also designed stable links within single-protein chains. We described the structure of those links, as well as their possible evolution and biological significance. Topological links are formed by covalent, disulfide-based loops, and we identified three types of links in proteins: positive and negative Hopf links and the Solomon link. The link topology induces additional stabilization and this effect is independent of the stabilization introduced by disulfide bridges alone. This is especially important, because all proteins with links operate outside of the cell. Furthermore, the topology-induced stability provides an additional powerful tool for biotechnologists in designing new, extra-stable peptides, as suggested before for molecular knots (33).

Despite low sequential similarity, all proteins with the positive Hopf link are carbohydrate-binding proteins, indicating that they have a common ancestor. The first positive Hopf-link proteins must have arisen in the first eukaryotic cells, before the tree of life split into animals, plants, and fungi, as the same topological motif can be found in all those kingdoms of life. This, on the other hand, indicates that topological constraints can allow great sequential variability. In fact, the only conserved fragment is located in the core of the link region, i.e., in the vicinity of cysteines closing the first and opening the second covalent loop (found by Clustal O; SI Appendix, Fig. S10). This, in turn, explains why there is no sign of gene duplication in proteins with Solomon links, although in principle the Solomon link could be obtained by a duplication of some parts of the Hopf-link structure and despite a similar affinity of Solomon-link proteins for carbohydrates.

The folding of proteins with links is another interesting conundrum. In this work, we have taken a step in explaining possible pathways of folding of the smallest Hopf-link protein, TdPI. Nevertheless, it is still puzzling that both in our simulations and in experiments, in oxidizing conditions (in which this secreted protein should fold) TdPI folds toward a nonnative state, which is supposed to be a topologically trivial structure. However, our results can also support theories of knotted protein folding in which the threading of a chain through a twisted loop is known to be the rate-limiting step of folding. In particular, we showed that the loop consisting of 17 residues is probably too small to allow threading at all. On the other hand, the loop consisting of 27 residues is large enough to allow threading to occur with the slipknot mechanism, which was proposed earlier to solve the threading problem in the case of knotted proteins (26). This result shows that analyzing the folding of proteins with links or complex lassos can provide important clues about the folding processes of entangled proteins.

Another issue discussed in this article is the possibility of the formation of Brunnian links in proteins. Although we have not identified any such link in the PDB, we found an interesting case of goat lactoperoxidase, which fulfills one of the geometric requirements of the Brunnian link. It also seems that the geometry of this protein is correlated with its two-step unfolding process. This shows, in general, how influential the topology can be and that studying topological effects in proteins may provide much insight into correlations between structure, function, and properties.

Let us conclude with a remark that, just as different types of entanglement are used, e.g., in material science, the same concept can be used in biology to equip proteins with special properties. Our methods described in this work can be used to identify and analyze properties of linked protein chains and even more complex structures, e.g., macromolecular links in virus capsids, whose structures are starting to become available from the EM method.

Proteins with links are collected in the LinkProt database (34).

Materials and Methods
Protein Dataset. In this work, we analyzed all protein structures deposited in the PDB database as of May 2016. Gaps were modeled by straight-line segments; however, no additional crossings were introduced. Sequential clustering was done using BLAST. The structural (P value) and sequential homology were calculated using the jFATCAT algorithm.

Protein Dynamics. The protein dynamics simulations were conducted using Gromacs with SMOG software (35); details are described in SI Appendix. The surface was determined as in refs. 17 and 18.

ACKNOWLEDGMENTS. This work was supported by European Molecular Biology Organization Installation Grant 2057 and Polish Ministry for Science and Higher Education Grant 0003/ID3/2016/64 (to J.I.S.), University of Warsaw Young Researcher Fellowship Grant 120000-501/86-DSM-112 700 (to P.D.-T.), and Polish National Science Centre Preludium Grant 2016/21/N/NZ1/02848 (to P.D.-T.).

1. Mansfield ML (1994) Are there knots in proteins? Nat Struc Biol 1(4):213–214.
2. Mansfield ML (1997) Fit to be tied. Nat Struc Biol 4(3):166–167.
3. Takusagawa F, Kamitori S (1996) A real knot in protein. J Am Chem Soc 118(37):8945–8946.
4. Taylor WR (2000) A deeply knotted protein structure and how it might fold. Nature 406(6798):916–919.
5. Sułkowska JI, Rawdon EJ, Millett KC, Onuchic JN, Stasiak A (2012) Conservation of complex knotting and slipknotting patterns in proteins. Proc Natl Acad Sci USA 109(26):E1715–E1723.
6. Bölinger D, et al. (2010) A Stevedore’s protein knot. PLoS Comput Biol 6(4):e1000731.
7. King NP, Yeates EO, Yeates TO (2007) Identification of rare slipknots in proteins and their implications for stability and folding. J Mol Biol 373(1):153–166.
8. Jamroz M, et al. (2014) KnotProt: A database of proteins with knots and slipknots. Nucleic Acids Res 43(D1):D306–D314.
9. Sundin O, Varshavsky A (1980) Terminal stages of SV40 DNA replication proceed via multiply intertwined catenated dimers. Cell 21(1):103–114.
10. Chen J, Rauch CA, White JH, Englund PT, Cozzarelli NR (1995) The topology of the kinetoplast DNA network. Cell 80(1):61–69.
11. Reuter C, Mohry A, Sobanski A, Vogtle F (2000) [1]rotaxanes and pretzelanes: Synthesis, chirality, and absolute configuration. Chemistry 6(9):1674–1682.
12. Liu Y, et al. (2005) Dynamic chirality in donor-acceptor pretzelanes. J Org Chem 70(23):9334–9344.
13. Liang C, Mislow K (1994) Knots in proteins. J Am Chem Soc 116(24):11189–11190.
14. Liang C, Mislow K (1995) Topological features of protein structures: Knots and links. J Am Chem Soc 117(15):4201–4213.
15. Yan LZ, Dawson PE (2001) Design and synthesis of a protein catenane. Angew Chem Int Ed Engl 40(19):3625–3627.
16. Boutz DR, et al. (2007) Discovery of a thermophilic protein complex stabilized by topologically interlinked chains. J Mol Biol 368(5):1332–1344.
17. Niemyska W, et al. (2016) Complex lasso: New entangled motifs in proteins. Sci Rep 6:36895.
18. Dabrowski-Tumanski P, Niemyska W, Pasznik P, Sulkowska JI (2016) LassoProt: Server analyzing biopolymers with lassos. Nucleic Acids Res 44(W1):W383–W389.
19. Pearl F, et al. (2005) The CATH domain structure database and related resources Gene3D. Nucleic Acids Res 33(Suppl 1):D247–D251.
20. Fox NK, Brenner SE, Chandonia JM (2014) SCOPe: Structural classification of proteins—extended, integrating SCOP and ASTRAL data and classification of new structures. Nucleic Acids Res 42(D1):D304–D309.
21. Bateman A, et al. (2004) The Pfam protein families database. Nucleic Acids Res 32:D138–D141.
22. de Oliveira AL, et al. (2011) The structure of the elicitor cerato-platanin (CP), the first member of the CP fungal protein family, reveals a double ψβ-barrel fold and carbohydrate binding. J Biol Chem 286(20):17560–17568.
23. Hayashida S, Ohta K, Mo K (1988) Cellulases of Humicola insolens and Humicola grisea. Methods Enzymol 160:323–332.
24. Xu B, Hellman U, Ersson B, Janson JC (2000) Purification, characterization and amino-acid sequence analysis of a thermostable, low molecular mass endo-β-1,4-glucanase from blue mussel, Mytilus edulis. Eur J Biochem 267(16):4970–4977.
25. Sułkowska JI, Noel JK, Onuchic JN (2012) Energy landscape of knotted protein folding. Proc Natl Acad Sci USA 109(44):17783–17788.
26. Sulkowska JI, Sułkowski P, Onuchic J (2009) Dodging the crisis of folding proteins with knots. Proc Natl Acad Sci USA 106(9):3119–3124.
27. Noel JK, Sułkowska JI, Onuchic JN (2010) Slipknotting upon native-like loop formation in a trefoil knot protein. Proc Natl Acad Sci USA 107(35):15403–15408.
28. Noel JK, Onuchic JN, Sulkowska JI (2013) Knotting a protein in explicit solvent. J Phys Chem Lett 4(21):3570–3573.
29. Bronsoms S, et al. (2011) Oxidative folding and structural analyses of a kunitz-related inhibitor and its disulfide intermediates: Functional implications. J Mol Biol 414(3):427–441.
30. Webb B, Sali A (2014) Comparative protein structure modeling using Modeller. Curr Protoc Protein Sci 86:2.9.1–2.9.37.
31. Ielasi FS, Decanniere K, Willaert RG (2012) The epithelial adhesin 1 (Epa1p) from the human-pathogenic yeast Candida glabrata: Structural and functional study of the carbohydrate-binding domain. Acta Crystallogr D Biol Crystallogr 68(3):210–217.
32. Boscolo B, Leal SS, Salgueiro CA, Ghibaudi EM, Gomes CM (2009) The prominent conformational plasticity of lactoperoxidase: A chemical and pH stability analysis. Biochim Biophys Acta 1794(7):1041–1048.
33. Siegel JS (2012) Driving the formation of molecular knots. Science 338:752–753.
34. Dabrowski-Tumanski P, et al. (2016) LinkProt: A database collecting information about biological links. Nucleic Acids Res 45(D1):D243–D249.
35. Noel JK, et al. (2016) SMOG 2: A versatile software package for generating structure-based models. PLoS Comput Biol 12(3):e1004794.
