19
A Kinematic View of Loop Closure EVANGELOS A. COUTSIAS, 1, * CHAOK SEOK, 2, * MATTHEW P. JACOBSON, 2 KEN A. DILL 2 1 Department of Mathematics and Statistics, University of New Mexico, Albuquerque, New Mexico 87131 2 Department of Pharmaceutical Chemistry, University of California in San Francisco, San Francisco, California 94143-2240 Received 26 August 2003; Accepted 4 November 2003 Abstract: We consider the problem of loop closure, i.e., of finding the ensemble of possible backbone structures of a chain segment of a protein molecule that is geometrically consistent with preceding and following parts of the chain whose structures are given. We reduce this problem of determining the loop conformations of six torsions to finding the real roots of a 16th degree polynomial in one variable, based on the robotics literature on the kinematics of the equivalent rotator linkage in the most general case of oblique rotators. We provide a simple intuitive view and derivation of the polynomial for the case in which each of the three pair of torsional axes has a common point. Our method generalizes previous work on analytical loop closure in that the torsion angles need not be consecutive, and any rigid intervening segments are allowed between the free torsions. Our approach also allows for a small degree of flexibility in the bond angles and the peptide torsion angles; this substantially enlarges the space of solvable configurations as is demonstrated by an application of the method to the modeling of cyclic pentapeptides. We give further applications to two important problems. First, we show that this analytical loop closure algorithm can be efficiently combined with an existing loop-construction algorithm to sample loops longer than three residues. Second, we show that Monte Carlo minimization is made severalfold more efficient by employing the local moves generated by the loop closure algorithm, when applied to the global minimization of an eight-residue loop. Our loop closure algorithm is freely available at http://dillgroup. ucsf.edu/loop_closure/. © 2004 Wiley Periodicals, Inc. J Comput Chem 25: 510 –528, 2004 Key words: loop closure; kinematic; loop modeling; Monte Carlo minimization Introduction We consider the problem of loop closure, i.e., finding structures of a segment in a chain molecule that are geometrically consistent with the rest of the chain structure. This problem has an important application in homology modeling, 1 when segments of insertions or deletions are to be modeled while the rest of the protein structure is relatively well known from structures of homologous proteins. Another useful application is in the area of Monte Carlo simulations, where alternative segment structures can be intro- duced as elementary localized moves. 2–11 These moves can lead to improved efficiency in conformational sampling. Unlike Cartesian moves, they avoid geometric distortions and the high energy penalty these entail. On the other hand, the deformation produced by these moves is limited to a segment, while uncoordinated torsion angle moves result in movement proportional to the dis- tance of each atom from the perturbation axes, resulting in large uncontrolled moves. Other possible applications of the loop clo- sure problem are discussed in ref. 12. It is well known that the number of constraints is identical to the number of degrees of freedom (DOFs) in the case of loops with six free torsion angles, or three residue loops for proteins. 13 This means that, in general, such loops may be found as discrete solutions of the loop closure problem. This fact has been known for some time in the Kinematic theory of Mechanisms. 14 Kine- matics is the branch of mechanics whose concern is the geometric analysis of motion, especially constrained displacements without regard to forces. The kinematic analysis of systems of rigid objects connected by flexible joints, such as multijointed robotic manipu- lators, exhibits many similarities with the geometric analysis of macromolecules, when the forces responsible for the motions are ignored and the main question of interest is the analysis of possible conformations consistent with the constraints associated with bond lengths and bond angles. In robotics, joints that allow one arm to *These 2 authors contributed equally to this study. Correspondence to: C. Seok; e-mail: [email protected] © 2004 Wiley Periodicals, Inc.

A Kinematic View of Loop Closure - University of New Mexicovageli/papers/loop.pdf · A Kinematic View of Loop Closure EVANGELOS A. COUTSIAS,1,* CHAOK SEOK,2,* MATTHEW P. JACOBSON,2

Embed Size (px)

Citation preview

Page 1: A Kinematic View of Loop Closure - University of New Mexicovageli/papers/loop.pdf · A Kinematic View of Loop Closure EVANGELOS A. COUTSIAS,1,* CHAOK SEOK,2,* MATTHEW P. JACOBSON,2

A Kinematic View of Loop Closure

EVANGELOS A. COUTSIAS,1,* CHAOK SEOK,2,* MATTHEW P. JACOBSON,2 KEN A. DILL2

1Department of Mathematics and Statistics, University of New Mexico,Albuquerque, New Mexico 87131

2Department of Pharmaceutical Chemistry, University of California in San Francisco,San Francisco, California 94143-2240

Received 26 August 2003; Accepted 4 November 2003

Abstract: We consider the problem of loop closure, i.e., of finding the ensemble of possible backbone structures ofa chain segment of a protein molecule that is geometrically consistent with preceding and following parts of the chainwhose structures are given. We reduce this problem of determining the loop conformations of six torsions to finding thereal roots of a 16th degree polynomial in one variable, based on the robotics literature on the kinematics of the equivalentrotator linkage in the most general case of oblique rotators. We provide a simple intuitive view and derivation of thepolynomial for the case in which each of the three pair of torsional axes has a common point. Our method generalizesprevious work on analytical loop closure in that the torsion angles need not be consecutive, and any rigid interveningsegments are allowed between the free torsions. Our approach also allows for a small degree of flexibility in the bondangles and the peptide torsion angles; this substantially enlarges the space of solvable configurations as is demonstratedby an application of the method to the modeling of cyclic pentapeptides. We give further applications to two importantproblems. First, we show that this analytical loop closure algorithm can be efficiently combined with an existingloop-construction algorithm to sample loops longer than three residues. Second, we show that Monte Carlo minimizationis made severalfold more efficient by employing the local moves generated by the loop closure algorithm, when appliedto the global minimization of an eight-residue loop. Our loop closure algorithm is freely available at http://dillgroup.ucsf.edu/loop_closure/.

© 2004 Wiley Periodicals, Inc. J Comput Chem 25: 510–528, 2004

Key words: loop closure; kinematic; loop modeling; Monte Carlo minimization

Introduction

We consider the problem of loop closure, i.e., finding structures ofa segment in a chain molecule that are geometrically consistentwith the rest of the chain structure. This problem has an importantapplication in homology modeling,1 when segments of insertionsor deletions are to be modeled while the rest of the proteinstructure is relatively well known from structures of homologousproteins. Another useful application is in the area of Monte Carlosimulations, where alternative segment structures can be intro-duced as elementary localized moves.2–11 These moves can lead toimproved efficiency in conformational sampling. Unlike Cartesianmoves, they avoid geometric distortions and the high energypenalty these entail. On the other hand, the deformation producedby these moves is limited to a segment, while uncoordinatedtorsion angle moves result in movement proportional to the dis-tance of each atom from the perturbation axes, resulting in largeuncontrolled moves. Other possible applications of the loop clo-sure problem are discussed in ref. 12.

It is well known that the number of constraints is identical tothe number of degrees of freedom (DOFs) in the case of loops withsix free torsion angles, or three residue loops for proteins.13 Thismeans that, in general, such loops may be found as discretesolutions of the loop closure problem. This fact has been knownfor some time in the Kinematic theory of Mechanisms.14 Kine-matics is the branch of mechanics whose concern is the geometricanalysis of motion, especially constrained displacements withoutregard to forces. The kinematic analysis of systems of rigid objectsconnected by flexible joints, such as multijointed robotic manipu-lators, exhibits many similarities with the geometric analysis ofmacromolecules, when the forces responsible for the motions areignored and the main question of interest is the analysis of possibleconformations consistent with the constraints associated with bondlengths and bond angles. In robotics, joints that allow one arm to

*These 2 authors contributed equally to this study.

Correspondence to: C. Seok; e-mail: [email protected]

© 2004 Wiley Periodicals, Inc.

Page 2: A Kinematic View of Loop Closure - University of New Mexicovageli/papers/loop.pdf · A Kinematic View of Loop Closure EVANGELOS A. COUTSIAS,1,* CHAOK SEOK,2,* MATTHEW P. JACOBSON,2

rotate about another at a fixed angle are called Rotator pairs or“R-pairs.” The arm system analogous to a macromolecule with sixrotatable bonds is a “6R” linkage. The kinematic analysis of theseand other similar linkages leads to Fourier polynomials in the sixrotation angles, �i, i.e., polynomials in the variables cos �i, sin �i.By introducing the half-angle transformation

ui � tan��i/2� 3 sin �i �2ui

1 � ui2 , cos �i �

1 � ui2

1 � ui2 ,

i � 1, . . . , 6,

a system results of polynomial equations in the ui. A polynomialformulation offers several advantages, such as relative ease ofsolution, available theorems for the accurate enumeration of thenumber of solutions within a given region when there is only adiscrete number, and in general, better understood numerical prop-erties. For instance, the number of real roots of a univariatepolynomial equation contained in an interval can be readily deter-mined by Sturm’s method.15 No such method is available for moregeneral, transcendental equations. Therefore, an advantage of ourpolynomial equation compared to the transcendental equation ofGo� and Scheraga13 is that the exact number of solutions can befound, which is important for satisfying microscopic reversibilityin Monte Carlo simulations.3 Methods from algebraic geome-try16,17 and homotopy theory18 have been applied to such systems,and robust algorithms exist for the determination of their solutions,real or complex. A thorough discussion of robotic linkage systemscan be found in the text by Duffy,19 while informative expositionsand reviews of the relevant literature can be found in the classictext by Hartenberg and Denavit,14 and more recently, in the text byHunt.20 A relatively current survey is given in Manocha.21

The problem of closing 6R loops is central for the control ofrobotic manipulators, where in many common applications oneend is fixed and the other (the “end effector”) must be positionedat a specific location and with a given orientation. Adding a 7throtator gives a system with one additional DOF, offering thepossibility of continuous motion with two fixed ends. This prob-lem, characterized as “The Mount Everest of robotic manipulators”by Freudenstein22 was reduced to a single variable, 16th-degreepolynomial equation by Lee and Liang.23 In their solution, the 7throtational DOF is used as a control parameter, and the real solu-tions obtained for the other angles once the 7th angle is fixedprovide alternative closure configurations for the system. Themethod applies to systems with arbitrary axes of rotation, but thederivation is quite involved, and it is difficult to arrive at anintuitive understanding of its solutions and the implied chaindisplacements.

In this article we consider an important special case in whichthe 6R problem has an intuitively simple description: consider allthe motions of a chain molecule that involve changes in only sixbackbone torsions. If these are arranged so that they form threecoterminal pairs, then the segments between successive pairs willform effectively a coarser chain of three (closed case) or four(open case) rigid bodies, joined at the locations of the pairedtorsion axes. An illustration is given in Figure 1 for a tripeptideexample, where the four rigid bodies are (N1 C�1),(C�1 C1 N2 C�2), (C�2 C2 N3 C�3), and (C�3 C3). If

we now require the two end segments of the chain (N1 C�1) and(C�3 C3) to remain at a fixed position relative to each other,(C�3 C3 N1 C�1) forms a third segment. Now each of thethree rigid units (C�1 C1 N2 C�2), (C�2 C2 N3 C�3), and(C�3 C3 N1 C�1) has two junctions on it, attaching to theother two units. Define the line connecting the two junctions on aunit as the virtual axis of the unit (C�1–C�2, C�2–C�3, and C�3–C�1). The motions of the middle two segments relative to the restof the chain can only be composed of individual rotations of eachabout their respective virtual axes (C�1–C�2 and C�2–C�3) or jointrotations of the two as a unit about the third (fixed) axis (C�3–C�1).The three virtual axes form a triangle, with vertices at the threejunctions (C�1, C�2, and C�3). If we rotate each of the units aboutits axis by some angle �i, i � 1, 2, 3, the rotatable bonds at eitherend of the unit maintain a fixed dihedral with the axis and eachother (a dihedral formed by C�1–C1, C1–N2, and N2–C�2, forexample). Any possible motion that a concerted change in theoriginal six torsions is capable of can thus be described in terms ofthese three angles. If we now require that bond angles (�i) betweenthe actual bonds at the junction of two segments remain at a givenvalue, these motions become coupled. The feasible configurationswhere all constraints are satisfied form a discrete set, found as thesolutions of a polynomial equation in the corresponding threevariables ui, i � 1, 2, 3. Having sets of rotation axes arranged incoterminal pairs is a natural property of polypeptide chain back-bones where one encounters pairs of rotatable bonds at each C�

atom (with the exception of proline), and similar pairings arecommon in other molecules of interest, such as RNA wheregroupings of five pairwise coterminal rotatable bonds in the phos-phate backbone are separated by relatively rigid sugar rings.

In its simplest form our algorithm may utilize the torsion anglesat three C� atoms located consecutively along a peptide backbone.This is the “tripeptide loop closure” problem. The tripeptide loopclosure problem was first considered by Go� and Scheraga,13 whoreduced the problem to solving a transcendental equation in asingle variable in the case of planar peptide torsion angles. Themethod has found numerous applications and extensions. Bruc-coleri and Karplus24 allowed small variation in bond angles as ameans of extending the method to cover normal variability of theseparameters in proteins of known structures, and applied the methodto loop modeling.25 Dinner7 produced a generalization to the

Figure 1. Definition of three variables �1, �2, and �3 and three con-straints on �1, �2, and �3 in the canonical tripeptide loop closureproblem.

Kinematic View of Loop Closure 511

Page 3: A Kinematic View of Loop Closure - University of New Mexicovageli/papers/loop.pdf · A Kinematic View of Loop Closure EVANGELOS A. COUTSIAS,1,* CHAOK SEOK,2,* MATTHEW P. JACOBSON,2

nonplanar peptide case, still in terms of transcendental equations.More recently, Wedemeyer and Scheraga12 derived a single-vari-able 16th-degree polynomial equation for the particular case ofloop closure involving three consecutive residues with planarpeptide torsions at canonical bond lengths and angles, i.e., whenonly three consecutive pairs of � and � torsion angles are allowedto vary.

One of the generalizations possible with our algorithm is for thethree pair of torsion angles with coterminal axes to be chosenalong a molecular chain with arbitrary, fixed structure betweensuccessive pairs, including nonplanar peptide torsion angles. Thisgeneralization is useful for several reasons. Sampling with fixedbond angles and peptide torsion angles can significantly limit thecoverage of conformational space,24 and moreover, fluctuations ofthe order �10° for the bond angles and peptide torsion angles arenot uncommon among proteins of known structure. Further, themethod presented here allows for the torsion angles participatingin the move to be chosen at arbitrary locations along the chain.This allows its application to diverse situations, such as to themodeling of longer loops and loops in polymers and nucleic acids.Although it is possible to derive a description in terms of a16th-degree polynomial even if all the angles are chosen com-pletely independently,26 the choice of paired �–� angles leads toa simple formulation in terms of three natural angle variables:

1. Choose three C� carbons located successively (but not neces-sarily consecutively) along the chain, say C�i, i � 1, 2, 3.

2. Rotate the segment C�1, . . . , C�2 by angle �1 about the axisC�1–C�2.

3. Rotate the segment C�2, . . . , C�3 by angle �2 about the axisC�2–C�3.

4. Rotate the segment C�1, . . . , C�2, . . . , C�3 by angle �3 aboutthe axis C�1–C�3.

5. Choose the angles �i, i � 1, 2, 3 so that the bond anglesNi–C�i–C�i assume (near) canonical values at each of the atomsC�i.

Satisfaction of the compatibility conditions in the last step isassured by the solution of the polynomial system mentioned above,and every real solution results in a distinct configuration. Theanalysis can easily be applied to chains of arbitrary structure (i.e.,it is not limited to polypeptides), provided there exist pairs ofcoterminal rotatable bonds. In the robotics literature, R-joints withaxes that have a common point are referred as “spherical pairs.”We are thus studying the 6R system with three interconnectedspherical pairs.19 Problems of structure similar to the tripeptideloop closure problem are also common in another area of compu-tational geometry: the motion planning for the assembly of foursolid objects can be cast in identical mathematical form.27

Given that the general 6R problem can be described by a16th-degree polynomial, it follows that there will always be aneven number of real solutions, counting multiplicities, and, atmost, 16 distinct real solutions are possible, leading in turn to atmost 16 distinct loop configurations. Such an example has beenfound by Manseur and Doty28 for a 6R robotic manipulator. Doddet al.,3 in their study of concerted rotations in polymer systems,report as many as 12 solutions in certain cases, but because theystudy the problem in its transcendental form they need to rely on

expensive, exhaustive searches to arrive at a complete enumerationwith confidence. For the canonical tripeptide loop closure, Wede-meyer and Scheraga12 have found at most eight real solutions ofthe closure polynomial and, hence, at most eight distinct confor-mations. Our own studies with the more general peptide geometryhave so far only discovered cases with at most 10 real solutions,and we believe that this might be a limitation due to the fact thatthe building blocks of the problem are of special form, perhaps notcapable of covering the entire set of possible behaviors of thepolynomial system unless a certain variability in the parameters isintroduced. For example, the obtuseness of the bond angles at theC� carbons should be contrasted with the fact that the anglesbetween successive arms of the manipulator in ref. 28 are all /2except for one pair of parallel axes.

Even though it is clear that the analytical loop closure method,being exact, is much more efficient compared to numerical loopclosure methods10,29,30 for three residue loops, application of theanalytical method to modeling longer loops has not yet beenexplored extensively. For loops of n torsion angles, (n � 6) DOFsneed to be sampled with some additional search method. Here weemploy an existing loop construction method31 to sample (n � 6)torsion angles, and solve for the remaining six torsion angles usinganalytical loop closure. Other approaches such as a hierarchicalmethod and a decimation method have been suggested by Wede-meyer and Scheraga12 for sampling longer loops using an analyt-ical loop closure method. We sample (n � 6) torsion anglesdirectly because it is possible to incorporate screenings for Ram-achandran allowed regions and steric clashes. These screens en-hance the efficiency of sampling because they can be applied atearly stages, and nonpromising structures can be pruned out beforethe whole model loop is constructed.

We believe that an advantage of our work is the simplified,intuitive view of the tripeptide loop structure (or six free torsionangles in general), which enables us to develop insights for usefulapplications. General theory and methods are presented in the nextsection, and results of applications following that. Then we de-scribe the simple view of the tripeptide loop closure, derive theloop closure equation, and present an efficient algorithm for solv-ing the polynomial equation. Further generalizations to the casewhere the torsion angle pairs are chosen at arbitrary (noncontigu-ous) C� atoms and to the case of an additional 7th dihedral are thendiscussed. A perturbation method for increasing the coverage ofconformational space is also discussed. Applications to bond angleperturbations, longer loop modeling and Monte Carlo Minimiza-tion are presented, and then conclusions are given.

Theory and Methods

Loop Closure Formulation

We pose here the loop-closure problem in its simplest form asfollows: given a molecular chain with inflexible bond lengths andbond angles, find all possible arrangements with the property thatall bond vectors are fixed in space except for a contiguous set andsuch that the changes are made in at most six intervening dihedralangles. For convenience of presentation, we illustrate our deriva-

512 Coutsias et al. • Vol. 25, No. 4 • Journal of Computational Chemistry

Page 4: A Kinematic View of Loop Closure - University of New Mexicovageli/papers/loop.pdf · A Kinematic View of Loop Closure EVANGELOS A. COUTSIAS,1,* CHAOK SEOK,2,* MATTHEW P. JACOBSON,2

tion for the case of a tripeptide loop with occasional reference tomore general cases.

Tripeptide Loop-Closure Equation

We view the six-torsion loop closure problem in a simplifiedrepresentation as shown in Figure 1. A tripeptide loop example isshown in the figure, where four atoms N1, C�1, C�3, and C3 arefixed in space, and all other atom positions are to be determined.Atom types N, C�, and C refer to nitrogen, alpha carbon, andcarbonyl carbon, and the subscripts to the residue number (1, 2, or3).

There are three variables and three constraints in this picture,which is equivalent to, but simpler than, the six-variable/six-constraint picture of Go� and Scheraga.13 The three variables in thepicture are the three rotation angles �i (i � 1, 2, 3) of the Ci andNi�1 atoms about the C�i–C�i�1 virtual bonds, where i � 4 isequivalent to i � 1. Ni�1 is rotated with Ci because there is nofree rotation involved between them. The �i rotations preserve allthe bond lengths and angles except for the three bond angles �i

(:� �NiC�iCi) shown in Figure 1. The condition that �i anglesare equal to fixed values forms the three constraints in our prob-lem. The �i angles are defined in the reference frame where all C�i

are fixed. C�1 and C�3 are fixed by definition, and so are the sidelengths of the triangle formed by C�1, C�2, and C�3. C�2

therefore traces a circle about the C�3–C�1 axis. In the referenceframe of Figures 1 and 2a, C�2 is fixed and the rotation of C�2 isreplaced by an equivalent rotation of N1 and C3 about the sameaxis. Once the problem in this reference frame is solved, the N1

and C3 atoms (together with all other atoms in between) can be

rotated back to the original frame by a reverse rotation about thesame C�3–C�1 axis. This concept is illustrated with eight alter-native loop conformations in the reference frame of the three C�

atoms in Figure 2a, and the corresponding picture in the originalframe of fixed atoms N1, C�1, C�3, and C3 is shown in Figure 2b.Our formulation does not require planarity of the peptide torsion,and covers a more general case where arbitrary rigid structuresintervene between the C�i–Ci and Ni�1–C�i�1 bonds.

In the derivation below, the bond vectors C�iCi andC�i�1Ni�1 (boldface symbol of a pair of atoms represents thebond vector of the pair) are first expressed in terms of �i angles andother fixed quantities, and then the �i angle constraints are writtenin terms of dot products of these vectors.

First consider the following unit vectors:

zi � C�iC�i�1/�C�iC�i�1�, ri� � C�iCi/�C�iCi�,

ri � C�i�1Ni�1/�C�i�1Ni�1�, (1)

and define the following fixed angles in terms of these vectors:

�i � cos�1�zi � zi�1�, (2)

�i � cos�1�zi � ri��, (3)

�i � cos�1��zi � ri�, (4)

where �i, �i, and �i are all taken to be in the range [0, ]. Theseangles are shown in Figure 3 in the context of the C� triangle.

We now define a right-handed local coordinate system by threeunit vectors (xi, y, zi) for each �i rotation, where the reference axisy is conveniently set to y � (z3 � z1)/�z3 � z1� so that it isperpendicular to all zi defined in eq. (1), and to xi � y � zi. Aspictured in Figure 4a, the �i angle is now precisely defined to bethe rotation angle of ri

� (or C�iCi) about zi in this local coordinatesystem, and i is defined similarly as the rotation angle of ri

(orC�i�1Ni�1) about zi.

The angles �i and i are related to each other because ri� and ri

are rotated together as a rigid body. Figure 4a shows that �i and i

are related by the simple relation

Figure 2. (a) Alternative configurations shown in the reference frameof the three fixed C� atoms. (b) The same alternative loop closureconfigurations as in (a), but now in the original frame of the fixedatoms N1, C�1, C�3, and C3.

Figure 3. Definition of angle parameters �i, �i, and �i.

Kinematic View of Loop Closure 513

Page 5: A Kinematic View of Loop Closure - University of New Mexicovageli/papers/loop.pdf · A Kinematic View of Loop Closure EVANGELOS A. COUTSIAS,1,* CHAOK SEOK,2,* MATTHEW P. JACOBSON,2

i � �i � i, (5)

where i is the dihedral angle defined by the three vectors (CiC�i,C�iC�i�1, C�i�1Ni�1), as illustrated in Figure 4a.

As can be seen in Figure 4b, the unit vectors ri� and ri

areexpressed in terms of the above defined unit vectors and angles as

ri� � cos �izi � sin �i�cos �ixi � sin �iy�,

ri�1 � �cos �i�1zi�1 � sin �i�1�cos i�1xi�1 � sin i�1y�. (6)

The �i angle constraints can then be expressed in terms of ri�

and ri�1

ri� � ri�1

� cos �i. (7)

Substitution of eq. (6) into eq. (7) gives the equations

cos �i � cos �icos �i�1cos �i � sin �i�sin �i�1cos �icos i�1

� cos �i�1sin �icos �i� � sin �i�1sin �i�sin �isin i�1

� cos �icos �icos i�1�, (8)

with i � 1, 2, 3, where the dot products (zi � zi�1) � cos �i, (zi �y) � 0, (zi � xi�1) � sin �i, (xi � xi�1) � cos �i, (xi � y) � 0,and (zi�1 � xi) � �sin �i have been used.

In a later section, i�1 is first eliminated from eq. (8) using eq.(5), and the three coupled equations for �i (i � 1, 2, 3) arereduced to a 16-degree polynomial for the single variable u3 �tan(�3/2). From the theory of polynomial systems32 it follows thatfor every (real) solution there corresponds a unique (real) triplet(�1, �2, �3), so that, in general, there are at most 16 real solutions.

Equation (8), describing the rotation of the C�i–Ni and C�i–Ci

bonds about the virtual bonds C�i�1–C�i and C�i–C�i�1, respec-tively, is known in the theory of mechanisms as the equation for aRR joint with coterminal axes and with the two arms constrainedto be at a fixed distance. It was derived in 1897 by Bricard in hisstudy of flexible octahedra,33 and considerable literature about itexists.20,34 A geometrical analysis of the individual (uncoupled) �i

constraint eqs. (7) and (8) is provided in Appendix A.

The Algorithm

Once the polynomial equation is obtained, all atomic coordinatesin the loop can be determined. Before presenting a detailed deri-vation and solution of the polynomial equation, we give here asimple outline of the loop closure algorithm that finds the atompositions in the loop.

1. The polynomial coefficients are determined from the angles �i,�i, �i, and i: first, the angles �3, �1, 3 are determined from thecoordinates of N1, C�1, C�3, and C3, and all other angles arecomputed from the given bond lengths and bond angles (ca-nonical, or, more generally, for arbitrary, specified values ofthese). The coefficients of the 16th degree polynomial are thendetermined algorithmically following the steps described in thenext section and Appendix B.

2. u3 � tan(�3/2) is obtained by solving the 16th-degree polyno-mial, as described later. u2 � tan(�2/2) and u1 � tan(�1/2) aredetermined from u3 as described in Appendix C, and �i � 2tan�1ui and i � �i � i follow.

3. Positions of all the atoms are obtained from �i and i: first, thereference frame is defined. The unit vector z3 is determinedfrom the coordinates of C�1 and C�3. z1 is set arbitrarily exceptthat the angle between z1 and z3 is �1. z2 � �z1 � z3 follows.y and xi are computed from zi. Next, ri

� and ri (i � 1, 2) in

the reference frame are obtained from �i and i using eq. (6).All atom positions are then computed from these vectors in thereference frame. The unit vectors define �3

(0) that are determinedfrom the fixed coordinates of N1, C�1, C�3, and C3. All atomsare then rotated about z3 by (�3

(0) � �3) to bring them to theoriginal frame.

Derivation of the Polynomial Equation

Equation (8) is converted to polynomial form in the variables wi,ui where

wi :� tan�i/2�, ui :� tan��i/2�. (9)

Using the half-angle formulas

cos � �1 � u2

1 � u2 , sin � �2u

1 � u2 , u � tan�

2, (10)

Figure 4. (a) A peptide unit along the C�i–C�i�1 virtual bond. In thelocal coordinate system, �i and i are related by i � �i � i. (b)Geometric definitions at the C�i junction. The black circle at the origindenotes the C�i atom, while the vectors ri�1

and ri� point to the Ni and

Ci atoms, respectively.

514 Coutsias et al. • Vol. 25, No. 4 • Journal of Computational Chemistry

Page 6: A Kinematic View of Loop Closure - University of New Mexicovageli/papers/loop.pdf · A Kinematic View of Loop Closure EVANGELOS A. COUTSIAS,1,* CHAOK SEOK,2,* MATTHEW P. JACOBSON,2

Equation (8) becomes

Aiwi�12 ui

2 � Biwi�12 � Ciwi�1ui � Diui

2 � Ei � 0 (11)

where

Ai � �cos �i � cos��i � �i�1 � �i�

Bi � �cos �i � cos��i � �i�1 � �i�

Ci � 4 sin �i�1sin �i

Di � �cos �i � cos��i � �i�1 � �i�

Ei � �cos �i � cos��i � �i�1 � �i�.

Equation (11) is called the tetrahedral equation,33 because it de-scribes the alternative shapes of the tetrahedral formed by the fourfixed angles �, �, �, and �. This equation is quadratic both in w andu, denoting that, in general, to each value of one of the dihedrals� and there correspond two values of the other. After eliminatingwi�1 from eq. (11) using eq. (5), because

wi � tan��i/2 � i/2� �tan��i/2� � tan� i/2�

1 � tan� i/2�tan��i/2��

ui � i

1 � iui(12)

where we introduced � tan( /2), we arrive at a system of threebiquadratic (quadratic in two variables) equations in ui � tan(�i/2):

P1�u3, u1� :� �j,k�0

2

pjk�1�u3

j u1k � 0, (13)

P2�u1, u2� :� �j,k�0

2

pjk�2�u1

j u2k � 0, (14)

P3�u2, u3� :� �j,k�0

2

pjk�3�u2

j u3k � 0, (15)

where the coefficients pjk(1), pjk

(2), and pjk(3) are defined in terms of

the fixed angles �i, �i, �i, �i�1, and i�1. These coefficients andall other coefficients that follow below are derived in Appendix B.

Before proceeding with solving this system, we address theexpected number of solutions. Although the classical Bezout the-orem bounds the number of zeros of a system of polynomialequations by the product of their degrees (here 43 � 64), a sharperresult, referred as the “Bernshtein–Kusnirenko–Khovanskii(BKK) Theorem”35 is known, which takes advantage of the factthat the above polynomials are not the most general 4th-degreepolynomials in variables ui, i � 1, 2, 3 (e.g., terms like ui

4, ui3uj

or ui2ujuk are not present) and gives for the above system the upper

bound as 16. Although we will not present the easy proof here, wemust mention that this theorem is sharp, meaning that the number16 is realizable for some sets of values of the coefficients. In thefollowing discussion we carry out the elimination of variables in

two steps in a similar manner as in ref. 12, taking advantage of thefact that each polynomial is bivariate so that variables can beeliminated one at a time. The final univariate polynomial is oforder 16. Given the previous discussion, no redundancy can bepresent in this polynomial in general and all 16 solutions havepotential physical significance. We have found at most 10 realsolutions when we introduce variances in the peptide torsion andbond angles, in contrast to previous works12,13 in which at mosteight solutions were found in the rigid planar tripeptide case.However, given the rarity of such cases (3 in 1 million for thedatabase we explored36) robustness issues may play a role. Thisdoes not mean that exceptional cases may still not be found wherethe number of distinct real solutions is 16. We are currentlyinvestigating this question.

The method of resultants (see Appendix C) is used to reducethe above equations to an equation for a single variable. In short,the variable u1 is first eliminated from eqs. (13) and (15) to give

R8�u2, u3� � �j,k�0

4

qjku2j u3

k � 0, (16)

and u2 is eliminated from eqs. (14) and (16) to give

R16�u3� � �j�0

16

rjku3j � 0. (17)

More specifically, eq. (16) is obtained by rewriting eqs. (13)and (15) as

P1�u3, u1� � �k�0

2

Lk�u3�u1k

and

P2�u1, u2� � �j�0

2

Mj�u2�u1j ,

where Lk :� Lk(u3) and Mj :� Mj(u2) are themselves quadraticsin u3 and u2, respectively (see Appendix B).

The resultant of the two biquadratics P1 and P2, which elim-inates u1, is given by the determinant

R8�u2, u3� � �L2 L1 L0 00 L2 L1 L0

M2 M1 M0 00 M2 M1 M0

� � �j,k�0

4

qjku2j u3

k � 0. (18)

We now write R8(u3, u3) as a quartic in u3 introducing thefunctions Qj(u3), quartics in u3:

R8�u2, u3� � �j�0

4

Qj�u3�u2j ,

Kinematic View of Loop Closure 515

Page 7: A Kinematic View of Loop Closure - University of New Mexicovageli/papers/loop.pdf · A Kinematic View of Loop Closure EVANGELOS A. COUTSIAS,1,* CHAOK SEOK,2,* MATTHEW P. JACOBSON,2

and eq. (15) as

P3�u2, u3� � �j�0

2

Nj�u3�u2j ,

where the Nj are quadratics in u3. The final resultant, whicheliminates u3 to give a degree 16 polynomial in u1 is given by

R16�u3� � �N2 N1 N0 0 0 00 N2 N1 N0 0 00 0 N2 N1 N0 00 0 0 N2 N1 N0

Q4 Q3 Q2 Q1 Q0 00 Q4 Q3 Q2 Q1 Q0

� � �j�0

16

rjku3j � 0. (19)

One key advantage of the reduction to polynomial form carriedout in the previous subsections is the availability of reliable soft-ware for the determination of polynomial zeros. The solution canbe carried out by either directly solving the polynomial equation,or by reduction to the solution of a generalized eigenproblem.21

For completeness we give a brief description of both schemesbelow. In our studies, the direct solution has proved to be moreefficient.

Direct Solution and Sturm Chains

We use the polynomial solution package available from ACM.37

This package uses Sturm’s method15 to determine the number ofreal zeros within an arbitrary interval. The intervals are bisectedand refined until all the solutions are put in separate, tight intervals.The solutions are then refined using a secant method.

Generalized Eigenproblem Formulation

The above polynomial equation can be formulated as a generalizedeigenproblem. Following Manocha,21 we write R16(u1) as a de-terminant of a matrix polynomial with matrix coefficients Sk:

det� �k�0

4

Sku1k� � 0, (20)

which is equivalent to

det��u1 � �� � 0 (21)

with

� :� �I 0 0 00 I 0 00 0 I 00 0 0 S4

�, � :� �0 I 0 00 0 I 00 0 0 I

�S0 �S1 �S2 �S3

�, (22)

where all blocks are of size 6 � 6. The resulting generalizedeigenproblem, u1�� � ��, can be solved with the lapack

routine dggev.f, for example. It is also possible to take advantageof the sparsity of the matrices �, �, if desired.

Generalizations of the Method

Noncontiguous C� Atoms

As is clear in Figure 1, the loop closure process involves threerotations about the axes C�i–C�i�1 (i � 1, 2, 3) and threeconstraints relating these rotations that ensure that the bond anglesbetween the two rotatable bonds Ni–C�i and C�i–Ci at the C�i areset. The chain of atoms intervening between the C�i is rotatedrigidly. The problem is completely characterized by giving theangles �i between the virtual bonds (which, together with one ofthe edges, say C�i–C�i�2, completely characterize the triangleC�i, C�i�1, C�i�2), the angles �i, �i formed by the rotatablebonds at each C�i and the edges of that triangle, as well as thedihedrals i. Nowhere in this construction is any assumption madeabout the intervening structure, nor are any such assumptionsimplicit in the derivation of the loop closure equations. Therefore,the algorithm can be applied without modification to moves in-volving arbitrary triads of C� atoms (Fig. 5), i.e., the angle param-eters �i, �i, �i, and i that completely determine the problem aredefined in the same way as in Figures 3 and 4a from the threeatoms at each vertex of the C� triangle. This is a new featurerelative to other algorithms. Of course, more general moves be-come possible now, where some of the intervening dihedrals arealso changed, modifying the parameters of the basic triangle. Toillustrate this additional flexibility we consider in the next subsec-tion the simplest such move, namely the change of one additionaldihedral. This introduces a continuous DOF to the problem, and itforms the basis of a Monte Carlo move.

Additional Dihedral Angle

We now consider a method of finding alternative local structureswhen an arbitrary dihedral angle is changed. Six angles need to beadjusted to compensate the change such that the chain structure is

Figure 5. General chain loop closure.

516 Coutsias et al. • Vol. 25, No. 4 • Journal of Computational Chemistry

Page 8: A Kinematic View of Loop Closure - University of New Mexicovageli/papers/loop.pdf · A Kinematic View of Loop Closure EVANGELOS A. COUTSIAS,1,* CHAOK SEOK,2,* MATTHEW P. JACOBSON,2

unchanged beyond the local region. Concerted angle perturbationsof this kind can be used as elementary moves in Monte Carlosimulations to increase the sampling efficiency. A simple way is toadjust six consecutive angles adjacent to the driver angle.7 Theterminal atom position is changed (either C�1 or C�3 in Fig. 1) inthis case, and the six angle loop closure problem can then besolved with the changed C� triangle geometry. Here, we describea more general and flexible method of compensating the anglechange, in which six dihedrals to be adjusted are allowed to beseparated in pairs arbitrarily in sequence, and the driver angle canbe placed anywhere in between the adjusting dihedral pairs.

Figure 6 shows a case in which the driver angle � is placed onthe left-hand side of the C� triangle, as an example. As before, weconsider three �i rotations separately, and then apply the �i con-straints. This is possible because the net effect of the driver anglerotation in our simplified picture is to change some of the param-eters for the C� triangle geometry that are independent of �i

rotations. The geometric parameters for the base of the triangle,C�3C�1, �3, �1, are invariant because they are fixed by con-straints, and those for the right side of the triangle, C�2C�3, �2,�3, are also invariant because rotation due to the angles on the leftside does not change the relative orientation of the atoms on theright. Those for the left side, C�1C�2, �1, �2, change because thedriver angle rotates N2 and C�2, but not C�1 and C1. Due to thechange in C�1C�2, zi (i � 1, 2) and �i (i � 1, 2, 3) change.Equation (8) can be then derived with the changed parameters.

These flexible concerted local moves are expected to improveefficiency of conformational search. A Monte Carlo with minimi-zation method has been employed together with the concertedmoves described above, and severalfold of improvement in effi-ciency has been observed compared to existing search methods(see later).

Bond Angle Perturbations

So far we have fixed bond lengths, bond angles, and peptidetorsion angles at their canonical values in the loop closure algo-

rithm, although there is no limitation on what specific values haveto be used. However, real proteins exhibit a range of valuesdepending on their chemical environment. When the rigid loopclosure method is used to sample structures for real proteins, somestructures cannot be sampled if the flexibility in bonds and anglesis ignored. This fact was first noticed by Bruccoleri and Karplus(BK).24 To test how much the rigid sampling can cover the realprotein structure space, three-residue structures were deleted arti-ficially from the Top500 database of high resolution, nonredundantprotein structures,36 and our exact loop closure algorithm was usedto fill the gaps. About 27.5% of the gaps could not be filled withthe rigid sampling. (The bond lengths and angles used are NC� �1.45 Å, C�C � 1.52 Å, CN � 1.33 Å, �NC�C � 111.6,�C�CN � 117.5, and �CNC� � 120.0.) BK developed asearch method to find minimal bond angle variations to close agiven loop. Our method is used to vary peptide torsion angles aswell as bond angles, because we now have a more general formula.We also present a much simpler, efficient method of perturbingbond angles, where no extensive search is involved.

We first present a method in which the minimum of the poly-nomial is moved by angle changes so that the minimum at leasttouches the axis to give roots, in a similar spirit to BK. The anglesare perturbed by the minimum amount (so as to minimize theenergy penalty). We then show a faster method that makes use ofthe knowledge of the direction of angle change that maximizes theprobability of having loop-closure solutions. This method onlydetermines the sign of angle change, but not the minimum mag-nitude.

Perturbation by Angle Search

The minimum of the polynomial and the derivatives of the mini-mum with respect to perturbed angles are computed, and a steepestdescent search is performed to bring down the polynomial mini-mum to equal to or less than zero. A more sophisticated LBFGSminimization algorithm was tried, but the efficiency was similar.The steepest descent iteration is terminated when loop closuresolutions are found, preset maximum angle perturbations arereached, or maximum number of iterations (set to 200) is reached.

The polynomial minimum is obtained by finding roots ofR�16(u) � 0 and comparing the polynomial values at the roots. Thederivatives of the minimum with respect to perturbed angles arecomputed by a finite difference method. The minimum at theperturbed angles are computed by a Newton–Raphson minimiza-tion starting from the current minimum. Several parameters areintroduced to accelerate the steepest descent iteration. The stepsize of the steepest descent minimization is increased or decreased(by a factor of fi or fd) depending on whether the previous iterationdecreased or increased the minimum. The initial step size is chosenso that the largest component of angle change is equal to di. Ateach iteration, the largest angle component change is restricted tobe at most dm. fi � 9, fd � 0.1, di � 0.1, and dm � 0.0001 werechosen by trial and error to maximize the number of loop closuresolutions for the cyclopentapeptide example below. It is also foundthat angle perturbations prior to steepest descent help in findingmore loop-closure solutions. For example, when the C� trianglecannot be formed or is formed marginally because the base lengthC�1–C�3 is too short or too long to be reached with the canonical

Figure 6. Deformation of the C� triangle due to a dihedral (�)perturbation. Changes in �i angles due to the triangle deformation areshown.

Kinematic View of Loop Closure 517

Page 9: A Kinematic View of Loop Closure - University of New Mexicovageli/papers/loop.pdf · A Kinematic View of Loop Closure EVANGELOS A. COUTSIAS,1,* CHAOK SEOK,2,* MATTHEW P. JACOBSON,2

bond angles, the angles are perturbed by the maximum amount toallow for longer or shorter base lengths. In addition, when it isfound that there exists no solution for the two-cone system for anyvertex (see next section and Appendix A), angles are adjusted tomaximize the overlap of the two cones as in the next subsection.

Simple Perturbation Method

We now present a simple bond angle perturbation method thatdoes not require searching the bond angle space, thus solving theloop-closure problem only once. This can be accomplished byexamining the components of the simple picture in Figure 1.Figure 9a in Appendix A shows i�1 rotation of ri�1

and �i

rotation of ri�. Each vector traces a cone, so we call it a two-cone

system. The two vectors have to satisfy the bond angle constrainteq. (7), and this limits the accessible ranges of i�1 and �i values.These ranges depend on the local geometry determined by �i�1,�i, �i, and �i. Each vertex of the triangle in Figure 1 has atwo-cone system, so there are three two cones in all. The loopclosure solutions are determined by the intersection of the allowedi�1/�i regions in the three two-cone systems. By construction, �i

does not change the triangle geometry or any other parameters, butvaries the accessible ranges of i�1/�i. It is possible to determinewhether to increase or decrease �i to maximize the accessibleranges at each two-cone, which in turn maximizes the overlaps oftwo-cone systems, and the possibility of closing the loop. This isdone by classifying the two-cone types depending on how theextrema of i�1/�i are arranged, and determining the effect of �i

change on the extrema. The details are provided in Appendix A.

Results and Discussion

In this section we present an application of the angle perturbationmethod, then give two applications of the analytical loop closure,to longer loop modeling and Monte Carlo Minimization.

Test of the Angle Perturbation Methods

We apply the perturbation methods presented above to the three-residue gaps artificially deleted from the Top500 structures.36

When fixed canonical angle parameters are assumed and the loop-closure algorithm is applied to fill the three-residue gaps, 22,981(27.5%) out of total 83,327 gaps do not have loop-closure solu-tions. (Those loops including proline have been excluded in thistest.) The number of missed gaps decreases dramatically with thesimple perturbation method from earlier: 1249 (1.5%) and 469(0.56%) with the maximum angle variation of 5 and 10 degrees,respectively. Note that only 3 NC�C angles have been varied here.The full angle perturbation from above misses 209 (0.25%) and 23(0.028%) for the maximum allowed perturbation 5 and 10 degrees,respectively, when nine angles are varied (three NC�C, twoC�CN, two CNC�, and two peptide torsion angles). In summary,the simple perturbation method is successful in covering most ofthe conformational space realized in the database, and the fullangle perturbation can push the limit to almost perfection. Com-putation time increases only a few percent even with the full searchmethod for this test because most of the loops have solutions, and

only a few iterations are needed for angle search. Computationtime increases more with perturbations when there are more casesin which loop-closure solutions are not found such as when appliedto loop modeling or for exhaustive sampling as in the cyclopen-tapeptide example below.

Next, we consider the cyclopentapeptide Gly-Gly-Gly-Pro-Proexample for which Go� and Scheraga38 and Bruccoleri and Kar-plus24 sampled the conformational space. The two Pro � angles arefrozen, the two Pro � angles are varied with a grid of 5 deg, andthe remaining six Gly �/� torsions are solved for. We use the samebond lengths and angle parameters as BK, and 346 loops are closedwhen no perturbation is used. BK closed 1507 and 1565 loops withtheir fast and slow bond angle search method, respectively, vary-ing nine bond angles with maximum variation of 5 degrees. Ourfull perturbation method closes 1517 loops when the same 9 angles(3 NC�C, three C�CN, three CNC�, including two bond anglesoutside the C� triangle) are varied with the same 5-degree maxi-mum variation. Including perturbations in the additional threepeptide torsional angles closes 1594 loops. Our full search methodis thus comparable to that of BK in the number of closed loops, andour ability to add the peptide torsion DOFs increases the coverageof the conformational space slightly. The simple three-angle per-turbation closes 819 closed loops, which is twice as many as thoseobtained without perturbation (346), but only half of those ob-tained with full perturbation. The total computation time increasessteeply with increasing perturbation level: 0.26, 0.56, 17.6, and24.0 s for no perturbation, simple 3-angle perturbation, 9-angle,and 12-angle perturbations, respectively, when scaled to an AMD1800� MP processor.

Application to Loop Modeling

Analytical loop closure finds a discrete set of loop conformationsfor a three-residue loop, but a longer loop has a continuous set ofpossible closed loop conformations. Sampling a longer loop there-fore requires a strategy to sample the extra DOFs to be coupledwith the analytical loop closure. The extra DOFs could be sampledeither randomly or in an informed way. When the sampling israndom, unfeasible conformations due to unfavorable �/� anglesor steric clashes would be screened out in later stages of loopprediction algorithms, but it would be more efficient if suchstructures are excluded during the loop sampling stage. We employan existing loop construction algorithm,31 which performs this bysampling in the allowed regions of the �/� map in a discretemanner and screening possible side-chain clashes using a rotamerlibrary. This algorithm (as implemented in the program PLOP31) isused to build the N-terminal and the C-terminal branches exceptfor the three residue gap in the middle of a loop, and the analyticalloop-closure algorithm is used to close the branches.

The performance of our coupled algorithm is compared to therecent work of Canutescu and Dunbrack called CCD (cyclic co-ordinate descent).39 The CCD algorithm is a numerical loop-closure algorithm that is similar in spirit to the “random tweak”method,29 solving first-order equations iteratively, but it is morerobust and efficient. We take the same test set as in Table 2 of ref.39, which consist of 10 loops each with lengths of 4, 8, and 12residues (total of 30 loops) chosen from a set of nonredundantX-ray crystallographic structures. The comparison is summarized

518 Coutsias et al. • Vol. 25, No. 4 • Journal of Computational Chemistry

Page 10: A Kinematic View of Loop Closure - University of New Mexicovageli/papers/loop.pdf · A Kinematic View of Loop Closure EVANGELOS A. COUTSIAS,1,* CHAOK SEOK,2,* MATTHEW P. JACOBSON,2

in Table 1. The average of the best backbone RMSD obtained fromCCD is 0.56, 1.59, and 3.05 Å for 4, 8, and 12 residue loops,respectively, with average computing time per closed loop of 31,37, and 23 ms on an AMD 1800� MP processor. Our coupledalgorithm gives better average minimum RMSDs of 0.40, 1.01,and 2.34 Å, in almost two orders of magnitude less computing time(0.56, 0.68, and 0.72 ms per loop when scaled to the sameprocessor). In addition, the minimum RMSD for individual testloops is better for 25 out of the 30 cases in Table 1. The conditionsunder which we perform the comparison actually disfavor ouralgorithm because we generate fewer loops. With CCD, the loopsare obtained from 5000 trials (thus about 5000 loop candidates,given that the algorithm can close the loops 99.8% of the time).However, with our algorithm, the exact number of loop candidatesis not the control parameter of the algorithm but rather the sam-pling resolution in the �/� map. As the sampling resolution isincreased, the number of loop conformations increases. For thiscomparison, we generate less than 5000 loop candidates for eachtest case, sometimes far less, which disfavors us in the comparison.The number of loop candidates for each test loop is also shown inTable 1.

The coupled algorithm is also compared with the algorithm aspresented in ref. 31, which does not use the analytical loop closureand continues the discrete �/� sampling instead to close the loop,which we call “numerical” closure here. When the same resolu-tions, thus the same sets of conformations for the residues outsideof the three-residue closure segment, are used, the average bestRMSD obtained is 0.29, 1.66, and 3.25 Å with computation timeper loop of 0.73, 1.60, and 106 ms. Except for the short four-residue loops, which are easy both for the numerical and analyticalclosure due to the small number of DOFs, the analytical closuregives better RMSD in orders of magnitude shorter time per loopconformation, especially for the longest 12 residue loops. This isexpected because the analytical closure can close branches moreefficiently for the given sampling resolution for the branches. The

number of conformations generated by the analytical closuremethod (which in essence has infinite sampling resolution) is muchhigher than the number generated by the algorithm without ana-lytical closure. The average number of closed loops with thenumerical closure is 459, 236, and 42 for the 4, 8, and 12 residueloops, respectively, compared to 1181, 2525, and 2924 with theanalytical closure at the fixed branch sampling resolutions. In thetests performed here, in which the maximum number of loopcandidates is held fixed at 5000, this is actually disadvantageousfor the analytical closure, because more loops are generated usingcoarser sampling for the nonclosure residues. When a maximum of5000 loops are generated, the numerical closure thus gives betterRMSD (0.27, 1.04, and 1.89 Å) although with longer computingtime per loop (8.5, 6.1, and 23 ms). This implies that the optimalnumber of closed loops to be sampled is different for the analyticaland numerical closure, and more loop candidates must be sampledwith the analytical closure. Clustering algorithms can be used toremove the redundancies in the candidates before more expensiverefinement and rescoring steps.31

Finally, adding the bond angle perturbations is found not toaffect the best RMSD compared to no perturbation, although moreclosed loops are found. Producing more high-quality loops by abiased perturbation that samples desired regions of �/� would bemore useful for loop modeling.

Application to Loop Optimization Using Monte CarloMinimization

We employ the local moves described earlier as a perturbationstrategy in the Monte Carlo Minimization (MCM) method of Liand Scheraga,40 and apply the method to the global energy mini-mization of a protein loop. MCM is a global optimization methodby which local energy barriers can be overcome with energyminimization of the perturbed structure before a Metropolis crite-rion41 is applied.

Table 1. Minimum RMSD (in Å) of the Candidate Loops with Our Algorithm (CSJD)and the CCD Algorithm.

Four-residue loops Eight-residue loops 12-residue loops

Loop CSJD CCD Loop CSJD CCD Loop CSJD CCD

1dvjA_20 0.38 (4548) 0.61 1cruA_85 0.99 (2516) 1.75 1cruA_358 2.00 (4148) 2.541dysA_47 0.37 (2234) 0.68 1ctqA_144 0.96 (1754) 1.34 1ctqA_26 1.86 (3968) 2.491eguA_404 0.37 (170) 0.68 1d8wA_334 0.37 (1686) 1.51 1d4oA_88 1.60 (1802) 2.331ej0A_74 0.21 (1564) 0.34 1ds1A_20 1.30 (3506) 1.58 1d8wA_46 2.94 (3906) 4.831i0hA_123 0.26 (342) 0.62 1gk8A_122 1.29 (2362) 1.68 1ds1A_282 3.10 (1162) 3.041id0A_405 0.72 (528) 0.67 1i0hA_145 0.36 (1452) 1.35 1dysA_291 3.04 (2306) 2.481qnrA_195 0.39 (1064) 0.49 1ixh_106 2.36 (4448) 1.61 1eguA_508 2.82 (2106) 2.141qopA_44 0.61 (4284) 0.63 1lam_420 0.83 (2200) 1.60 1f74A_11 1.53 (3048) 2.721tca_95 0.28 (418) 0.39 1qopB_14 0.69 (3384) 1.85 1qlwA_31 2.32 (4780) 3.381thfD_121 0.36 (2958) 0.50 3chbD_51 0.96 (1838) 1.66 1qopA_178 2.18 (2014) 4.57Average 0.40 (1181) 0.56 Average 1.01 (2525) 1.59 Average 2.34 (2924) 3.05

CCD results are taken from Table 2 of ref. 39; 5000 trials were performed per test loop with the CCD, so the minimumis among about 5000 closed loops. With CSJD, the number of candidate loops is taken to be always less than 5000 foreach test loop, and shown in the parentheses.

Kinematic View of Loop Closure 519

Page 11: A Kinematic View of Loop Closure - University of New Mexicovageli/papers/loop.pdf · A Kinematic View of Loop Closure EVANGELOS A. COUTSIAS,1,* CHAOK SEOK,2,* MATTHEW P. JACOBSON,2

We now take advantage of the fact that some steric barriers thatare hard to overcome by random moves of individual atoms couldbe bypassed by coordinated moves of multiple atoms. It has beenreported that adding such concerted moves in Metropolis MonteCarlo simulations improves the sampling efficiency.2,3,5–7 Ourlocal moves are more general than those previously applied to MCsimulations, but it is straightforward to apply these moves to MC.The same Jacobian as in refs. 3 and 7 needs to be included tosatisfy the microscopic reversibility. Here we apply the concertedmoves to MCM for the first time, and show significant improve-ment in efficiency in finding the global minimum.

Several other strategies for local moves have been applied tothe loop optimization problem (see references in ref. 42), andamong these, Local Torsional Deformation (LTD)42 has been oneof the most efficient methods of perturbing cyclic or loop struc-tures when combined with MCM. In LTD, torsion angles areperturbed only locally, i.e., no bonds are rotated beyond theperturbed region. In addition, only those perturbations that keepthe bond between the last perturbed atom and the first unperturbedatom within a ring closure range (0.5–3.54 Å) are considered.42 Inour approach, one torsion angle (� or �) in the loop is perturbed,and six other torsion angles (three pairs of � and �) are adjustedto keep the perturbation local. The perturbed conformations thusdo not have any large strains due to unrealistic bond angles or bondlengths. Such a move is compared with LTD below.

Results

It is not intuitively obvious whether using perturbations exactlysatisfying geometrical constraints (Exact Loop Closure, ELC)would be significantly more efficient than approximate perturba-tions like LTD, because a physical energy function can correct forthe inaccurate geometry in the process of energy minimization.Figures 7 and 8 show that the moves based on our ELC actuallygreatly enhance the performance of MCM compared to approxi-

mate LTD in finding low energy conformations. Figure 7 illus-trates that ELC finds the (putative) global minimum about threetimes faster than LTD, and Figure 8 shows that ELC has a muchhigher probability of finding the correct global minimum than LTDwhen the optimization starts with a random initial structure. Thedetails on the simulations are presented in the next section.

If keeping the backbone geometry within physically reasonableranges improves efficiency, the same is expected for side chains.We thus draw the side chain torsion angles from a rotamer library.The rotamer probability distributions in the backbone dependentrotamer library of Dunbrack et al.43 are used to perturb side-chaintorsion angles. This rotamer method is compared with “random”side-chain perturbation method. Employing a rotamer library im-proves the performance: average lowest energies found after 10runs of 1000 MCM iterations with rotamer and random methodsare respectively, �2478.5 and �2477.2 kcal/mol for ELC, and�2473.2 and �2463.7 kcal/mol for LTD for the same initial loopstructure as in Figure 7. Computations in Figures 7 and 8 were bothperformed with the side-chain rotamer library.

Methods

The energy function used is EEF1,44 which is the CHARMM 19polar hydrogen force field with a Gaussian implicit solvationmodel. An eight-residue loop (84–91) in Turkey egg lysozyme(pdb code 1351.pdb) is taken for this study because the (putative)global minimum of this loop is located very close to the crystalstructure (about 0.3 Å). The loop RMSD is measured as theroot-mean-square deviation in the main chain atoms of the loopwhen the three stem residues on both sides of the loop are opti-mally superimposed. The L-BFGS-b algorithm45 by Zhu et al. is

Figure 7. ELC finds the putative global energy minimum more effi-ciently than LTD. Ten MCM runs (1000 iterations, or minimizations,for each) are shown. Each run starts from the same initial conforma-tions, which is 3.8 Å from the crystal structure. The ordinate representsthe lowest energy found up to that iteration number. The (putative)global minimum is at �2478.7 kcal/mol, and is represented by thedotted line.

Figure 8. ELC finds lower energy and rmsd structures than LTDwhen MCM is started from diverse initial structures. The ordinate andabscissa are the lowest energy found and RMSD corresponding to thestructure. The initial structures are generated by randomly choosing�/� angles of the loop residues. One thousand iterations were per-formed for each of 30 random initial structures. Only 23 points areshown for LTD because seven other points have much higher energyand rmsd. The (putative) global minimum is represented by the backcircle.

520 Coutsias et al. • Vol. 25, No. 4 • Journal of Computational Chemistry

Page 12: A Kinematic View of Loop Closure - University of New Mexicovageli/papers/loop.pdf · A Kinematic View of Loop Closure EVANGELOS A. COUTSIAS,1,* CHAOK SEOK,2,* MATTHEW P. JACOBSON,2

used for energy minimization with the gradient tolerance of 1kcal/mol Å.

The details of modeling and parameters for both ELC and LTDare as follows. Hydrogen atoms are modeled on the crystal struc-ture and the structure is energy minimized to remove bad contactswith harmonic constrains on heavy atoms with the force constant5 kcal/mol. The resulting structure is 0.1 Å from the crystalstructure. All other atoms are fixed except for the loop atoms inMCM. The temperature parameter kT for MCM is set to 1 kcal/mol for both ELC and LTD. In ELC, we choose three residues inthe loop randomly whose �/� angles compensate for the change ofa driver torsion angle, which is also chosen randomly within thetriangle formed by the three residues. The driver angle � ischanged with uniform probability in the range [� � f1, � �f1]. f1 � 0.7 was found to be optimal. Out of the multiple loopclosure solutions, the closest solution to the current structure inRMSD is selected for the perturbation step in MCM. In LTD, fourconsecutive �/� angles are perturbed with uniform probability inthe range [� � f2, � � f2], where f2 � 0.8 was found to beoptimal. We also tried 3–4 one or two consecutive angle move-ments following ref. 42, but they were less efficient. However, ithas to be mentioned that comparisons of algorithms is not alwaysstraightforward, with a lot of parameters and technical details thatcan be varied. For both LTD and ELC, each side chain is perturbedindependently with the probability 1/8, and the side-chain rotameris selected from the backbone-dependent rotamer library with thebackbone-dependent probability. The random side-chain perturba-tion method perturbs each side-chain torsion angle independentlywith the probability 1/8, and the angle value is drawn from auniform range around the current value [� � f, � � f], wheref � f1 for ELC and f � f2 for LTD.

Conclusions

The bonded near-neighbor forces in a protein can be grouped intoroughly three categories with respect to strength: hard forcesassociated with bond lengths, intermediate forces associated withbond angles, and soft forces associated with the �/� dihedrals andside-chain angles. The forces associated with the � dihedral of thepeptide bond can be placed in the intermediate range. In consid-ering a polypeptide chain it is tempting to concentrate on motionsassociated with the “soft” DOFs, i.e., those associated with �/�.The other DOFs can be assumed to vary to a limited degree,although they can be fixed to arbitrary values as far as the geo-metric analysis is concerned.

The conformational space of a tripeptide unit (excluding N1

and C�3) can be seen as the Cartesian product of two circles, i.e.,a torus. Exploring the conformation space of this simple system isstraightforward in terms of the �/� dihedrals. However, adding aconstraint such as fixing the distance between C�1 and C�3 intro-duces a relationship between the two dihedrals, eq. (11), and thisinterdependence (“Rotation Transfer Function” or RTF) forms thebasis of the analytical approach followed in this article. In general,fixed-distance constraints, whether resulting from NMR measure-ments for structure determination or as part of a strategy forexploring conformation space, imply sets of such transfer func-tions among angle DOFs and, when combined with the other

“almost-rigid” constraints in a macromolecule, lead to a reductionin the dimensionality of the space of allowed motions. Allowingfor small variability of additional DOFs provides a search algo-rithm with more room to maneuver, replacing barriers by narrow,passable corridors. Such constraint-compatible conformationsform the natural low-energy terrain that needs to be exploredthoroughly. Choosing coordinates that describe this terrain effi-ciently, is an essential part of this exploration, because the reduc-tion in dimension comes at the price of complicated topology.Clearly, these ideas need not be limited to backbone motions, andextending them to side chains is essential if NMR derived distanceconstraints are to be included. In that regard we view the tripeptideloop closure and the variants presented here as one of manypossible applications of the distance-angle relationship expressedby the basic RTF.

Several generalizations of the method presented here are pos-sible. For example, the transfer functions are Fourier polynomialsin the other angle variables as well. A reduction of these topolynomial form would lead to a new polynomial system, now inseveral additional variables. The zero sets become higher dimen-sional objects, and new methods can be brought to bear for findingclosure solutions. We chose to treat these variables by simplesearch methods here, but a more complete search would be re-quired if, for example, some energy criterion is included in theperturbation process.

The main advantage of a �/� search method is that it avoidssearching conformations that introduce distortions of the hardDOFs. The benefit of the method in reducing the size of the searchspace is not seriously affected if small variations in additionalDOFs are allowed. Thin slivers of configurations replace the �/�hypersurfaces, so that the volume of the allowed space is stilldramatically reduced. To take full advantage of the intimate con-nection between the concerted moves idea and the true kinematicDOFs of the chain, a strategy of choosing moves should beinformed about the effect of these moves vis. steric clashing withthe rest of the chain as well as side chain placement. A possibleextension could be the incorporation of obstacle avoidance andother similar ideas from robotic motion planning.

Acknowledgments

One of the authors (E.A.C.) would like to acknowledge the hos-pitality of the Dill group during several visits to UCSF. E.A.C. alsoacknowledges many useful conversations with John Chodera. Wethank Dr. Bosco Ho for pointing out ref. 39, and Dr. RobertBruccoleri for providing parameters for the cyclopentapeptideexample.

Appendix A: Two-Cone Systems and theRotation Transfer Function

In the body frame of the three fixed C�i atoms, the C�iNi unitvector ri�1

and the C�iCi unit vector ri� lie on cones about the

C�i�1C�i and C�iC�i�1 virtual bonds, respectively, assumingfixed bond lengths, angles, and dihedral � [see Fig. 9a]. The i�1

Kinematic View of Loop Closure 521

Page 13: A Kinematic View of Loop Closure - University of New Mexicovageli/papers/loop.pdf · A Kinematic View of Loop Closure EVANGELOS A. COUTSIAS,1,* CHAOK SEOK,2,* MATTHEW P. JACOBSON,2

and �i rotations are not independent because the bond angleNiC�iCi must be fixed (or in a limited range in general). If we canthink of the cones of possible locations of the bonds about theircorresponding virtual axes, then we must think of the angle con-straint between the two bonds as a fixed distance condition be-tween two generatrices of these cones as shown in Figure 9a.Clearly, to each position of one bond there can be at most twopositions for the other. Because the four angles �, �, �, � areconstant, the alternative positions describe the possible conforma-tions of the tetrahedral formed by these four angles in Figure 9b.

The ranges of these positions can change character and transitfrom both having one connected component (Fig. 10a) to whereone of them splits to two disjoint arms (Fig. 10b).

At each two-cone, the i�1 and �i rotations are related by the�i bond angle constraint

ri� � ri�1

� cos �, (23)

which results in eq. (8) rewritten omitting subscripts for simplicity:

cos � � cos � cos � cos �

� sin ��sin � cos � cos � cos � sin � cos ��

� sin � sin ��sin � sin � cos � cos � cos �. (24)

To see the � � relation more explicitly, � :� �i is solved forgiven :� i�1, or given �. Arranging eq. (24) as atcos � �btsin � � ct gives

� � �o � arccos�ct/�at2 � bt

2�, (25)

where

at � cos � sin � sin � cos � sin � cos � sin �,

bt � sin � sin � sin ,

ct � cos � � cos � cos � cos � � sin � sin � cos � cos ,

cos �o � at/�at2 � bt

2, sin �o � bt/�at2 � bt

2. (26)

Given �, is expressed as

� o � arccos�cs/�as2 � bs

2�, (27)

Figure 9. (a) The two cones at a double rotatable bond junction. Theblack circle denotes the C� atom, the black oval on the �-cone the endof the C�C unit vector, and that on the -cone the end of C�N unitvector. The dashed lines are the virtual bonds between the C� atoms,and the line between the ovals shows the fixed distance constraint. (b)The angles at a double rotatable bond junction.

Figure 10. (a) A type of two cones (type IIb) in which both C (on the�-cone) and N (on the -cone) trace connected segments. The blackcircles denote extreme positions of the N and C atoms. The whitecircle is the C position corresponding to the N at the black circleconnected to it by a line. (b) A two-cone type (type IIIb) in which theC atom (on the �-cone) traces two disjoint segments, and N (on the-cone) traces the whole circle.

522 Coutsias et al. • Vol. 25, No. 4 • Journal of Computational Chemistry

Page 14: A Kinematic View of Loop Closure - University of New Mexicovageli/papers/loop.pdf · A Kinematic View of Loop Closure EVANGELOS A. COUTSIAS,1,* CHAOK SEOK,2,* MATTHEW P. JACOBSON,2

where

as � cos � sin � sin � cos � � sin � sin � cos �,

bs � sin � sin � sin �,

cs � cos � � cos � cos � cos � � sin � cos � sin � cos �,

cos o � as/�as2 � bs

2, sin o � bs/�as2 � bs

2. (28)

Equation (25) has two solutions if �ct/�at2 � bt

2� � 1, oneif � 1, and none if 1. The range of in which solutions existis determined by 1 and 2 which are two roots of ct

2 � at2 � bt

2,and likewise for �. The roots can also be found by noting thefollowing geometrical relations:

ri�1 � zi � cos�� � ��, ri

� � zi�1 � �cos�� � ��, (29)

which give

cos�� � �� � sin sin � sin � � cos � cos �, (30)

cos�� � �� � sin � sin � sin � � cos � cos �. (31)

Equations (30) and (31) have roots if the following conditions aresatisfied:

�cos�� � �� � cos�� � ����cos�� � �� � cos�� � ��� � 0, (32)

�cos�� � �� � cos�� � ����cos�� � �� � cos�� � ��� � 0. (33)

We call the roots of Equations (30) and (31) � and ��. Whenthere exist roots, there are six two-cone types, depending on thesign (�) of each root.

I. No solutionIIa. �, ��

IIb. �, ��

IIIa. �, �

IIIb. ��, ��

IVa. �, ��

IVb. �, ��

It can be seen that, in the case of IIa, increasing � increases theallowable range of /�, and decreasing � increases the range for IIb(see Fig. 10a). This fact is used in the simple perturbation methoddescribed earlier. For other two-cone types, the � values areincreased in the perturbation algorithm.

The i�1 � �i relationship is more complicated than the i–�i

relationship (i � �i � i), but more detailed understanding ofthe –� relationship at the junction of the two rotatable bonds isuseful. First, it makes it possible to predict the effect of bond angleperturbations, as described earlier. Second, it also reveals thegeometrical restriction on the side-chain location, especially C�,due to the correlated movement of Ni, C�i, and Ci atoms asdescribed by correlation of the i�1 and �i rotations. To demon-strate this, in Figure 10, we show the � � relationship and thecorresponding C� positions obtained from the canonical tripeptide

geometry, and compare with those extracted from the structuredatabase Top500.36 Figures 11a and b shows possible ranges for–� and C� when � is fixed at 90° (this value of � has themaximum density in the � distribution in the database). Thedatabase points are for � � 90 � 0.5°. Clearly, the theoretical –�curve computed from the canonical bond angles shows excellentagreement with the database points, the large majority of structuresclustering about the Ramachandran allowable portion of the –�curve (Fig. 11a). Reconstructing C� using canonical angles alsoshows close agreement (Fig. 11b). This is the unimodal case (alsoillustrated in Fig. 10a), and it is the most commonly occurringconfiguration in the database. Figures 11c and d shows an exampleof a bimodal case (also illustrated in Fig. 10b), for � � 111°,which is close to the second density maximum in the distributionof �. Here we see a small discrepancy, indicating that the structureis stressed, i.e., some of the parameters are off their typical values.The stress turns out to be even stronger if one compares the C�

distribution in the database to that reconstructed assuming typicalangle values. We are studying these properties further, togetherwith their possible application to side-chain optimization, espe-cially when backbone flexibility is also taken into account.

Appendix B: Coefficients of the Polynomials

Equation (8) is written as a double Fourier series

0 � ai � bicos i�1 � cicos �i � dicos i�1cos �i

� eisin i�1sin �i, (34)

where the coefficients are

ai � �cos �i � cos �icos �i�1cos �i

bi � sin �isin �i�1cos �i

ci � sin �icos �i�1sin �i

di � cos �isin �i�1sin �i

ei � sin �i�1sin �i.

Now introduce the half-angle formulas Eqs. (9) and (10) into(34) to arrive at a system of three biquadratics in wi, ui, i � 1, 2,3,

0 � ai � bi

1 � wi�12

1 � wi�12 � ci

1 � ui2

1 � ui2 � di

1 � wi�12

1 � wi�12

1 � ui2

1 � ui2

� ei

2wi�1

1 � wi�12

2ui

1 � ui2 ,

or equivalently:

0 � ai�1 � wi�12 ��1 � ui

2� � bi�1 � wi�12 ��1 � ui

2�

� ci�1 � wi�12 ��1 � ui

2� � di�1 � wi�12 ��1 � ui

2� � ei4wi�1ui,

Kinematic View of Loop Closure 523

Page 15: A Kinematic View of Loop Closure - University of New Mexicovageli/papers/loop.pdf · A Kinematic View of Loop Closure EVANGELOS A. COUTSIAS,1,* CHAOK SEOK,2,* MATTHEW P. JACOBSON,2

Expanding and regrouping results in eq. (11):

Aiwi�12 ui

2 � Biwi�12 � Ciwi�1ui � Diui

2 � Ei � 0 (35)

where

Ai � ai � bi � ci � di � �cos �i � cos��i � �i�1 � �i�

Bi � ai � bi � ci � di � �cos �i � cos��i � �i�1 � �i�

Ci � ei � 4 sin �i�1sin �i

Di � ai � bi � ci � di � �cos �i � cos��i � �i�1 � �i�

Ei � ai � bi � ci � di � �cos �i � cos��i � �i�1 � �i�.

We now eliminate the variables wi: using the twist transformation,eq. (12),

wi �ui � i

1 � iui, i � tan i/2,

in eq. (35) we find

Ai� ui�1 � i�1

1 � i�1ui�1� 2

ui2 � Bi� ui�1 � i�1

1 � i�1ui�1� 2

� Ci

ui�1 � i�1

1 � i�1ui�1ui

� Diui2 � Ei � 0 (36)

Finally, the derivation of the coupled biquadratic polynomials eqs.(13), (14), and (15), is carried out by multiplying through by (1 �i�1ui�1)2 and regrouping. Because

�sin

1 � cos , 2 �

1 � cos

1 � cos ,

Figure 11. (a) The –� relationship given by the Rotation Transfer Function for � � 90° and typical values of � � 111.6°, � � 16.63°, � � 19.13°,plotted together with –� values in the database. (b) Plot of the location of C� as computed from the –� values in the theoretical curve of (a), againstC� positions from the database, both shown in spherical coordinates. (c) Same as (a), but for � � 111°. Two more curves with � perturbation of5° and �5° are shown together, which improve fit to the database points. The case is bimodal, although its character shifts as � is changed. (d) TheC� plots corresponding to the situation in (c).

524 Coutsias et al. • Vol. 25, No. 4 • Journal of Computational Chemistry

Page 16: A Kinematic View of Loop Closure - University of New Mexicovageli/papers/loop.pdf · A Kinematic View of Loop Closure EVANGELOS A. COUTSIAS,1,* CHAOK SEOK,2,* MATTHEW P. JACOBSON,2

we multiply the resulting expressions through by (1 � cos i�1)/ 2to arrive at the expression for the coefficients:

p22�i� � �cos �i � cos �i�1cos��i � �i�

� cos i�1sin �i�1sin��i � �i�

p21�i� � �2 sin i�1sin �i�1sin �i

p20�i� � �cos �i � cos �i�1cos��i � �i�

� cos i�1sin �i�1sin��i � �i�

p12�i� � �2 sin i�1sin �i�1sin��i � �i�

p11�i� � 4 cos i�1sin �i�1sin �i

p10�i� � �2 sin i�1sin �i�1sin��i � �i�

p02�i� � �cos �i � cos �i�1cos��i � �i�

� cos i�1sin �i�1sin��i � �i�

p01�i� � 2 sin i�1sin �i�1sin �i

p00�i� � �cos �i � cos �i�1cos��i � �i�

� cos i�1sin �i�1sin��i � �i�.

Equations (13), (14), and (15) are now rewritten as

P1�u3, u1� � �j�0

2 � �k�0

2

pjk�1�u3

j�u1k � �

j�0

2

Lju1j ,

P2�u1, u2� � �k�0

2 � �j�0

2

pjk�2�u2

k�u1j � �

k�0

2

Mku1k,

and

P3�u2, u3� � �j�0

2 � �k�0

2

pjk�3�u3

k�u2j � �

j�0

2

Nju2j ,

where

Lj :� Lj�u3� :� �k�0

2

pjk�1�u3

j , Mk :� Mk�u2� :� �j�0

2

pjk�2�u2

k,

and

Nj :� Nj�u3� :� �k�0

2

pjk�3�u3

k.

The resultant of P1 and P2, whose vanishing guarantees acommon root in u1, is given by the determinant

R8�u2, u3� � �L2 L1 L0 00 L2 L1 L0

M2 M1 M0 00 M2 M1 M0

�� L2 L0

M2 M0 2

— L2 L1

M2 M1 L1 L0

M1 M0

Because all the nonvanishing elements are products of two qua-dratics in u2 and two quadratics in u3, the resultant is a biquarticin these variables, and has the form

R8�u2, u3� � �j,k�0

4

qjku2j u3

k.

Here, the 5 � 5 � 25 quantities qjk are found in terms of productsof the ajk :� pjk

(1) and bjk :� pjk(2) by expressing as a sum of six

tensor products:We write R8 as a quartic in u2 introducing the functions Qj,

quartics in u3:

R8 � �j�0

4 � �k�0

4

qjku3k�u2

j �: �j�0

4

Qju2j .

The final resultant, which eliminates u2 to arrive at a degree 16polynomial in u3 is given by:

R16 � det�S�

where the matrix S is given as:

S�u3� :� �k�0

4

Sku3k � �

N2 N1 N0 0 0 00 N2 N1 N0 0 00 0 N2 N1 N0 00 0 0 N2 N1 N0

Q4 Q3 Q2 Q1 Q0 00 Q4 Q3 Q2 Q1 Q0

� (37)

so that

Sk :� �c2k c1k c0k 0 0 00 c2k c1k c0k 0 00 0 c2k c1k c0k 00 0 0 c2k c1k c0k

q4k q3k q2k q1k q0k 00 q4k q3k q2k q1k q0k

�(where we defined cij :� pij

(3), with ci3 � ci4 � 0, i � 0, 1, 2).These matrices can be used directly in the matrix polynomialapproach, which finds the solutions as eigenvalues of a “compan-ion” matrix pencil. The computation of the polynomial coefficientsfor the direct approach requires some additional computationsdescribed below. We proceed by a Laplace expansion46 of eq. (37)by complementary minors of order 3. First, we rearrange the rowsof S:

Kinematic View of Loop Closure 525

Page 17: A Kinematic View of Loop Closure - University of New Mexicovageli/papers/loop.pdf · A Kinematic View of Loop Closure EVANGELOS A. COUTSIAS,1,* CHAOK SEOK,2,* MATTHEW P. JACOBSON,2

det S � �N2 N1 N0 0 0 00 N2 N1 N0 0 00 0 N2 N1 N0 00 0 0 N2 N1 N0

Q4 Q3 Q2 Q1 Q0 00 Q4 Q3 Q2 Q1 Q0

�� ��

N2 N1 N0 0 0 0Q4 Q3 Q2 Q1 Q0 00 0 N2 N1 N0 00 N2 N1 N0 0 00 Q4 Q3 Q2 Q1 Q0

0 0 0 N2 N1 N0

�� � �

1�i1�i2i3�6

��1�i1�i2�i3det S�1, 2, 3; i1, i2, i3�

� det S�4, 5, 6; i4, i5, i6�

where S(1, 2, 3; i1, i2, i3) is the 3 � 3 submatrix of S formed byelements in rows 1, 2, 3 and columns i1 � i2 � i3. Also, i4 �i5 � i6 and i4, i5, i6 differ from i1, i2, i3. We introduce the 3 �5 submatrix

P :� �N2 N1 N0 0 0Q4 Q3 Q2 Q1 Q0

0 0 N2 N1 N0

�and the 3 � 3 minors T(i, j, k) formed by the columns i, j, andk of P. Then, the Laplace expansion of S in terms of the minorsbased on rows 1, 2, and 3, and their complements from rows 4, 5,and 6,46 can be written compactly in the form:

det S � T�1, 2, 3�T�3, 4, 5� � T�1, 2, 4�T�2, 4, 5�

� T�1, 2, 5�T�2, 3, 5� � T�1, 3, 4�T�1, 4, 5�

� T�1, 3, 5�T�1, 3, 5� � T�1, 4, 5�T�1, 2, 5�

The computation of the resultant proceeds with the nine quantitiesT(i, j, k) above. Because they are sums of products of terms of theform N�N�Q� they are polynomials in u1 of degree 8. We list theexpressions for these below in terms of the Ni, Qj involved. Oncethe Ts have been computed, we need to compute the productsabove, i.e., we need to compute 6 binary products of polynomialsof degree 8. A certain amount of factoring can be utilized to furtherreduce the operational count of this procedure.

We give now the T(i, j, k):

T�1, 2, 3� � N2 N1 N0

Q4 Q3 Q2

0 0 N2

� N2N2 N1

Q4 Q3

T�1, 2, 4� � N2 N1 0Q4 Q3 Q1

0 0 N1

� N1N2 N1

Q4 Q3

T�1, 2, 5� � N2 N1 0Q4 Q3 Q0

0 0 N0

� N0N2 N1

Q4 Q3

T�1, 3, 4� � N2 N0 0Q4 Q2 Q1

0 N2 N1

� �N2Q2 Q1

N2 N1� Q4N0N1

T�1, 3, 5� � N2 N0 0Q4 Q2 Q0

0 N2 N0

� �N2N2 N0

Q2 Q0� Q4N0

2

T�1, 4, 5� � N2 0 0Q4 Q1 Q0

0 N1 N0

� �N2N1 N0

Q1 Q0

T�2, 3, 5� � N1 N0 0Q3 Q2 Q0

0 N2 N0

� �N1N2 N0

Q2 Q0� N0

2Q3

T�2, 4, 5� � N1 0 0Q3 Q1 Q0

0 N1 N0

� �N1N1 N0

Q1 Q0

T�3, 4, 5� � N0 0 0Q2 Q1 Q0

N2 N1 N0

� �N0N1 N0

Q1 Q0

From these expressions, whose computation involves only 4 dis-tinct 2 � 2 determinants, we can compute the final polynomial.This computation can be done analytically, by deriving the lengthyexpressions for the coefficients of the final polynomial in terms ofthe coefficients of the original polynomials. These analytical ex-pressions can be useful, especially if one wants to study the effectof varying parameters on the behavior of the solution of thetripeptide loop closure. For the calculations reported in this article,the computation of the coefficients was done numerically. In thiscase, it is optimal to compute the 8th degree polynomials associ-ated with each of the T(i, j, k) and then compute the six polyno-mial products (which can be easily reduced to five polynomialmultiplications with appropriate factorizations).

Appendix C: Systems of Polynomials andResultants

The resultant of a system of polynomials in several variables is anecessary and sufficient condition for the existence of a commonsolution. For two polynomials, Fm(u) and Fn(u) of degrees m andn, to have a common solution u they must have a factor incommon, i.e., there must exist polynomials g(u) and h(u) ofdegrees � n � 1 and � m � 1, respectively, such that

gFm � hFn � 0.

This leads to a system of m � n linear homogeneous equations fordetermining the coefficients of g and h, and the resultant is thedeterminant of the matrix associated with that system. We dem-onstrate how this works for two second order equations in a singlevariable. Let

f1�u� � a2u2 � a1u � a0 � 0

526 Coutsias et al. • Vol. 25, No. 4 • Journal of Computational Chemistry

Page 18: A Kinematic View of Loop Closure - University of New Mexicovageli/papers/loop.pdf · A Kinematic View of Loop Closure EVANGELOS A. COUTSIAS,1,* CHAOK SEOK,2,* MATTHEW P. JACOBSON,2

f2�u� � b2u2 � b1u � b0 � 0.

If these have a common root, say u*, they must be of the form

f1�u� � a2�u � u*��u � u1� � 0

f2�u� � b2�u � u*��u � u2� � 0

so that there exist two polynomials of degree 1, g( x) � b2(u �u2) and h( x) � �a2(u � u1) such that

g�u� f1�u� � h�u� f2�u� � 0. (38)

Because the roots are assumed unknown, we simply write

g�u� � g1u � g0, h�u� � h1u � h0

and eq. (38) becomes

� g1u � g0��a2u2 � a1u � a0�

� �h1u � h0��b2u2 � b1u � b0� � 0

or, grouping like powers of u together

� g1a2 � h1b2�u3 � � g1a1 � g0a2 � h1b1 � h0b2�u2

� � g0a1 � g1a0 � h0b1 � h1b0�u � � g0a0 � h0b0� � 0

which can be written in the equivalent form

� g1 g0 h1 h0��a2 a1 a0 00 a2 a1 a0

b2 b1 b0 00 b2 b1 b0

��u3

u2

u1� � 0

so that the left and right null vectors give, respectively, the coef-ficients of the two factor polynomials and the (common) zero ofthe original pair. The rank deficiency of the coefficient matrix (andthe vanishing of its determinant, i.e., the resultant) is the necessaryand sufficient condition for the existence of these null vectors.

Once the vanishing of the determinant above has been estab-lished, finding u is straightforward; discarding the third equationimplied above for the right null-vector (because it is dependent onthe others), and moving the column associated with the component1 to the right-hand side, we solve the resulting system for u usingCramer’s rule:

u �

a2 a1 00 a2 �a0

0 b2 �b0

a2 a1 a0

0 a2 a1

0 b2 b1

The above technique is applied to eqs. (18) and (19) to give u2

and u1, once u3 is obtained:

u2 �

�N2 N1 N0 0 00 N2 N1 N0 00 0 N2 N1 00 0 0 N2 �N0

0 Q4 Q3 Q2 �Q0

��N2 N1 N0 0 00 N2 N1 N0 00 0 N2 N1 N0

0 0 0 N2 N1

0 Q4 Q3 Q2 Q1

�,

where Nj and Qj are functions of u3 as described in Appendix B,and

u1 �

L2 L1 00 L2 �L0

0 M2 �M0

L2 L1 L0

0 L2 L1

0 M2 M1

,

where Lj and Mj are functions of u3 and u2, respectively, alsogiven in Appendix B.

References

1. Tramontano, A.; Leplae, R.; Morea, V. Proteins 2001, 45, 22.2. Go� , N.; Scheraga, H. A. Macromolecules 1978, 11, 552.3. Dodd, L. R.; Boone, T. D.; Theodorou, D. N. Mol Phys 1993, 78, 961.4. Wu, M. G.; Deem, M. W. Molecular Phys, 1999, 97, 559.5. Wakana, H.; Wako, H.; Saito, N. Int J Pept Protein Res 1984, 23, 315.6. Knapp, E.-W.; Irgens-Defregger, A. J Comput Chem 1992, 14, 19.7. Dinner, A. R. J Comput Chem 2000, 21, 1132.8. Elofsson, A.; Le Grand, S. M.; Eisenberg, D. Proteins 1995, 13, 73.9. Ulyanov, N. B.; Schmitz, U.; James, T. L. J Biomolecular NMR 1993,

3, 547.10. Cahill, S.; Cahill, M.; Cahill, K. J Comp Chem 2003, 24, 1364.11. Ulmschneider, J. P.; Jorgensen, W. L. J Chem Phys 2003, 118, 4261.12. Wedemeyer, W. J.; Scheraga, H. A. J Comput Chem 1999, 20, 819.13. Go� , N.; Scheraga, H. A. Macromolecules 1970, 3, 178.14. Hartenberg, R. S.; Denavit, J. Kinematic Synthesis of Linkages;

McGraw-Hill: New York, 1964.15. Henrici, P. Applied and Computational Complex Analysis; Prentice-

Hall: New York, 1974.16. Sturmfels, B. In Applications of Computational Algebraic Geometry,

Proceedings of Symposia in Applied Mathematics; Cox, D. A.; Sturm-fels, B., Eds.; Am Math Soc 1997, p. 25.

17. Petitjean, S. J Math Imaging Vision 1999, 10, 1.18. Wampler, C.; Morgan, A. Mech Mach Theory 1991, 26, 91.19. Duffy, J. Analysis of Mechanisms and Robot Manipulators; Arnold:

London, 1980.20. Hunt, K. H. Kinematic Geometry of Mechanisms; Oxford: Oxford

Univ Press, 1990.21. Manocha, D. In Numerical Methods for Solving Polynomial Equa-

tions, Proceedings of Symposia in Applied Mathematics, Cox, D. A.;Sturmfels, B., Eds.; Am Math Soc 1997, p. 41.

22. Freudenstein, G. Mech Mach Theory 1973, 8, 151.23. Lee, H.-Y.; Liang, C.-G. Mech Mach Theory 1988, 23, 219.24. Bruccoleri, R. E.; Karplus, M. Macromolecules 1985, 18, 2767.

Kinematic View of Loop Closure 527

Page 19: A Kinematic View of Loop Closure - University of New Mexicovageli/papers/loop.pdf · A Kinematic View of Loop Closure EVANGELOS A. COUTSIAS,1,* CHAOK SEOK,2,* MATTHEW P. JACOBSON,2

25. Bruccoleri, R. E.; Karplus, M. Biopolymers 1987, 26, 137.26. Lee, H.-Y.; Liang, C.-G. Mech Mach Theory 1988, 23, 209.27. Vermeer, P. Proceedings of the 13th ACM Symposium on Computa-

tional Geometry; Nice: France, June 1997.28. Manseur, R.; Doty, K. L. Int J Robotics Res 1989, 8, 75.29. Fine, R. M.; Wang, H.; Shenkin, P. S.; Yarmush, D. L.; Levinthal, C.

Proteins 1986, 1, 342.30. Zheng, Q.; Rosenfeld, R.; Vajda, S.; DeLisi, C. Protein Sci 1993, 2,

1242.31. Jacobson, M. P.; Pincus, D. L.; Rapp, C. S.; Day, T. J. F.; Honig, B.;

Shaw, D. E.; Friesner, R. A. Proteins, accepted.32. Gelfand, I. M.; Kapranov, M. M.; Zelevinsky, A. V. Discriminants,

Resultants and Multidimensional Determinants; Birkhauser: Boston,1994.

33. Bricard, R. J Math Pures Appl 1897, 3, 113.34. Bennet, G. T. Proc Lond Math Soc 1912, 10, 309, 2nd Series.35. Bernshtein, D. N. Functional Anal Appl 1975, 9, 183.

36. http://kinemage.biochem.duke.edu/databases/top500.php.37. Hook, D. G.; McAree, P. R. Using Sturm Sequences to Bracket Real

Roots of Polynomial Equations from “Graphics Gems;” AcademicPress: New York, 1990. http://www.acm.org/pubs/tog/GraphicsGems/gems/Sturm/.

38. Go� , N.; Scheraga, H. A. Macromolecules 1970, 3, 188.39. Canutescu, A. A.; Dunbrack, R. L. Protein Sci 2003, 12, 963.40. Li, Z.; Scheraga, H. A. Proc Natl Acad Sci USA 1987, 84, 6611.41. Metropolis, N.; Rosenbluth, A. W.; Rosenbluth, M. N.; Teller, A. H.;

Teller, E. J Chem Phys 1953, 21, 1087.42. Baysal, C.; Meirovitch, H. J Phys Chem A 1997, 101, 2185.43. http://www.fccc.edu/research/labs/dunbrack/bbdep.html.44. Lazaridis, T.; Karplus, M. Proteins 1999, 35, 133.45. Zhu, C.; Byrd, R. H.; Lu, P.; Nocedal, J. L-BFGS-B; Northwestern

Univ: Evanston, IL, 1996.46. Iyanaga, S.; Kawada, Y., Eds. Encyclopedia Dictionary of Mathemat-

ics; MIT Press: Cambridge, MA, 1980, p. 349.

528 Coutsias et al. • Vol. 25, No. 4 • Journal of Computational Chemistry