7
Conformational space annealing and an off-lattice frustrated model protein Seung-Yeon Kim, Sung Jong Lee, and Jooyoung Lee Citation: The Journal of Chemical Physics 119, 10274 (2003); doi: 10.1063/1.1616917 View online: http://dx.doi.org/10.1063/1.1616917 View Table of Contents: http://scitation.aip.org/content/aip/journal/jcp/119/19?ver=pdfcov Published by the AIP Publishing Articles you may be interested in An off-lattice frustrated model protein with a six-stranded β -barrel structure J. Chem. Phys. 133, 135102 (2010); 10.1063/1.3494038 Effect of salt bridges on the energy landscape of a model protein J. Chem. Phys. 121, 10284 (2004); 10.1063/1.1810471 Freezing and folding behavior in simple off-lattice heteropolymers J. Chem. Phys. 120, 11285 (2004); 10.1063/1.1740750 Annealing contour Monte Carlo algorithm for structure optimization in an off-lattice protein model J. Chem. Phys. 120, 6756 (2004); 10.1063/1.1665529 Designability of a protein chain in an off-lattice heteropolymer model J. Chem. Phys. 113, 4827 (2000); 10.1063/1.1287177 This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 128.143.23.241 On: Fri, 19 Dec 2014 17:53:40

Conformational space annealing and an off-lattice frustrated model protein

Embed Size (px)

Citation preview

Page 1: Conformational space annealing and an off-lattice frustrated model protein

Conformational space annealing and an off-lattice frustrated model proteinSeung-Yeon Kim, Sung Jong Lee, and Jooyoung Lee Citation: The Journal of Chemical Physics 119, 10274 (2003); doi: 10.1063/1.1616917 View online: http://dx.doi.org/10.1063/1.1616917 View Table of Contents: http://scitation.aip.org/content/aip/journal/jcp/119/19?ver=pdfcov Published by the AIP Publishing Articles you may be interested in An off-lattice frustrated model protein with a six-stranded β -barrel structure J. Chem. Phys. 133, 135102 (2010); 10.1063/1.3494038 Effect of salt bridges on the energy landscape of a model protein J. Chem. Phys. 121, 10284 (2004); 10.1063/1.1810471 Freezing and folding behavior in simple off-lattice heteropolymers J. Chem. Phys. 120, 11285 (2004); 10.1063/1.1740750 Annealing contour Monte Carlo algorithm for structure optimization in an off-lattice protein model J. Chem. Phys. 120, 6756 (2004); 10.1063/1.1665529 Designability of a protein chain in an off-lattice heteropolymer model J. Chem. Phys. 113, 4827 (2000); 10.1063/1.1287177

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP:

128.143.23.241 On: Fri, 19 Dec 2014 17:53:40

Page 2: Conformational space annealing and an off-lattice frustrated model protein

Conformational space annealing and an off-lattice frustrated model proteinSeung-Yeon KimSchool of Computational Sciences, Korea Institute for Advanced Study, 207-43 Cheongryangri-dong,Dongdaemun-gu, Seoul 130-722, Korea

Sung Jong LeeDepartment of Physics and Center for Smart Bio-Materials, The University of Suwon, Hwasung-si,Kyunggi-do 445-743, Korea

Jooyoung Leea)

School of Computational Sciences, Korea Institute for Advanced Study, 207-43 Cheongryangri-dong,Dongdaemun-gu, Seoul 130-722, Korea

~Received 24 June 2003; accepted 18 August 2003!

A global optimization method, conformational space annealing~CSA!, is applied to study a46-residue protein with the sequenceB9N3(LB)4N3B9N3(LB)5L, whereB, L, andN designatehydrophobic, hydrophilic, and neutral residues, respectively. The 46-residueBLN protein is foldedinto the native state of a four-strandedb barrel. It has been a challenging problem to locate theglobal minimum of the 46-residueBLN protein since the system is highly frustrated andconsequently its energy landscape is quite rugged. The CSA successfully located the globalminimum of the 46-mer for all 100 independent runs. The CPU time for CSA is about seventy timesless than that for simulated annealing~SA!, and its success rate~100%! to find the global minimumis about eleven times higher. The amount of computational effort used for CSA is also about tentimes less than that of the best global optimization method yet applied to the 46-residueBLNprotein, the quantum thermal annealing with renormalization. The 100 separate CSA runs producethe global minimum 100 times as well as the other 5950 final conformations corresponding to a totalof 2361 distinct local minima of the protein. Most of the final conformations have relatively smallroot-mean-square deviation values from the global minimum, independent of their diverse energyvalues. Very close to the global minimum, there exist quasi-global-minima which are frequentlyobtained as one of the final answers from SA runs. We find that there exist two largest energy gapsbetween the quasi-global-minima and the other local minima. Once a SA run is trapped in one ofthese quasi-global-minima, it cannot be folded into the global minimum before crossing over thetwo large energy barriers, clearly demonstrating the reason for the poor success rate of SA. ©2003American Institute of Physics.@DOI: 10.1063/1.1616917#

I. INTRODUCTION

Finding the global minimum of a given function, calledthe global optimization, is an important problem in variousfields of science and engineering. Many optimization prob-lems are hard to solve since many of them belong to theNP-complete class, where the number of computing stepsrequired to solve the problem increases faster than any powerof the size of the system. Some of the well-known classicexamples are the traveling salesman problem in appliedmathematics and computer science, spin glasses incondensed-matter physics, and the protein-folding problemin biophysics.

Many of global optimization methods have been devel-oped and successfully applied to a variety of problems. Ex-amples of such methods are simulated annealing~SA!,1,2 ge-netic algorithm ~GA!,3,4 Monte Carlo with minimization~MCM!,5 multicanonical annealing,6,7 and quantum thermalannealing.8–10 One of the simplest algorithms for unbiased

global optimization is the SA method, which has been mostwidely used. Although the SA is very versatile in that it canbe easily applied practically to any problem, the drawback isthat its efficiency is usually much lower than problem spe-cific algorithms. This is especially problematic for NP-complete problems. For this reason, it is important to find analgorithm which is as general as SA, and yet competitivewith problem-specific ones. Recently, a powerful global op-timization method called conformational space annealing~CSA!11–13 was proposed, and applied to protein stuctureprediction14–17and Lennard-Jones clusters.18 The benchmarktests11–13 have demonstrated that it cannot only find theknown global-minimum conformations with less computa-tions than existing algorithms, but also provide new globalminima in some cases.

In this paper, we apply the CSA method to an off-latticemodel protein introduced by Honeycutt and Thirumalai.19–21

The so-calledBLN model proteins are constructed from theresidues of three types, hydrophobic (B), hydrophilic (L),and neutral (N) residues found in nature. It is shown thatthey exhibit many similarities with real proteins. With the aidof simulation methods Honeycutt and Thirumalai studied

a!Author to whom all correspondence should be addressed. Electronic mail:[email protected]

JOURNAL OF CHEMICAL PHYSICS VOLUME 119, NUMBER 19 15 NOVEMBER 2003

102740021-9606/2003/119(19)/10274/6/$20.00 © 2003 American Institute of Physics

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP:

128.143.23.241 On: Fri, 19 Dec 2014 17:53:40

Page 3: Conformational space annealing and an off-lattice frustrated model protein

a 46-residue BLN protein with the sequenceB9N3(LB)4N3B9N3(LB)5L that folds into the native struc-ture of a four-strandedb barrel. They suggested the metasta-bility hypothesis that a polypeptide chain may have a varietyof metastable minima corresponding to folded conformationswith similar structural characteristics but different energies.This hypothesis suggests that the particular state into which aprotein folds depends on the initial condition of the environ-ment, and that there are multiple pathways for the foldingprocess. Thirumalaiet al.22,23 also demonstrated that thefolding kinetics of theBLN 46-mer is very similar to that ofreal proteins. It is shown24–27 that the 46-mer has two char-acteristic transition temperatures: the transition from a ran-dom coil state to a collapsed but non-native state and thetransition from a collapsed state to the nativeb-barrel state.Subsequent studies28–32 illustrated that the 46-mer systemexhibits a high degree of frustration and its energy landscapeis very rugged. That is, there are many local minima withenergies close to that of the native state, and it is very diffi-cult to find the global minimum.7,8,10,33

II. BLN MODEL PROTEIN

The potential energy of aBLN protein withM residuesis given by

V5Vb1Va1Vt1Vv , ~1!

whereVb is the bond-stretching energy,Va the bond-anglebending energy,Vt the torsional dihedral-angle energy, andVv the van der Waals energy. The bond-stretching energy is

Vb5 (i 51

M21kr

2~ ur i 112r i u2a!2, ~2!

where the force constant is given bykr5400e/a2, e is theenergy constant,a the average bond length, andr i the posi-tion of the i th residue. The bond-angle bending energy isgiven by

Va5 (i 51

M22ku

2~u i2u0!2, ~3!

where the force constant isku520e/(rad)2, u i is the bondangle defined by three residuesi , i 11 and i 12, and u0

51.8326 rad or 105°. The torsional dihedral-angle energy isexpressed as

Vt5 (i 51

M23

@Ai~11cosf i !1Bi~11cos 3f i !#, ~4!

wheref i is the dihedral angle defined by four residuesi , i11, i 12, andi 13. The amplitudesAi andBi are given byAi50 andBi50.2e if two or more of the four residues areneutral. For all the other cases,Ai5Bi51.2e. Finally, thevan der Waals energy is given by

Vv54e (i 51

M23

(j 5 i 13

M

Ci j F S s

r i jD 12

2Di j S s

r i jD 6G , ~5!

wheres is the Lennard-Jones parameter andr i j is the dis-tance between two nonbonded residuesi and j given by r i j

5ur i2r j u. If both residuesi and j are hydrophobic,Ci j

5Dij51. If one residue is hydrophilic and the other hydro-philic or hydrophobic,Ci j 52/3 andDi j 521. If one residueis neutral and the other neutral, hydrophilic or hydrophobic,Ci j 51 andDi j 50. For convenience, we seta5e5s51.

III. CONFORMATIONAL SPACE ANNEALING

The CSA unifies the essential ingredients of the threeglobal optimization methods, SA, GA, and MCM. First, as inMCM, we consider only the phase space of local minima,that is, all conformations are energy-minimized by a localminimizer. Second, as in GA, we consider many conforma-tions ~called abank in CSA! collectively, and we perturb asubset of bank conformations~seeds! using the informationin other bank conformations. This procedure is similar tomating in GA. However, in contrast to the mating procedurein GA, we replace typicallysmallportions of a seed with thecorresponding parts of bank conformations since we want tosearch the neighborhood of the seed conformation. Finally,as in SA, we introduce an annealing parameterDcut ~a cutoffdistance in the phase space of local minima!, which plays therole of temperature in SA. The diversity of sampling is di-rectly controlled in CSA by introducing a distance measurebetween two conformations and comparing it withDcut,whereas in SA there are no such systematic controls. Thevalue ofDcut is slowly reduced just as in SA, hence the nameconformational space annealing. Maintaining the diversityof the population using a distance measure was also tried inthe context of GA, although no annealing was performed.34

To apply the CSA to an optimization problem, only twothings are necessary; a method for perturbing a seed confor-mation, and a distance measure between two conformations.This suggests that the CSA is a candidate for a versatile andyet powerful global optimization method.

The way we picture the phase space of local minima isas follows ~see Fig. 1!. We assume that most of the phasespace of local minima can be covered by a finite number oflarge spheres with radiusDcut, which are centered on ran-domly chosen minima~bank!. Each of the bank conforma-tions is supposed to represent all local minima contained in

FIG. 1. A schematic for the search procedure of CSA. The boxes representthe identical phase space.~a! Initially, we cover the phase space by largespheres centered on randomly chosen local minima denoted by3 symbols,and replace the centers with lower-energy local minima. When an initialconformationA is replaced by a new conformationa, the sphere moves inthe direction of the arrow.~b! As the CSA algorithm proceeds and theenergies of the representative conformations at the centers of the spheres arelowered, the size of the spheres are reduced and the search space is nar-rowed down to small basins of low-lying local minima.

10275J. Chem. Phys., Vol. 119, No. 19, 15 November 2003 An off-lattice frustrated model protein

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP:

128.143.23.241 On: Fri, 19 Dec 2014 17:53:40

Page 4: Conformational space annealing and an off-lattice frustrated model protein

the sphere centered on it. To improve a bank conformationA, we first selectA as a seed. We perturbA and subsequentlyenergy-minimize it to generate a trial conformationa. Sincea originates fromA by small perturbation, it is likely thatais contained in a sphere centered onA. If the energy ofa islower than that ofA, a replacesA and the center of thesphere moves fromA to a. If it happens thata belongs to adifferent sphere centered onB, a can replaceB in a similarmanner. Whena is outside of all existing spheres, a newsphere centered ona is generated. In this case, to keep thetotal number of spheres fixed, we remove the sphere repre-sented by the highest-energy conformation. Obviously, theformer two cases are more likely to happen when the spheresare large, and the latter when spheres are small. Conse-quently, a larger value ofDcut produces more diverse sam-pling, whereas smaller value results in quicker search of low-energy conformations at the expense of getting trapped in abasin probably far away from the global minimum. There-fore, for efficient sampling of the phase space, it is necessaryto maintain the diversity of sampling in the early stages andthen gradually shift the emphasis toward obtaining low en-ergy conformations, which is realized, in CSA, by slowlyreducing the value ofDcut.

When the energy of a seed conformation does not im-prove after a fixed number of perturbations, we stop perturb-ing it. To validate this judgment, it is important that typicalperturbations are kept small, so that the perturbed conforma-tions are close to their original seeds. When all of the bankconformations are used as seeds~one iteration completed!,this implies that the procedure of updating the bank mighthave reached a deadlock. If this happens we reset all bankconformations to be eligible for seeds again, and we repeatanother iteration. After a preset number of iterations, we con-clude that our procedure has reached a deadlock. When thishappens, we enlarge the search space by adding more ran-dom conformations into the bank and repeat the whole pro-cedure until the stopping criterion is met.

In the application of the CSA method to theBLN modelprotein ~see Fig. 2!, we first randomly generate a certainnumber of initial conformations~for example, fifty randomconformations! whose energies are subsequently minimizedusing Gay’s secant unconstrained minimization solver,SUMSL.35 In the following the termminimizationis used torefer to the application of SUMSL to a given conformation.We call the set of these conformations thefirst bank. Wemake a copy of the first bank and call it thebank. The con-formations in the bank are updated in later stages, whereasthose in the first bank are kept unchanged. Also, the numberof conformations in the bank is kept unchanged when thebank is updated. The initial value ofDcut is set asDave/2,where Dave is the average distance between the conforma-tions in the first bank. New conformations are generated bychoosing a certain number ofseedconformations~for ex-ample, ten seed conformations! from the bank and by replac-ing parts of the seeds by the corresponding parts of confor-mations randomly chosen from either the first bank or thebank. For example, five conformations are generated for eachseed using the partial replacements. Then the energies ofthese conformations are subsequently minimized, and theseminimized conformations become trial conformations.

A newly obtained local minimum conformationa iscompared with those in the bank to decide how the bankshould be updated. One first finds the conformationA in thebank which is closest to the trial conformationa with thedistanceD(a,A) defined by

D~a,A!5 (i 51

M23

min@mod~ uf ia2f i

Au,360°!,

360°2mod~ uf ia2f i

Au,360°!], ~6!

where mod(A,B) is the least positive value ofx satisfyingA5nB1x with an integern. If D(a,A),Dcut, a is consid-ered as similar toA. In this case, the conformation with

FIG. 2. Flow chart of the CSA algo-rithm to locate the global minimum ofa BLN model protein.

10276 J. Chem. Phys., Vol. 119, No. 19, 15 November 2003 Kim, Lee, and Lee

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP:

128.143.23.241 On: Fri, 19 Dec 2014 17:53:40

Page 5: Conformational space annealing and an off-lattice frustrated model protein

lower energy amonga and A is kept in the bank, and theother one is discarded. However, ifD(a,A).Dcut, a is re-garded as distinct from all conformations in the bank. In thiscase, the conformation with the highest energy among thebank conformations plusa is discarded, and the rest are keptin the bank. We perform this operation for all trial conforma-tions.

After the bank is updated, theDcut is reduced by a fixedratio, in such a way thatDcut reachesDave/5 after L localminimizations ~for example,L5500). Then seeds are se-lected again from the bank conformations which have notbeen used as seeds yet, to repeat aforementioned procedure.The value ofDcut is kept constant after it reaches the finalvalue. When all conformations in the bank are used as seeds,one round of iteration is completed. We perform an addi-tional search by erasing the record of bank conformationshaving been used as seeds, and starting a new round of itera-tion. After three iterations are completed, we increase thenumber of bank conformations by adding fifty randomlygenerated and minimized conformations into the bank~andalso into the first bank!, and resetDcut to Dave/2. The algo-rithm stops when the known global minimum is found,which is examined after the bank is updated by all trial con-formations. It should be noted that since one iteration is com-pleted only after all bank conformations have been used asseeds, and we add random conformations whenever oursearch has reached a deadlock, there is no loss of generalityfor using particular values for the number of seeds, the num-ber of bank conformations, etc.

IV. RESULTS AND DISCUSSION

We applied the CSA method to theBLN 46-mer with thesequenceB9N3(LB)4N3B9N3(LB)5L. We carried out 100independent runs and found the global minimum-energy con-formation with energy

E05249.2635 ~7!

for all 100 runs. Figure 3 shows a scatter plot of the numberof function evaluations to obtain the global minimum-energyconformation for 100 independent runs. On average, it tookabout 1.8653106 function evaluations for each run. The totalrunning time for all 100 runs was 48 h and 8 min on anAthlon processor~1.8 GHz!. For a single run it took onlyabout 29 min on average to obtain the global minimum-energy conformation. For the 46-mer, it is reported8,10 thatthe average success rate to reach the global minimum isabout 9% using SA with 323106 Monte Carlo sweeps~1Monte Carlo sweep546 Monte Carlo steps!. For the purposeof fair benchmarking, we also carried out hundreds of the SAruns and we found that our results are consistent with thesenumbers. A single run of SA with 323106 Monte Carlosweeps took about 34 h on the same Athlon processor. There-fore, the CPU time for CSA is about seventy times less thanthat for SA, and its success rate~100%! is about eleven timeshigher. Until now, the best method to obtain the global mini-mum of the 46-mer has been the quantum thermal annealingwith renormalization~QTAR!.10 The QTAR method has beenthe only method with 100% success rate and yet requires 7.5

times less computation than SA. Therefore, compared toQTAR, CSA is about ten times more efficient in finding theglobal minimum of theBLN 46-mer.

One of the attractive aspects of CSA is that a populationof distinct local minima is obtained as a by-product, in addi-tion to the global minimum-energy conformation. Gatheringall conformations in the final banks of 100 independent CSAruns, we have obtained a total of 6050 conformations alto-gether. The conformation with the highest energy among the6050 conformations has the energy

Eh5236.355. ~8!

Table I shows the energy spectrum of these 6050 conforma-tions. There are 2361 different energy levels betweenE0 andEh . As the energy level becomes higher, the number of en-ergy levels increases fast. This implies that conformations

FIG. 3. Scatter plot of the number of function evaluations to obtain theglobal minimum-energy conformation with energyE05249.2635 for 100independent runs. The global minimum-energy conformation was success-fully obtained for all 100 independent runs with about 1.8653106 functionevaluations on average.

TABLE I. The energy spectrum of 6050 final conformations for all 100independent runs of CSA. NE is the number of energy levels for a givenenergy range, NC is the number of conformations for a given energy range,and DC is the density of conformations~the number of conformations perenergy level, that is, DC5NC/NE). The total number of energy levels is2361.

Energy range NE NC DC

E0<E<249 5 295 59.00249,E<248 31 722 23.29248,E<247 111 1033 9.31247,E<246 192 699 3.64246,E<245 297 689 2.32245,E<244 374 766 2.05244,E<243 377 624 1.66243,E<242 353 517 1.46242,E<241 263 319 1.21241,E<240 159 181 1.14240,E<239 110 114 1.04239,E<238 50 51 1.02238,E<237 27 28 1.04237,E<Eh 12 12 1.00

10277J. Chem. Phys., Vol. 119, No. 19, 15 November 2003 An off-lattice frustrated model protein

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP:

128.143.23.241 On: Fri, 19 Dec 2014 17:53:40

Page 6: Conformational space annealing and an off-lattice frustrated model protein

near the global minimum are discretely distributed but con-formations at higher energies are quasicontinuum states. InTable I the number of energy levels decreases forE.243due to the fact that conformations of high energies are dis-carded as a CSA run proceeds. Figure 4 shows the number ofconformations as a function of energy forE<248. It shouldbe noted that there exist two largest energy gaps

DEa50.201 ~9!

betweenE55248.939 andE65248.738 and

DEb50.138 ~10!

betweenE85248.731 andE95248.593.Since the potential energy ofBLN model proteins is

invariant under the inversion transformation

r i→2r i

or

f i→2f i , ~11!

two conformational states exist for each energy level~two-fold degeneracy!, including the global minimum-energy con-formation. The root-mean-square deviation~RMSD! betweenthe two global minima is 0.862. The inversion symmetry isemployed in CSA by generating conformations with positivevalues only for the first dihedral anglef1 (0<f1<180).Figure 5 shows the RMSD values as a function of energy forall 2361 conformations with 0<f1<180, calculated fromthe global minimum. In Fig. 5 most of the 2361 conforma-tions have small values of RMSD from the global minimum,independent of their diverse energy values. These conforma-tions are grouped together in the region RMSD,1.6, sepa-rated from conformations with larger values of RMSD. Thenumber of conformations with RMSD values less than 1.6 is2215, which is 93.8% of all conformations. It is quite inter-esting to observe that no conformations with energy less than244.34 appear in the region RMSD.1.6. We also per-formed 100 CSA runs without employing the inversion sym-

metry. The results are quite similar to those in Table I andFigs. 3–5, and finding the global minimum requires about1.4 times more computation.

Now we examine the RMSD-versus-energy distributionfor conformations near the global-minimum energyE0 indetail. Figure 6 is an enlargement of the lower left corner ofFig. 5 for E<248. Similarly, Fig. 7 shows the results with2180<f1<180. The values of RMSD are calculated fromone of the two global minima. The RMSD distribution cal-culated from the other global minimum is similar to Fig. 7.The twelve~six pairs! conformations below the largest gapDEa @Eq. ~9!# in Fig. 7 are of two branches, which are re-lated to each other by the inversion transformation. We callten of these conformations (E15249.186, E25249.149,E35249.063, E45249.002, andE55248.939) quasi-global-minima, since they are more frequently obtained asfinal candidates for global minimum, in SA, than the nativestate is. Once a SA run is trapped into one of these quasi-global-minima, the trapped conformation cannot be folded

FIG. 4. The number of conformations as a function of energy forE<248, accumulated from all 100 independent runs. The closed square repre-sents the number of the global minimum-energy conformation with energyE05249.2635.

FIG. 5. Distribution of the RSMD values as a function of energy for all2361 conformations calculated from one of the two global minima. Theclosed square represents the global minimum-energy conformation with en-ergy E05249.2635.

FIG. 6. Distribution of the RMSD values as a function of energy for con-formations with energyE<248 ~an enlargement of the lower left-handcorner of Fig. 5!. The closed square represents the global minimum-energyconformation with energyE05249.2635.

10278 J. Chem. Phys., Vol. 119, No. 19, 15 November 2003 Kim, Lee, and Lee

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP:

128.143.23.241 On: Fri, 19 Dec 2014 17:53:40

Page 7: Conformational space annealing and an off-lattice frustrated model protein

into the global minimum before crossing over the two largeenergy barriers of Eqs.~9! and ~10!. This clearly demon-strates the reason for the poor success rate of SA. On theother hand, the RMSD values for the conformations abovethe energy gap are scattered between 0.15 and 1.2. In par-ticular, conformations with energyE85248.731 (RMSD50.156, 0.881! can be easily folded into the native statewithout being trapped in one of the quasi-global-minima.

V. CONCLUSION

We have applied the versatile and powerful globaloptimization method, conformational space annealing, tothe 46-residue BLN protein with a sequenceB9N3(LB)4N3B9N3(LB)5L. The CSA method always main-tains the diversity of sampling and is able to cross the highenergy barriers between local minima. Consequently, thismethod not only finds the global minimum-energy conforma-tion successfully and efficiently but also investigates manyother distinct local minima as a by-product.

This 46-residueBLN protein is folded into the nativestate of a four-strandedb barrel. It has been a challengingproblem to locate the global minimum of theBLN 46-merbecause the molecule is highly frustrated and its energy land-scape is quite rugged. We conducted 100 independent CSAruns and successfully found the global minimum-energy con-formation of the 46-residueBLN protein for all 100 runs.The CPU time for CSA is about seventy times less than thatfor simulated annealing, and its success rate~100%! to findthe global minimum is about eleven times higher. In addi-tion, the amount of computational efforts required for CSA tofind the global minimum is about ten times less than that forthe QTAR, the best global optimization method yet appliedfor the 46-residueBLN protein.

The 100 separate CSA runs produce the global minimum100 times as well as the other 5950 final conformations cor-responding to 2360 distinct local minima of the 46-residue

BLN protein. As the values of energy levels becomes higher,the number of energy levels~or the number of local minima!increases fast. Conformations near the global minimum arediscretely distributed but conformations of higher energiesare quasicontinuum states. Most of final conformations haverelatively small RMSD values from the global minimum,independent of their diverse energy values. Very close to theglobal minimum, there exist quasi-global-minima which arefrequently obtained as one of the final answers from SA runs.We find that there exist two largest energy gaps between thequasi-global-minima and the other local minima. Once a SArun is trapped in one of these quasi-global-minima, it cannotbe folded into the global minimum before crossing over thetwo large energy barriers, clearly demonstrating the reasonfor the poor success rate of SA.

ACKNOWLEDGMENT

We are grateful to Professor Julian Lee for useful discus-sions and for kindly drawing Fig. 1.

1S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, Science220, 671 ~1983!.2P. van Laarhoven and E. H. L. Aarts,Simulated Annealing: Theory andApplications~Kluwer, Dordrecht, 1992!.

3J. Holland, SIAM J. Comput.2, 88 ~1973!.4D. E. Goldberg,Genetic Algorithms in Search, Optimization, and MachineLearning ~Addison–Wesley, Reading, MA, 1989!.

5Z. Li and H. A. Scheraga, Proc. Natl. Acad. Sci. U.S.A.84, 6611~1987!.6J. Lee and M. Choi, Phys. Rev. E50, 651 ~1994!.7H. Xu and B. J. Berne, J. Chem. Phys.112, 2701~2000!.8Y.-H. Lee and B. J. Berne, J. Phys. Chem. A104, 86 ~2000!.9Y.-H. Lee and B. J. Berne, Ann. Phys.~Leipzig! 9, 668 ~2000!.

10Y.-H. Lee and B. J. Berne, J. Phys. Chem. A105, 459 ~2001!.11J. Lee, H. A. Scheraga, and S. Rackovsky, J. Comput. Chem.18, 1222

~1997!.12J. Lee, H. A. Scheraga, and S. Rackovsky, Biopolymers46, 103 ~1998!.13J. Lee and H. A. Scheraga, Int. J. Quantum Chem.75, 255 ~1999!.14J. Lee, A. Liwo, and H. A. Scheraga, Proc. Natl. Acad. Sci. U.S.A.96,

2025 ~1999!.15A. Liwo, J. Lee, D. R. Ripoll, J. Pillardy, and H. A. Scheraga, Proc. Natl.

Acad. Sci. U.S.A.96, 5482~1999!.16J. Lee, A. Liwo, D. R. Ripoll, J. Pillardy, and H. A. Scheraga, ProteinsS3,

204 ~1999!.17J. Lee, A. Liwo, D. R. Ripoll, J. Pillardy, J. A. Saunders, K. D. Gibson,

and H. A. Scheraga, Int. J. Quantum Chem.77, 90 ~2000!.18J. Lee, I.-H. Lee, and J. Lee, Phys. Rev. Lett.~to be published!.19J. D. Honeycutt and D. Thirumalai, Proc. Natl. Acad. Sci. U.S.A.87, 3526

~1990!.20J. D. Honeycutt and D. Thirumalai, Biopolymers32, 695 ~1992!.21T. Veitshans, D. Klimov, and D. Thirumalai, Folding Des.2, 1 ~1997!.22Z. Guo, D. Thirumalai, and J. D. Honeycutt, J. Chem. Phys.97, 525

~1992!.23D. Thirumalai and Z. Guo, Biopolymers35, 137 ~1995!.24Z. Guo and D. Thirumalai, Biopolymers36, 83 ~1995!.25Z. Guo and C. L. Brooks, Biopolymers42, 745 ~1997!.26Z. Guo and D. Thirumalai, Folding Des.2, 377 ~1997!.27J.-E. Shea, Y. D. Nochomovitz, Z. Guo, and C. L. Brooks, J. Chem. Phys.

109, 2895~1998!.28R. S. Berry, N. Elmaci, J. P. Rose, and B. Vekhter, Proc. Natl. Acad. Sci.

U.S.A. 94, 9520~1997!.29H. Nymeyer, A. E. Garcia, and J. N. Onuchic, Proc. Natl. Acad. Sci.

U.S.A. 95, 5921~1998!.30N. Elmaci and R. S. Berry, J. Chem. Phys.110, 10606~1999!.31M. A. Miller and D. J. Wales, J. Chem. Phys.111, 6610~1999!.32D. A. Evans and D. J. Wales, J. Chem. Phys.118, 3891~2003!.33P. Amara and J. E. Straub, J. Phys. Chem.99, 14840~1995!.34B. Hartke, J. Comput. Chem.20, 1752~1999!.35D. M. Gay, ACM Trans. Math. Softw.9, 503 ~1983!.

FIG. 7. Distribution of the RMSD values as a function of energy for con-formations with energyE<248, obtained from 100 CSA runs without em-ploying the inversion symmetry. Two closed squares represent the globalminimum-energy conformations with energyE05249.2635 connected bythe inversion transformation.

10279J. Chem. Phys., Vol. 119, No. 19, 15 November 2003 An off-lattice frustrated model protein

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP:

128.143.23.241 On: Fri, 19 Dec 2014 17:53:40