Comprehensive analysis of the helix-X-helix motif in soluble proteins

PROTEINS: Structure, Function, and Bioinformatics

Julie Deville, Julien Rey, and Marie Chabbert*

CNRS UMR 6214-INSERM U771, Université d'Angers, Faculté de Médecine, 3 rue Haute de Reculée, 49045 Angers, France

INTRODUCTION

Understanding the relationship between amino acid sequence and protein structure is of primary importance for developing methods aimed at improving structure prediction and designing de novo proteins. Secondary structure (SS) is a crucial level in the hierarchical classification of protein structure. α-Helices and β-sheets, the major SS elements, allow a simple description of protein structures and are used for protein classification (e.g., the SCOP or the CATH databases1,2). These regular backbone structures are linked by loops or turns.

Prediction of the secondary structure of a protein from its sequence is a key step in structure prediction. The different propensities of each amino acid for helical, strand, or random coil conformations were recognized long ago3,4 and form the basis of numerous SS prediction programs.5–8 All SS prediction programs rely on pattern recognition techniques combined with statistical analysis of known protein structures. The accuracy rate for a three-state prediction (helix, strand, and coil) is about 60% for the initial algorithms based on single-sequence analysis3,6,9 and rises to about 75% for current algorithms based on multiple sequence alignment.8,10–14 One limitation of these programs is that they rely on the analysis of the sequence properties of a window surrounding each individual amino acid, which limits their positional precision. For example, Jnet predictions are based on a 17-residue window.11 Although α-helices are generally better predicted than β-strands or coils, it is difficult to identify their extremities correctly, in spite of strong capping sequence signals.15 Another caveat due to the window size is that short irregular elements joining similar SS elements can be missed. Two helices joined by a linker of a few residues may be predicted to form a single long helix.

The ability to develop bioinformatics tools that predict the position and the structure of such kinks between two α-helices should be very valuable for both structure prediction and protein design. Development of these tools requires a comprehensive analysis of these structures. The properties of two or three residue long linkers joining α-helices have been widely

*Correspondence to: Dr. Marie Chabbert, CNRS UMR 6214-INSERM U771, Faculté de Médecine d'Angers, 3 rue Haute de Reculée, 49045 Angers, France.

E-mail: [email protected]

Received 25 April 2007; Revised 12 September 2007; Accepted 11 October 2007

Published online 23 January 2008 in Wiley InterScience (www.interscience.wiley.com).

DOI: 10.1002/prot.21879

ABSTRACT

α-Helices are the most common secondary structures found in globular proteins. In this report, we analyze the stereochemical and sequence properties of helix-X-helix (HXH) motifs in which two α-helices are linked by a single residue, in search of characteristic structures and sequence signals. The analysis is carried out on a database of 837 nonredundant HXH motifs. The kinks are characterized by the bend angle between the axes of the N-terminal and C-terminal helices and the wobble angle corresponding to the rotation of the C-terminal helix axis on the plane perpendicular to the N-terminal one. The phi-psi dihedral angles of the linker residue are clustered in six distinct areas of the Ramachandran plot: two areas are located in the additional allowed α region (α1 and α2), two areas are in the additional allowed β region (β1 and β2), and two areas have positive phi values (αL and βM). Each phi/psi region corresponds to characteristic bend and wobble angles and amino acid distributions. Bend angles can vary from 0° to 160°. Most wobble angles correspond to a counter-clockwise rotation of the C-terminal helix. Proline residues are rigorously excluded from the linker position X but have a high propensity at position X+1 of the β1 and β2 motifs (12 and 7, respectively) and at position X+3 of the α1 motifs (9). Glycine linkers are located either in the αL region (20%) or in the βM region (80%). This latter conformation is characterized by a marked bend angle (124° ± 18°) and a clockwise wobble. Among other amino acids, Asn is remarkable for its high propensity (>3) at the linker position of the α2, β1, and β2 motifs. Stabilization of HXH motifs by H-bonds between polar side chains of the linker and polar groups of the backbone is determined. A method based on position-specific scoring matrices is developed for conformational prediction. The accuracy of the predictions reaches 80% when the method is applied to proline-induced kinks or to kinks with bend angles in the 50°–100° range.

Proteins 2008; 72:115–135. © 2008 Wiley-Liss, Inc.

Key words: 3D data mining; protein structure; helix

kink; proline; glycine.


investigated,16–18 but one-residue linkers have not yet received much attention. However, an analysis of the Protein Data Bank17 indicated that a structural motif consisting of two consecutive helices separated by one residue (the helix-X-helix or HXH motif) is almost as frequent as motifs with two helices joined by two or three residues.

In this study, we thus focus on one-residue linkers joining α-helices in soluble proteins. We analyze the stereochemical and sequence properties of HXH motifs in a nonredundant database that we have developed. We show that the relative orientation between the two helices is determined by a few possible dihedral conformations of the protein backbone at the linker position and that each dihedral conformation exhibits a distinct amino acid distribution at and around the linker. We analyze H-bonds within the HXH motif to determine the interactions stabilizing each conformation. Finally, we develop a method based on position-specific scoring matrices for conformational prediction.

MATERIALS AND METHODS

Terminology

The helices are named helix 1 (H1) or helix 2 (H2) depending upon their relative position in the protein primary structure. Following the common nomenclature, the first i positions of an α-helix are called N1, N2, N3, ..., Ni, the first preceding position Ncap, and the second preceding position N′, whereas the last j positions are Cj, ..., C3, C2, and C1, followed by Ccap and C′. The linker position is called X and corresponds both to the Ccap position of helix 1 (Ccap(H1)) and to the Ncap position of helix 2 (Ncap(H2)). The ith position upstream is called X−i and the jth position downstream is X+j. Position X−1 corresponds to C1(H1) and to N′(H2), whereas position X+1 corresponds to N1(H2) and to C′(H1).

Data sets

PDB_25 and PDB_90 refer to nonhomologous sets of protein structures selected from the 25% and the 90% threshold lists, compiled by Hobohm and Sander19 and accessible at http://bioinfo.tg.fh-giessen.de/pdbselect (March 2006 release). PDB_25S and PDB_90S refer to subsets of PDB_25 and PDB_90, respectively, containing only soluble proteins whose structure was determined by crystallography with a resolution ≤ 2.5 Å and an R-factor ≤ 0.25. Helix definition was determined by DSSP.20

We developed a database of HXH motifs from PDB_90, according to the procedure summarized in Figure 1. First, soluble proteins whose structure was determined by X-ray crystallography with a resolution of 2.5 Å and an R-factor [...]

[...] C5 to C2 of H1 and from N2 to N5 of H2) did not alter the results.

The motif was characterized by the bend angle θb and the wobble angle θw. Definition of these angles allows a description of the relative orientation of the two helices in spherical coordinates. The bend angle θb defines the angle between the axes of the two helices [Fig. 2(a)]. It is measured as:

$$\cos\theta_b = \mathbf{H}_1 \cdot \mathbf{H}_2$$

A bend angle of 0° indicates that there is no change in the direction of the helix axes, whereas a bend angle of 180° indicates a total reversal in the direction of helix 2 (antiparallel helices).

The wobble angle θw [Fig. 2(b)] corresponds to the rotation of the projection of H2 in a plane perpendicular to H1. The reference vector in this plane joins the helix center to the Cα atom of the C4 residue of H1. This vector corresponds to the bisector of the H1 C5-C4-C3 angle. It was chosen as it allows an intuitive description of the kink geometry. Left-handed kinks correspond to a positive θw, whereas right-handed kinks correspond to a negative θw. For small bend angles, determination of the wobble angle could be difficult. In these cases, the wobble angle was verified by shifting the reference atoms by one or two residues and by visual inspection. In particular, each kink with a negative wobble angle was carefully checked by visual inspection and detailed structural analysis.
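As an illustration of the two angle definitions above, the following minimal Python sketch computes a bend angle and a signed wobble angle from three vectors: the axis of helix 1, the axis of helix 2, and the reference vector pointing from the helix 1 axis to the Cα of its C4 residue. The function names and the sign convention (positive angles for a counter-clockwise rotation when looking down the H1 axis toward its origin) are assumptions made for this example; the procedure used to fit the helix axes is not reproduced here.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def unit(v):
    n = math.sqrt(dot(v, v))
    return tuple(a / n for a in v)

def bend_angle(h1, h2):
    """Angle (degrees) between the two helix axes: cos(theta_b) = H1 . H2."""
    c = max(-1.0, min(1.0, dot(unit(h1), unit(h2))))
    return math.degrees(math.acos(c))

def wobble_angle(h1, h2, ref):
    """Signed angle (degrees) between the projections of ref and h2
    on the plane perpendicular to h1 (sign convention is an assumption)."""
    h1 = unit(h1)

    def project(v):
        s = dot(v, h1)
        return tuple(a - s * b for a, b in zip(v, h1))

    p, r = project(h2), project(ref)
    # When the bend angle is close to 0, p is close to the null vector and
    # the result is ill-determined, as noted in the text.
    return math.degrees(math.atan2(dot(cross(r, p), h1), dot(p, r)))
```

For example, with H1 along z and the reference vector along x, a second helix pointing along (0, 1, 1) gives a bend angle of 45° and a wobble angle of 90°.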

Search of reverse false positives

The search for residues that are identified as helical by DSSP but have phi/psi values similar to those of the HXH linker (reverse false positives) was carried out on PDB_90S. We considered helices that were at least eleven residues long. We measured the backbone dihedral angles of each residue from position N6 to position C6, along with the dihedral angles of the neighboring residues. Residues surrounded by at least one residue (from position −5 to −1 and from position +1 to +5) outside the α-helical range were removed from the data set, as previously done with the HXH motifs, to consider only helix distortions due to a single residue.

The search for α2-like structures was carried out on PDB_25S. We first searched for residues located within α-helices whose dihedral angles summed to −90° ± 10°. Among the 2200 residues found, we selected those whose psi was larger than −20°. This led to a set of 59 structures used for visual inspection and further characterization of the properties of the residues surrounding the α2-like residue.
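The two-step selection described above (a constraint on the sum of the dihedral angles, then a threshold on psi) can be sketched as a simple predicate. The function and parameter names are hypothetical; only the numerical thresholds come from the text.

```python
def is_alpha2_like(phi, psi, target_sum=-90.0, tol=10.0, psi_min=-20.0):
    """Return True when phi + psi falls within target_sum +/- tol degrees
    and psi is larger than psi_min (angles in degrees)."""
    return abs((phi + psi) - target_sum) <= tol and psi > psi_min


# Hypothetical (phi, psi) pairs for residues found inside helices.
candidates = [(-97.0, 8.0), (-58.0, -47.0), (-65.0, -25.0)]
selected = [pair for pair in candidates if is_alpha2_like(*pair)]
```

Here only the first pair is retained: the second fails the sum criterion and the third fails the psi threshold.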

Proline-containing helices

The analysis was carried out on PDB_25S. We considered helices with a minimal length of 11 residues that contained a proline located from position N5 to position C5. This led to a set of 85 structures, used for determination of the sequence and of the dihedral angles of the residues surrounding the proline.

Amino acid propensity

The propensity $P_a^k$ of amino acid a to occur at position k of a motif K was determined by the ratio of the number of amino acids a observed at this position k ($n_a^k$) to the number of amino acids a expected ($n_a^{\mathrm{exp}}$) from the amino acid distribution in the entire data set:

$$P_a^k = \frac{n_a^k}{n_a^{\mathrm{exp}}}$$

$$P_a^k = \frac{n_a^k / N_K}{n_a / N}$$

where $n_a^k$ is the number of amino acids a at position k, $N_K$ is the number of motifs K, $n_a$ is the total number of amino acids a in the full data set, and $N$ is the total number of amino acids in the full data set. The amino acid distribution of PDB_25 was used as reference after verifying the absence of any significant change in the amino acid distributions of PDB_25, of PDB_90, and of the database of the proteins containing the HXH motifs.

The number $N_K$ of the motifs analyzed varied from 37 to 218 and the expected number of amino acids could be very low. To avoid biases due to small sample sizes, for each amino acid a at position k, we calculated a Z-score defined as:

$$Z_a^k = \frac{n_a^k - n_a^{\mathrm{exp}}}{\sigma_a}$$

where $\sigma_a$, the standard deviation of the observed number of an amino acid a, can be approximated by the square

Figure 2
Definitions of the bend angle θb and of the wobble angle θw. (a) and (b) correspond to two perpendicular views of a schematic HXH motif, parallel and perpendicular to the axis of H1, respectively. The Cα atoms of the linker X and of residue C4 of H1 are shown as light grey and dark grey spheres, respectively.


root of the expected value. This formula, initially proposed by Engel and DeGrado,17 was experimentally validated by random trials in PDB_25. |Z| values of 2.0 and 2.6 correspond to 95% and 99% confidence levels, respectively.
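The propensity and Z-score definitions above translate directly into a few lines of Python. This is a minimal sketch; the function name, argument names, and the counts in the example are hypothetical.

```python
import math

def propensity_and_z(n_obs, n_motifs, ref_count, ref_total):
    """Propensity P and Z-score of one amino acid at one motif position.

    n_obs     -- observed count of the amino acid at this position
    n_motifs  -- number of motifs in the subset (N_K)
    ref_count -- count of the amino acid in the reference data set (n_a)
    ref_total -- total number of residues in the reference data set (N)
    """
    n_exp = n_motifs * ref_count / ref_total   # expected count
    p = n_obs / n_exp                          # P = n_obs / n_exp
    z = (n_obs - n_exp) / math.sqrt(n_exp)     # sigma approximated by sqrt(n_exp)
    return p, z


# Hypothetical numbers: 20 observations among 200 motifs for an amino acid
# that makes up 5% of the reference set (expected count = 10).
p, z = propensity_and_z(20, 200, 5000, 100000)
```

With these made-up counts, the propensity is 2.0 and |Z| exceeds 2.6, so the enrichment would be considered significant at the 99% level according to the thresholds quoted above.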

Hydrophobicity profiles

The hydrophobicity index H at each position surrounding the reference residue X corresponds to the average of the hydrophobicity index at this position for all the motifs in the subset considered. Eisenberg's consensus scale22 was used after verifying that different scales23,24 led to similar results. Heterogeneity between subsets was tested using χ² homogeneity tests. For this analysis, amino acids were grouped in three classes: hydrophobic (L, M, I, V, F, Y, W), polar (H, K, R, E, Q, D, N, S, T), and other (P, G, A, C).
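The grouping into three classes and the χ² homogeneity statistic can be sketched as follows. The class memberships are those listed above; the function names and the example table are hypothetical, and the statistic is the standard Pearson χ² for a contingency table, which is assumed (not confirmed) to match the test used by the authors.

```python
CLASSES = {
    "hydrophobic": "LMIVFYW",
    "polar": "HKREQDNST",
    "other": "PGAC",
}

def class_counts(residues):
    """Count residues (one-letter codes) falling in each of the three classes."""
    counts = {name: 0 for name in CLASSES}
    for res in residues:
        for name, members in CLASSES.items():
            if res in members:
                counts[name] += 1
                break
    return counts

def chi2_homogeneity(table):
    """Pearson chi-square statistic and degrees of freedom for a table
    whose rows are subsets and whose columns are class counts.
    Every row and column total must be non-zero."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / total
            chi2 += (obs - exp) ** 2 / exp
    dof = (len(table) - 1) * (len(col_tot) - 1)
    return chi2, dof
```

The statistic is then compared with the χ² distribution for the returned number of degrees of freedom to decide whether two subsets differ in composition at a given position.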

Solvent accessibility

Solvent accessibility was calculated with the NACCESS program25 (http://wolf.bms.umist.ac.uk/naccess). The program calculates the atomic accessible surface when a 1.4 Å probe is rolled around the van der Waals surface of the protein, according to the Lee and Richards method.26 Residue accessibility corresponds to the sum of the atomic accessibilities.

H-bond analysis

Analysis of hydrogen bonds was carried out with the HBPLUS v3.0 program accessible at http://www.biochem.ucl.ac.uk/bsm/hbplus/home.html.27 Default parameters were used throughout the analysis. We analyzed main chain–main chain H-bonds involving the carbonyl oxygens of residues X−4 to X and the amide nitrogens of residues X to X+4, in search of disruption of the helical pattern and of alternative H-bonds. When several H-bonds were possible, the H-bond conserving the helical pattern was given priority. The only exception was for H-bonds typical of the Schellman motif, for which bifurcated H-bonds were taken into account.

We analyzed side chain–main chain hydrogen bonds for the polar side chains of Asn, Asp, Gln, Glu, Ser, Thr, and His located at the linker position X. We considered hydrogen bonds involving the γ-hydroxyl oxygen of Ser/Thr residues as donor or acceptor, the δ- and ε-nitrogens of Asn, Gln, and His as donor, and the δ- and ε-oxygens of Asp/Asn and Glu/Gln as acceptor. We searched for putative H-bond partners among polar atoms of the backbone at positions from X−4 to X+4. Hydrogen bonds linking a donor side chain atom at position X to an acceptor backbone oxygen at position X−i are denoted by O(X−i). Similarly, hydrogen bonds linking a donor backbone nitrogen at position X+i to an acceptor side chain atom at position X are denoted by N(X+i).

Prediction of the linker conformation

Position-specific scoring matrices, based on the relative amino acid weights, were developed to predict linker conformations. The score S(C) of a sequence a1...ai...an for the conformation C was given by:

$$S(C) = \sum_i w_C(a_i)$$

where $a_i$ is the amino acid at position i in the sequence considered and $w_C(a_i)$ its relative weight at the same position i in the conformation C.

A sequence was assigned to the conformation giving the highest score. The accuracy of the prediction was defined by the Q-score matrix, which gives both the ratio $Q_x^{\mathrm{obs}}$ of the number of correctly predicted conformations x to the number of observed conformations x:

$$Q_x^{\mathrm{obs}} = \frac{M_{xx}}{\sum_y M_{xy}}$$

and the ratio $Q_x^{\mathrm{pred}}$ of the number of correctly predicted conformations x to the number of predicted conformations x:

$$Q_x^{\mathrm{pred}} = \frac{M_{xx}}{\sum_y M_{yx}}$$

where $M_{xy}$ is the number of sequences observed in conformation x and predicted in conformation y.
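The scoring and accuracy definitions above can be sketched in Python as follows. How the relative weights are derived is not specified in this section, so the matrices in the example are arbitrary placeholders, and all function names are hypothetical.

```python
def score(seq, matrix):
    """S(C): sum over positions of the weight of the residue found there.
    `matrix` is a list with one dict (amino acid -> weight) per position."""
    return sum(weights.get(aa, 0.0) for aa, weights in zip(seq, matrix))

def predict(seq, matrices):
    """Assign the sequence to the conformation giving the highest score."""
    return max(matrices, key=lambda conf: score(seq, matrices[conf]))

def q_scores(pairs, labels):
    """Q_obs and Q_pred per conformation from (observed, predicted) pairs."""
    m = {x: {y: 0 for y in labels} for x in labels}
    for obs, pred in pairs:
        m[obs][pred] += 1
    q_obs, q_pred = {}, {}
    for x in labels:
        n_obs = sum(m[x].values())              # sum over y of M[x][y]
        n_pred = sum(m[y][x] for y in labels)   # sum over y of M[y][x]
        q_obs[x] = m[x][x] / n_obs if n_obs else float("nan")
        q_pred[x] = m[x][x] / n_pred if n_pred else float("nan")
    return q_obs, q_pred


# Placeholder two-position matrices for two conformations.
matrices = {
    "alpha1": [{"A": 1.0}, {"P": -1.0}],
    "beta1": [{"A": 0.0}, {"P": 2.0}],
}
best = predict("AP", matrices)
```

In this toy case the sequence "AP" scores 0.0 for alpha1 and 2.0 for beta1, so it is assigned to beta1.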

To build and test the scoring matrices on different data sets and to estimate standard deviations of the Q-scores, we used a 10-fold cross-validation procedure, that is, 9/10th of the data was used to build the matrices and the remaining 1/10th of the data was used to test them.
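A 10-fold split of this kind can be sketched as below. The shuffling, the seed, and the way items are dealt into folds are assumptions made for the example; the text does not state how the folds were drawn.

```python
import random

def k_fold_splits(items, k=10, seed=0):
    """Yield (train, test) lists; each item is used for testing exactly once."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        test = [items[j] for j in folds[i]]
        train = [items[j] for n, fold in enumerate(folds) if n != i for j in fold]
        yield train, test


# Hypothetical data set of 25 motif identifiers.
motifs = list(range(25))
splits = list(k_fold_splits(motifs))
```

The scoring matrices would be rebuilt from each `train` list and evaluated on the matching `test` list, and the spread of the ten resulting Q-scores gives an estimate of their standard deviation.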

Molecular graphics

PYMOL (DeLano Scientific LLC, San Francisco) was used for figure preparation and molecular graphics analysis. Representative structures were chosen after visual inspection. They correspond to: (α1) O-acetylserine sulfhydrylase from Thermotoga maritima (PDB code: 1O58, chain D)28; (α2) gamma-glutamyl phosphate reductase from Saccharomyces cerevisiae (PDB code: 1VLU, chain A) (unpublished); (β1) 28 kDa glutathione S-transferase from Schistosoma haematobium (PDB code: 1OE8, chain A)29; (β2) human glutathione synthetase (PDB code: 2HGS, chain A)30; (αL) S-adenosylmethionine-dependent methyltransferase from Thermotoga maritima (PDB code: 1M6Y, chain A)31; (βM) periplasmic iron-binding protein from Neisseria gonorrhoeae (PDB code: 1XC1, chain A).32 The structure with the typical translation observed for α2 motifs with negative psi corresponds to chorismate mutase from Saccharomyces cerevisiae (PDB code: 5CSM, chain A).33


RESULTS

The helix-X-helix database

Using the DSSP definition of SS elements, we analyzed the length distribution of short segments (up to three residues) joining two α-helices in a subset of PDB_25 containing 1200 protein chains of soluble proteins solved by X-ray crystallography at a resolution of 2.5 Å or better. The subset contained 8044 helices with a length of at least five residues. This length was chosen as it corresponds to the minimal length required to reliably determine α-helices with DSSP.15

A linker length of 0 residues corresponds to two contiguous helices without any residue between them. It is seldom observed (3 examples). On the other hand, the numbers of observations for one-, two-, or three-residue linkers are very similar (160, 144, and 192 observations, respectively). This result is in agreement with a previous study17 using a different helix definition. The distribution of the bend angles θb between the two helices as a function of the linker length is shown in Figure 3. Kinks with one-residue linkers dominate the kink distribution when the bend angle is [...] 1000) was sufficient for quantitative analysis but data could be severely biased. To overcome these limitations,

we developed an HXH database from PDB_90, according to the procedure described in Materials and Methods (see Fig. 1). This ensured a sequence identity ≤ 45% (5 residues out of 11) for the minimal eleven-residue motif. This identity rate was the best balance between the size of the sample and the redundancy. A filter based on the backbone dihedral angles of the five residues on each side of the linker residue X was added to remove false positives (structures with at least one residue in a nonhelical conformation from positions X−5 to X−1 and X+1 to X+5 after the DSSP-based selection process). Nonhelical dihedral angles were found for 2% of positions X+1, 1% of positions X−1, and about 1% of all the other positions. This led to the removal of 4% of the motifs, without altering the overall results. The resulting database is composed of 837 HXH motifs with associated protein structures. 47% of the sequence pairs of the minimal HXH motifs have no sequence identity, whereas 94% of them have a sequence identity ≤ 27% (3 residues out of 11), highlighting the sequence diversity of the database.
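The identity figures quoted above are counts of identical positions between two aligned eleven-residue motifs (5/11 ≈ 45%, 3/11 ≈ 27%). A minimal sketch of this pairwise comparison is given below; the function names and the three example sequences are hypothetical, and the redundancy-removal procedure itself (Fig. 1) is not reproduced.

```python
from itertools import combinations

def n_identical(a, b):
    """Number of identical residues at equivalent positions of two motifs."""
    if len(a) != len(b):
        raise ValueError("motifs must have the same length")
    return sum(x == y for x, y in zip(a, b))

def fraction_of_pairs_within(motifs, max_identical):
    """Fraction of motif pairs sharing at most `max_identical` positions."""
    pairs = list(combinations(motifs, 2))
    kept = sum(1 for a, b in pairs if n_identical(a, b) <= max_identical)
    return kept / len(pairs)


# Made-up eleven-residue motifs (X-5 to X+5).
motifs = ["AAAAAGAAAAA", "LLLLLNLLLLL", "AAAAAGLLLLL"]
frac = fraction_of_pairs_within(motifs, 5)
```

In this toy set, two of the three pairs share at most 5 identical positions, so the returned fraction is 2/3.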

Backbone geometry

Using the HXH database, we analyzed the geometry of the backbone at the linker position X. The phi/psi dihedral angles of this residue cluster in six distinct areas of the Ramachandran plot, indicating six possible conformations of the protein backbone [Fig. 4(a)]. None of these conformations is located in the most favored regions of the Ramachandran plot. Two of these conformations, α1 and α2, correspond to distortions of the α conformation and are in the additional allowed region surrounding the most favored α-helix region. Together, they represent about 50% of the motifs (212 and 188 occurrences out of 837, for the α1 and α2 conformations, respectively). The α1 conformation is distorted as compared to a canonical α-helix (phi = −58°, psi = −47°) with a large shift of the phi angle to −118° ± 15°, whereas the psi angle corresponds to canonical values (−58° ± 14°). This conformation corresponds to the backbone conformation observed in π-helices.34 The α2 conformation is located at the upper limit of the allowed α region. It is characterized by a strong correlation between the phi and psi angles, whose sum is almost constant (−90° ± 10°). The average value of psi is slightly positive (8° ± 19°) while phi is shifted to −97° ± 20°. This conformation encompasses the 3₁₀ helix area (phi = −74°, psi = −4°) and corresponds to the γ conformation reported by Brazhnikov and Efimov.16 It makes a

Figure 3
Distribution of the bend angles between two helices separated by a linker of one residue (black bars), two residues (white bars), and three residues (grey bars). N represents the number of observations in a subset of PDB_25 containing 1200 protein chains.



transition from the α to the β regions of the Ramachandran plot.

Two areas of the β region of the Ramachandran plot are also possible for the linker [Fig. 4(a)]. They are not in the core, but in the additional allowed β region. The β1 conformation represents 26% of the motifs (218 occurrences). It is characterized by a phi angle of −137° ± 15° and a psi angle of 83° ± 24°. This region was first described by Karplus35 and is largely associated with residues preceding proline.36 The β2 conformation is less frequent and represents only 12% of the kinks (104 occurrences). Its dihedral angles, phi = −73° ± 11° and psi = 122° ± 26°, are close to the values observed for residues in the polyproline type II (PPII) helix conformation.36,37 The PPII helix is a left-handed helical structure formed when sequential residues adopt backbone dihedral angles centered around −75° and 145°.38,39 This area is also frequently associated with pre-Pro residues.35,36

Thirteen percent of the kinks have a positive phi value. They correspond to two distinct conformations [Fig. 4(a)]. The βM conformation (phi = 96° ± 15°, psi = 157° ± 23°), observed in 9% of the kinks (78 occurrences), corresponds to a mirror region of the β conformation.36,37 This conformation has been recently described by Lovell et al.37 It is associated with glycine residues, because the lack of a side chain for Gly produces mirror symmetry in steric constraints and in the phi/psi distribution.37 Finally, the αL conformation, corresponding to left-handed α-helices with phi = 73° ± 24° and psi = 25° ± 25°, is observed for 4% of the kinks (37 cases).
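The six clusters described above can be illustrated with a nearest-center assignment on the Ramachandran plot. This is only a sketch: the centers are the mean phi/psi values quoted in the text, the boundaries actually used to delimit the six areas are not given here, and the nearest-center rule with periodic angle differences is an assumption introduced for this example, not the authors' procedure.

```python
# Mean (phi, psi) values in degrees, as quoted in the text.
CENTERS = {
    "alpha1": (-118.0, -58.0),
    "alpha2": (-97.0, 8.0),
    "beta1": (-137.0, 83.0),
    "beta2": (-73.0, 122.0),
    "alphaL": (73.0, 25.0),
    "betaM": (96.0, 157.0),
}

def angle_diff(a, b):
    """Smallest signed difference a - b in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def nearest_conformation(phi, psi):
    """Label of the cluster center closest to (phi, psi) on the periodic plot."""
    def dist2(center):
        return angle_diff(phi, center[0]) ** 2 + angle_diff(psi, center[1]) ** 2
    return min(CENTERS, key=lambda name: dist2(CENTERS[name]))
```

Because dihedral angles are periodic, a linker with phi = 96° and psi = −170° is only 33° away in psi from the βM center and is assigned to that cluster.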

It is worth noting that the relative weights of the six conformations are very similar in PDB_25S (α1: 0.22, α2: 0.21, β1: 0.29, β2: 0.14, αL: 0.04, βM: 0.09) and in the HXH database (α1: 0.25, α2: 0.22, β1: 0.26, β2: 0.13, αL: 0.04, βM: 0.09), strongly suggesting that the procedure for building the database does not lead to significant bias of the data.

Distribution of bend and wobble angles

The distribution of bend and wobble angles describing the relative orientation of the two α-helices was analyzed for each conformation of the linker residue [Fig. 4(b)]. Examples of each conformation are shown in Figure 5(a–f). The α1 conformation [Fig. 5(a)] allows only moderate deviations from linearity with an average bend angle of 30° ± 15°. The α2 conformation [Fig. 5(b)] allows a much larger range of bend angles from 10° to 100° with an average value of 52° ± 23°. Most α1 and α2 motifs have a positive wobble angle, corresponding to a left-handed helix motion. Negative wobble angles, corresponding to a right-handed helix motion, are observed only for a minor part of these motifs (3 and 6% for α1 and α2 motifs, respectively), usually with small bend angles (θb < 30°). The average wobble angles of these conformations are 77° ± 48° and 79° ± 50° for the α1 and α2 conformations, respectively.

In addition to the α2 conformation, bend angles in the 40°–100° range can be reached through the β2 and the αL conformations [Fig. 4(b)]. However, the average wobble angles of these conformations are very different. The

Figure 4
(a) Ramachandran plot of the linker residue X and (b) distribution of the bend and wobble angles between the two helices, for the 837 helix-X-helix motifs of the database. The color code indicates the HXH conformation (α1: dark blue; α2: sky blue; β1: violet; β2: pink; αL: green; βM: red).


wobble angle is near zero for the αL conformation, with an average value of −2° ± 31° [Fig. 5(e)]. For the β2 conformation [Fig. 5(d)], large amplitude counter-clockwise rotation of H2 leads to θw values that can be larger than 180° (average value of 162° ± 35°) and thus to positive and negative wobble values.

Bend angles ranging from 90° to 160° can be reached through either the β1 or the βM conformations (average values of 113° ± 16° and 124° ± 18°, respectively [Fig. 4(b)]). However, these two conformations correspond to very different wobble angles. The β1 conformation leads to a left-handed motion of helix 2 with an average value of 88° ± 36° [Fig. 5(c)], whereas the βM conformation leads to a right-handed motion of helix 2 with an average value of −55° ± 35°. This conformation leads to a reversal of helix direction. In the example of βM conformation displayed in Figure 5(f), the bend angle between the two helices reaches 134°.

Figure 5
Typical HXH motifs for each linker conformation. (a) α1 motif: residues 136–153 of 1O58 chain D, X = Thr-145, θb = 33°, θw = 128°; (b) α2 motif: residues 23–57 of 1VLU chain A, X = Asn-40, θb = 67°, θw = 99°; (c) β1 motif: residues 183–206 of 1OE8 chain A, X = Ser-196, θb = 107°, θw = 109°; (d) β2 motif: residues 69–91 of 2HGS chain A, X = Asn-82, θb = 90°, θw = 137°; (e) αL motif: residues 205–228 of 1M6Y chain A, X = Arg-217, θb = 53°, θw = 1°; (f) βM motif: residues 140–168 of 1XC1 chain A, X = Gly-154, θb = 134°, θw = −82°. The helix-X-helix motif is shown as a green ribbon. The axis of helix 1 is vertical. The side chains of the linker residues and of the prolines, when present, are shown as sticks. In (f), the Cα of Gly at position X is shown as a sphere and the side chain of Ala at position X+4 is shown as a stick. Dashed lines represent H-bonds either between polar linker side chains and the protein backbone (a–d) or typical of the Schellman motif (e). Polar atoms involved in H-bonds are shown as spheres.

A striking result of this analysis is the anisotropy of the wobble motion. Most kink conformations correspond to a counter-clockwise rotation of helix 2. This is the case for the α1 and α2 conformations, except for a few motifs, and for the β1 and β2 conformations. The negative values measured for the β2 conformations correspond to counter-clockwise rotations larger than 180°. The αL conformation displays small amplitude wobble rotation. Only the βM conformation allows a marked clockwise rotation of helix 2 with a reversed wobble motion as compared to the other conformations.
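Because wobble angles are periodic, values that straddle ±180° (as discussed above for the β2 conformation) cannot be averaged arithmetically. One common way to handle this, shown here only as a sketch and not necessarily what was done in the study, is to average the angles as unit vectors, or to re-express strongly negative angles as counter-clockwise rotations larger than 180°. The function names and the cut-off value are hypothetical.

```python
import math

def circular_mean_deg(angles):
    """Mean direction (degrees, in [-180, 180]) of a list of angles."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))

def as_counter_clockwise(angle, cut=-90.0):
    """Report angles below `cut` as rotations larger than 180 degrees."""
    return angle + 360.0 if angle < cut else angle


# Two hypothetical wobble angles on either side of the +/-180 boundary.
sample = [170.0, -170.0]
naive = sum(sample) / len(sample)                            # 0.0, misleading
unwrapped = [as_counter_clockwise(a) for a in sample]         # [170.0, 190.0]
```

The arithmetic mean of 170° and −170° is 0°, whereas both the circular mean and the mean of the unwrapped values point to a rotation of about 180°.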

Main chain main chain hydrogen bonds

HBPLUS was used for a detailed analysis of the H-bonds involving the backbone NH and CO groups of the HXH motif from position X−4 to position X+4. We checked the conservation of the conventional NH(i) to CO(i−4) H-bonds within the motif and, when these H-bonds were not conserved, searched for alternative interactions. We thus analyzed the H-bonds involving the carbonyl groups of helix 1, the amide groups of helix 2, and either group for residue X for the six conformations (Table I). In most cases, the helical H-bond pattern involving the linker X is conserved and this residue is involved both in an NH to CO H-bond with residue X−4 and in a CO to NH H-bond with residue X+4. The only exception is the β2 conformation. For this conformation, only 50% of the NH(X) to CO(X−4) H-bonds are conserved, but the overwhelming majority (92%) of NH(X+4) to CO(X) H-bonds are conserved. On the other hand, the helical i to i−4 H-bond pattern disappears between residues X+1 to X+3 and residues X−3 to X−1. The lack of these bonds is observed even for conformations with small bend angles such as α1 and α2.

Table I
Main Chain–Main Chain Hydrogen Bonds Within the HXH Motif [a]

                     α-helical H-bonds            Alternative H-bonds
Conformation   N     NH(i)   CO(i−4)   Nbonds     NH(i)   CO(j)   Nbonds

α1             212   X       X−4       194        X+1     X−4     10
                     X+1     X−3       4          X+2     X−3     164
                     X+2     X−2       0          X+3     X−2     12
                     X+3     X−1       0          X+3     X       13
                     X+4     X         177

α2             188   X       X−4       135        X−1     X−4     9
                     X+1     X−3       1          X       X−3     42
                     X+2     X−2       2          X+1     X−2     130
                     X+3     X−1       0          X+2     X−1     25
                     X+4     X         157        X+3     X       26

β1             218   X       X−4       183        X−1     X−4     18
                     X+1     X−3       0          X       X−3     27
                     X+2     X−2       0          X+1     X−2     1
                     X+3     X−1       0          X+3     X       43
                     X+4     X         168

β2             104   X       X−4       50         X−1     X−4     25
                     X+1     X−3       0          X       X−3     44
                     X+2     X−2       0          X+3     X       6
                     X+3     X−1       0
                     X+4     X         96

αL             37    X       X−4       24 [b]     X+1     X−4     14 [b]
                     X+1     X−3       0          X       X−3     18 [c]
                     X+2     X−2       0          Schellman motif [d]          9
                     X+3     X−1       0          Partial Schellman motif [e]  23
                     X+4     X         34

βM             78    X       X−4       74         X−1     X−4     2
                     X+1     X−3       0          X       X−3     1
                     X+2     X−2       0          X+2     X−1     1
                     X+3     X−1       0          X+3     X       8
                     X+4     X         65

[a] The number Nbonds of H-bonds was determined with HBPLUS as described in Materials and Methods for the six conformations of the linker X. N represents the number of motifs for each conformation.
[b] For 10 of these H-bonds, interactions from NH(X) or NH(X+1) to CO(X−4) are equally probable.
[c] For 10 of these H-bonds, interactions from NH(X) to CO(X−3) or CO(X−4) are equally probable.
[d] Presence of both NH(X+1) to CO(X−4) and NH(X) to CO(X−3) H-bonds.
[e] Presence of either NH(X+1) to CO(X−4) or NH(X) to CO(X−3) H-bond.


Alternative interactions are possible for some conformations. The α1 conformation is characterized by NH(i) to CO(i−5) H-bonds, observed mainly between residues X+2 and X−3 (77% of the α1 motifs). A few examples of i to i−5 H-bonds are also observed between positions X+3 and X−2 when residue X+3 is not a proline. In most cases, however, the amide group of residue X+3 is not involved in H-bonds with neighboring carbonyl groups. Examples of NH(i) to CO(i−3) H-bonds are commonly found in α2 motifs, especially from position X+1 to position X−2; 70% of the α2 motifs are involved in this interaction. NH(i) to CO(i−3) H-bonds are also observed from position X to position X−3 in 42% of the β2 motifs. H-bonds from NH(X+3) to CO(X) and from NH(X) to CO(X−3) can also be observed for the β1 conformation (19 and 12%, respectively). For βM motifs, involvement of residues X−3 to X−1 and X+1 to X+3 in main chain–main chain H-bonds is very marginal.

The occurrence of Schellman motifs40 was checked for the αL conformation. When an α-helix terminates with a Ccap residue in the αL conformation, a characteristic H-bond pattern links the NH of the C′ and Ccap residues to the CO of the C4 and C3 residues, respectively. The occurrence of these two H-bonds, NH(C′) to CO(C4) and NH(Ccap) to CO(C3), forms the so-called Schellman motif.40,41 In the αL motifs, possibilities of X+1 to X−4 and/or X to X−3 H-bonds, typical of the Schellman motif, are observed in about 60% of the structures. A complete Schellman motif with both H-bonds is, however, observed in only 25% of the HXH motifs in the αL conformation, whether or not glycine is present at position X. An example of a Schellman motif with an Arg linker is displayed in Figure 5(e).
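The distinction between a partial and a complete Schellman pattern, as defined above in terms of positions relative to the linker X, can be sketched as follows. H-bonds are represented as (donor offset, acceptor offset) pairs relative to X; this representation and the function name are assumptions made for the example and do not correspond to the HBPLUS output format.

```python
def schellman_pattern(hbonds):
    """Return (any_bond, both_bonds) for the two Schellman-type H-bonds.

    `hbonds` is a set of (donor_offset, acceptor_offset) pairs relative to X,
    e.g. (1, -4) for NH(X+1) to CO(X-4).
    """
    first = (1, -4) in hbonds    # NH(X+1) to CO(X-4)
    second = (0, -3) in hbonds   # NH(X) to CO(X-3)
    return (first or second), (first and second)


# Hypothetical motif carrying both bonds plus the helical NH(X+4) to CO(X) bond.
example = {(1, -4), (0, -3), (4, 0)}
has_any, is_complete = schellman_pattern(example)
```

With the counts of Table I, 23 of the 37 αL motifs (about 60%) show at least one of the two bonds and 9 (about 25%) show both.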

Comparison with normal helices

Kinks with bend angles smaller than 40° are commonly included within the DSSP definition of α-helices. Larger bend angles are infrequent but examples of extreme distortion have been reported.42 We thus searched for reverse false positives (residues included in DSSP-defined contiguous helices and having dihedral angles similar to those of the linker residue of the HXH motif). The PDB_90S subset was used for this analysis to obtain the maximal limit of reverse false positives. The search was carried out on helices with a length of at least 11 residues, which corresponds to the minimal length of the HXH motifs. The central residue of each 11-residue window sliding from position N1 to position C1 was considered, after ensuring that the other residues of the window were helical. This led to 99,920 residues whose Ramachandran plot is shown in Figure 6. Comparison with the Ramachandran plot of the linker residues in HXH motifs [Fig. 4(a)] indicates the absence of significant overlap or of reverse false positives for most HXH conformations, except for the α2 and βM motifs. However, they correspond to two very different cases.

The tail of the α2 conformation (psi

resulting in a markedly bent motif [Fig. 5(b)]. It is noteworthy that there is a continuum between these conformations, as the bend angle varies linearly with the psi/phi dihedral angles.

Concerning the βM conformation, 36 residues with dihedral angles in this area are found in contiguous helices. We analyzed the corresponding structures. In 20% of these structures, H-bonds between the NH group at position X+3 and the CO group at position X−1 are detected with HBPLUS. In most cases, however, these motifs cannot be distinguished from the βM motifs of our HXH database. In particular, the dihedral angles of the surrounding residues are very similar, with the preceding residue markedly different from a standard helix (phi = −98° ± 20°, psi = −18° ± 16°). Clearly, DSSP may be misled by the very severe folding back of the protein chain, which leads to an underestimate of the weight of the βM HXH motifs. Analysis of the DSSP data does not reveal any clear pattern for rationalizing the different assignments.

The α1 conformation is clearly distinct from the conformations accessible to central positions of α-helices, with no significant overlap [Figs. 4(a) and 6], although the resulting bend angle is small (30° ± 15°). Visual inspection of the α1 motifs reveals an opening of the helix at this position [Fig. 5(a)], leading to a bulged structure, in agreement with the pattern of alternative H-bonds involving residues X+2 and X−3 (Table I). Finally, only a few reverse false positives are observed for the β1 conformations, whereas the β2 and αL conformations are excluded from internal positions of DSSP-defined α-helices (Fig. 6).
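The sliding-window search for reverse false positives described in this section can be sketched as follows. This is a minimal illustration, assuming a per-residue DSSP string in which "H" marks α-helix and per-residue phi/psi lists; the function names and the rectangular phi/psi box are hypothetical and stand in for the region boundaries defined in Materials and Methods.

```python
def central_residues(dssp, window=11):
    """Return indices that are the centre of a window lying entirely
    inside a run of DSSP 'H' residues."""
    half = window // 2
    centres = []
    for i in range(half, len(dssp) - half):
        if all(s == "H" for s in dssp[i - half:i + half + 1]):
            centres.append(i)
    return centres


def reverse_false_positives(dssp, phi, psi, box, window=11):
    """Keep window centres whose dihedral angles fall inside `box`,
    given as (phi_min, phi_max, psi_min, psi_max) in degrees."""
    phi_min, phi_max, psi_min, psi_max = box
    return [i for i in central_residues(dssp, window)
            if phi_min <= phi[i] <= phi_max and psi_min <= psi[i] <= psi_max]


# A 13-residue helix flanked by coil yields three window centres.
print(central_residues("CC" + "H" * 13 + "CC"))
```

A helix of exactly 11 residues contributes a single central residue, which is why shorter helices were not considered in the search.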

Amino acid propensities

The propensities P of the different amino acids at and around the linker position X were analyzed from position X−4 to X+4 for the six geometrically defined conformations (Table II). To take into account the low expected number of some amino acids, data in Table II are highlighted by their Z-scores. Five out of six conformations have a characteristic glycine or proline residue with a very high propensity (P > 6, corresponding to Z > 13). The glycine residue is located at the linker position X of the αL and βM conformations, whereas the proline is located either at position X+1 of the β1 and β2 conformations or at position X+3 of the α1 conformation. In addition to these hallmark residues, each conformation has its own amino acid distribution.
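To make the relationship between a propensity and its Z-score concrete, the sketch below uses one common convention: the propensity is the ratio of observed to expected counts, and the Z-score is the deviation from the expected count in units of the binomial standard deviation. These formulas are an assumption made for illustration; the exact definitions used for Table II are those given in Materials and Methods, and the counts in the example are invented.

```python
import math


def propensity_and_z(n_obs, n_total, background_freq):
    """Propensity P and Z-score of one amino acid at one position.

    n_obs: number of motifs with this amino acid at the position
    n_total: total number of motifs in the conformation
    background_freq: reference frequency of the amino acid (assumed known)
    """
    n_exp = n_total * background_freq
    p = n_obs / n_exp
    z = (n_obs - n_exp) / math.sqrt(n_exp * (1.0 - background_freq))
    return p, z


# Hypothetical counts: 20 occurrences in 100 motifs, background of 10%.
p, z = propensity_and_z(20, 100, 0.1)
print(round(p, 2), round(z, 2))
```

Under this convention, the same propensity gives a smaller Z-score when the expected count is low, which is why Table II is annotated with Z-scores rather than propensities alone.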

More than forty percent of the HXH motifs in the α1 conformation possess a proline located at position X+3. Figure 5(a) displays such an HXH motif with proline located three residues downstream of the linker X. In addition, 9% of the α1 motifs possess a proline at position X+2. This conformation is the only one for which the linker residue has a marked preference for hydrophobic residues, especially Val and Phe. Positions X−4, X+1, and X+4 also have a marked preference for hydrophobic residues. Aromatic residues are overrepresented at position X−4, where they are observed in 24% of the α1 motifs. Position X−1 is polar with a high propensity for Asn. Other positions do not display clear tendencies.

The α2 conformation does not possess a specific overwhelming Gly or Pro residue. Proline can be observed at position X+2 with a propensity of 2.5, corresponding to only 12% of the observations. The α2 linker has a high propensity for Asn and His (P > 2.8). These amino acids are favored at the Ccap position of α-helices.43,44 An example of an α2 motif with an Asn residue at position X is shown in Figure 5(b). Positions X−2, X−1, and X+2 have a marked tendency for polar residues, whereas position X+1 displays a high propensity for Ala, Lys, and Arg and position X+4 has a marked preference for hydrophobic residues.

Both the β1 and β2 conformations have a high propensity for proline at position X+1. This position corresponds to the N1 position of helix 2 and to the C′ position of helix 1. The high propensity of proline for both the N1 and the C′ positions has been widely reported.15,43–46 In these conformations, the dihedral angles of the X linker correspond to values accessible to pre-Pro residues.36,37 Fifty-six percent of the β1 motifs have a proline at position X+1. An example of a β1 motif with a proline is displayed in Figure 5(c). The linker residue has a high propensity for Asp, Asn, His, and Tyr (Z > 2.6) and Ser (Z > 2.0), consistent with propensities for Ccap positions in the β-strand conformation.46 These five residues are found at position X in two thirds of the β1 motifs. Positions X−2, X−1, and X+2 have a marked tendency for polar residues, whereas positions X−4, X−3, X+3, and X+4 are preferentially hydrophobic. When only β1 motifs without proline at position X+1 are considered, this position has a preference for bulky charged or polar residues, especially Lys (P = 2.9).

Figure 7. Typical translation of the helix axis in α2− motifs. The HXH motif (shown as a ribbon) corresponds to residues 140–170 of 5CSM chain A, X = Phe-160, θb = 10°, θw = 61°. The phi and psi angles of the linker are −72° and −10°, respectively.


Table II. Amino Acid Propensities (a)

(a) The amino acid propensities were calculated from position X−4 to position X+4 for the six conformations of the linker X as described in Materials and Methods. Hallmark residues (Z > 13) are highlighted in yellow, favorable residues with Z > 2.0 and 2.6 are highlighted in light green and dark green, respectively, and disfavored residues with Z < −2.0 and −2.6 are highlighted in pink and red, respectively.

Prolines are present at position X+1 in 32% of the β2 motifs. Glu also has a high propensity for this position and is observed in 18% of the β2 motifs. The β2 linker has high propensities for Ser, Asp, and Asn (Z > 2.6), typical of Ncap positions.15,43,45,47 These three residues are found at position X in 68% of the β2 motifs, whereas aliphatic residues are uncommon and aromatic residues rigorously excluded at this position (no case out of 188). Thr, which is usually favored at the Ncap position of α-helices, is not favored at the linker position (P = 0.5). This behavior of Thr is also observed for the other conformations.

In spite of similarities in the amino acid propensities at the linker position, the β1 and β2 conformations display a phase shift in the hydrophobicity of the residues surrounding the kink, observable at positions X−4, X−2, and X+4. In β1 motifs, positions X−4 and X+4 are preferentially hydrophobic, whereas position X−2 is preferentially polar. In β2 motifs, position X−4 displays a preference for polar residues or Ala and position X+4 for Arg and Gln, which are present in 40% of these motifs, whereas position X−2 has a preference for hydrophobic residues.

Both the αL and βM conformations have a very high propensity for glycine at the linker position. This residue is observed in 54% of αL motifs and in all the βM motifs except one. This single exception is a Gln residue (position 369) in the isocitrate lyase from Mycobacterium tuberculosis48 (PDB accession number: 1F8M, chain A). This high propensity for glycine is also observed in the βM reverse false positives (34 cases out of 36). The βM region is highly specific for glycine residues, while the αL region has a high propensity for glycine but can accommodate other residues, in agreement with literature data.36,37,49

The number of αL motifs (37) is too low to yield high Z-scores and levels of significance. However, qualitative comparison of amino acid propensities for αL and βM motifs indicates a marked difference in the hydrophobicity pattern of these motifs. In αL motifs, positions X−1 and X+4 have a preference for polar residues and position X+2 for hydrophobic residues. On the other hand, in βM motifs, position X−1 is preferentially hydrophobic with high propensities for Leu and Phe (P = 2.3 and 2.9, respectively), but Lys, part of whose side chain is aliphatic, can also be observed. Hydrophobic residues and Ala are favored at position X+4 (P = 2.7, 2.1, and 3.2 for Ala, Leu, and Phe, respectively), whereas position X+2 has a marked polar character with high propensity (P > 3) for negatively charged residues.

Hydrophobicity profiles

To further compare the different conformations, we analyzed the average hydrophobicity profiles from position X−4 to position X+4, using Eisenberg's consensus scale (see Fig. 8). The use of different scales did not lead to significant differences in data (not shown). For this analysis, the α1, β1, β2, and αL conformations were divided into two groups depending upon the presence or absence of proline at position X+1 (β1 and β2) or X+3 (α1) or of glycine at position X (αL). This led to the α1+ and α1− subsets (121 and 97 motifs, respectively), to the β1+ and β1− subsets (121 and 97 motifs, respectively), to the β2+ and β2− subsets (33 and 71 motifs, respectively), and to the αL+ and αL− subsets (20 and 17 motifs, respectively). In addition, we divided the α2 motifs into two equal subsets at the median of θb (55°). This parameter appeared to be the most pertinent because of its linear relationship with the phi/psi dihedral angles. This led to the α2inf and α2sup subsets (94 motifs each) with average θb of 33° ± 15° and 72° ± 10°, respectively, making it possible to test the homogeneity of this conformation. As previously noted, the α1 motif is mainly hydrophobic or neutral [Fig. 8(a)]. The presence or absence of proline at position X+3 has, however, a marked effect on the hydrophobicity at positions X−1 and X+1 (99% significance level for the χ² homogeneity test). These positions have polar and hydrophobic characters, respectively, in the presence of proline, whereas no clear tendency is observed in its absence. The presence of proline at position X+1 does not significantly alter the hydrophobicity profile of the β1 and β2 motifs [Fig. 8(c,d)]. The only exception is observed at position X−4 of the β1 motif, which is markedly hydrophobic in the presence of Pro. For the other positions (except the Pro position), differences observed in the average hydrophobicity index H do not correlate with significant changes in the distribution of hydrophobic or polar residues.

Figure 8. Average hydrophobicity index H from position X−4 to position X+4 for the α1 (a), α2 (b), β1 (c), β2 (d), αL (e), and βM (f) motifs. Open bars represent the α1+ motifs with proline at position X+3, the β1+ and β2+ motifs with proline at position X+1, the αL+ motifs with glycine at position X, and the α2inf motifs with θb < 55°. Closed bars represent the α1− motifs without proline at position X+3, the β1− and β2− motifs without proline at position X+1, the αL− motifs without glycine at position X, and the α2sup motifs with θb > 55°. The 1F8M structure with X = Gln was removed from the βM data set.
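The homogeneity test mentioned above can be illustrated on a 2 × 2 contingency table (hydrophobic versus polar residues at one position, in two subsets). The sketch below uses the standard closed-form χ² statistic for a 2 × 2 table and the tabulated critical value for one degree of freedom at the 99% level (6.635). The counts are invented for the example, and the residue classes and any continuity correction used in this study are not reproduced here.

```python
CHI2_CRIT_99_DF1 = 6.635  # critical value, 1 degree of freedom, 99% level


def chi2_2x2(a, b, c, d):
    """Chi-square statistic for the 2 x 2 table [[a, b], [c, d]].

    Rows are the two subsets (e.g. with and without proline); columns
    are the residue classes (e.g. hydrophobic and polar).
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Hypothetical counts: 30 hydrophobic / 10 polar versus 10 / 30.
stat = chi2_2x2(30, 10, 10, 30)
print(stat, stat > CHI2_CRIT_99_DF1)
```

A statistic above the critical value indicates that the two subsets do not share the same distribution of residue classes at that position, at the stated level.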

Major differences in the hydrophobicity of the α2inf and α2sup subsets are observed, especially for positions X−2 and X+2 to X+4 (99% confidence level). The hydrophobicity profile of the α2inf subset is similar to that observed for the α1− subset, with only position X−1 exhibiting a polar character. By contrast, the hydrophobicity profile of α2sup indicates that the positions surrounding the X linker from position X−2 to X+2 are polar. This is corroborated by the detailed analysis of the amino acid distribution from position X−4 to X+4 (not shown). In particular, at position X, the propensities of His and Asn rise from 0.47 and 1.21, respectively, for the α2inf motifs, to 5.2 and 5.6, respectively, for the α2sup motifs.

The polar character at and around the linker X of the α2sup subset is also observed for the β1 motifs (positions X−2 to X+2) and for the β2 motifs (positions X to X+2), either in the presence or absence of proline (Fig. 8 and Table II). The average hydrophobicity index at position X of the β1+ motifs has a moderate polar character due to the increased propensity of Tyr and Phe in that case (3.0 and 1.9, respectively, vs. 1.4 and 1.3 in the absence of proline). Similarly, the average hydrophobicity index of position X−1 for the β2+ motif is related to the high propensity of Ala (P = 3.4 in the presence of proline) and underestimates the polar character of this position. The length of the polar stretch decreases, however, from the α2sup to the β1 and β2 conformations. In either case, position X−3 is hydrophobic. This residue is followed by a polar position in the α2sup and β1 motifs and by a hydrophobic position for β2 motifs. On the other hand, position X+4 has a marked hydrophobic character in α2sup motifs and polar character in β2 motifs. This corroborates the differences in the phases of the helices observed by analyzing the amino acid propensities (Table II).

Comparison of the hydrophobicity profiles of the αL+ and αL− motifs does not reveal significant differences (except for the Gly position), because of the small number of observations. On the other hand, marked differences are observed between the αL+ and βM motifs for positions X−1, X+2, and X+4 (99% confidence level). This reversed hydrophobicity corroborates the observations drawn from amino acid propensities [Fig. 8(e,f) and Table II].
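The average hydrophobicity profiles discussed in this section amount to a position-wise mean of a hydrophobicity scale over aligned nine-residue windows (X−4 to X+4). The sketch below illustrates the calculation; the scale values are those commonly tabulated for the Eisenberg consensus scale and should be checked against the original reference, and the function name and the two toy sequences are purely illustrative.

```python
# Eisenberg consensus hydrophobicity scale, as commonly tabulated
# (assumption: verify against the original reference before reuse).
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def hydrophobicity_profile(windows):
    """Average hydrophobicity index H at each position of aligned,
    equal-length windows (here X-4 to X+4, i.e. nine residues)."""
    length = len(windows[0])
    return [
        sum(EISENBERG[w[i]] for w in windows) / len(windows)
        for i in range(length)
    ]


# Two toy windows with Gly at the linker position (index 4).
profile = hydrophobicity_profile(["AAAAGAAAA", "LLLLGLLLL"])
print([round(h, 2) for h in profile])
```

Computing one profile per subset (for example with and without proline) and comparing them position by position reproduces the kind of comparison shown in Figure 8.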

Solvent accessibility

The average accessible solvent area (ASA) of the residues located from position X−4 to position X+4 of the HXH motifs was measured for the different subsets (see Fig. 9). The most buried motifs are the α1− ones. In that case, the average solvent accessibility is lower than 50% for any position of the motif, in agreement with the hydrophobicity profile indicating low polarity [Fig. 8(a)]. The α2inf motifs are slightly more solvent exposed, with the average ASA rising to 60% for positions X−1 and X+2. Helix 1 is more solvent exposed in the α1+ motifs than in α1− motifs, with an average ASA of 75% at position X−1, in agreement with the polar character of this position. This increased solvent accessibility is not observed for helix 2 (ASA 40%).

The α2sup, β1, and β2 motifs are very solvent exposed with accessibilities 80% for individual positions. The presence of proline at position X+1 of the β1 and β2 motifs does not alter the solvent accessibility [Fig. 9(c,d)]. Both helices have highly solvent exposed positions. Position X−1 is the most solvent exposed position of helix 1 for the α2sup, β1, and β2 motifs, whereas a difference is observed for helix 2, whose most solvent exposed position is either X+2 for the α2sup and β1 motifs or X+1 for the β2 motifs [Fig. 9(b–d)]. These solvent accessibilities are consistent with the hydrophobicity profiles and amino acid propensities indicating that polar residues or Ala are favored at these positions. It is worth

to note that, in spite of the polar character of the linker

residue X in any of these three conformations, this resi-

due has a limited solvent accessibility ( 558.



Concerning the αL and βM conformations, the low accessible surface area of glycine is related to the absence of a side chain. This caveat leads to very different ASA values for position X of the αL+ and αL− subsets (20 and 73%, respectively) that are not related to changes in the solvent exposed location of this residue on the protein surface. Both the αL and the βM conformations lead to a break in the helical pattern of helix 1, with exposure of the linker residue [see Fig. 5(e,f) as examples]. Major differences between the two motifs come from the second helix. In the αL motifs, helix 2 initiates from the buried side, whereas in βM motifs, helix 2 initiates from the exposed side, leading to very different solvent exposures at positions X+2 and X+4 [Fig. 9(e,f)], in agreement with the hydrophobicity profiles [Fig. 8(e,f)].

Side chain–main chain hydrogen bonds

Side chains of polar and negatively charged residues (Asn, Asp, Gln, Glu, Ser, Thr, and His) can be involved in the formation of closed-loop conformations through side chain–main chain hydrogen bonds.50 They have the capability to form C10- to C17-membered ring conformations, through H-bonding of the side chain oxygen or nitrogen to the backbone polar groups of residues located up to four positions upstream or downstream. We thus screened the HXH motifs in search of H-bonds between polar or negatively charged side chains of the linker X and polar groups of the neighboring backbone (Table III). As previously observed,50 side chains acting as donors can form an H-bond only with the carbonyl oxygen of upstream residues, whereas side chains acting as acceptors can interact only with the amide nitrogen of downstream residues.

Table III. H-Bonds Between the Linker Side Chain and the HXH Motif Main Chain (a)

Residue  Donor/acceptor  Conformation  N   Nbonds  Acceptor/donor
Asn      Oδ1, Nδ2        α1            9   0
                         α2            28  17      17 O(X−4)
                         β1            41  31      14 O(X−4), 4 N(X+2), 13 N(X+3)
                         β2            19  19      1 O(X−3), 3 N(X+2), 15 N(X+3)
                         αL            1   0
Asp      Oδ              α1            4   0
                         α2            7   0
                         β1            44  36      11 N(X+2), 25 N(X+3)
                         β2            30  29      6 N(X+2), 23 N(X+3)
                         αL            2   2       2 N(X+3)
Gln      Oε1, Nε2        α1            12  1       1 O(X−1)
                         α2            6   1       1 N(X+2)
                         β1            9   1       1 N(X+2)
                         β2            2   0
                         αL            2   1       1 O(X−3)
Glu      Oε              α1            13  0
                         α2            7   0
                         β1            6   0
                         β2            4   1       1 N(X+3)
                         αL            1   0
Ser      Oγ              α1            12  4       4 O(X−4)
                         α2            8   7       3 O(X−4), 4 O(X−3)
                         β1            22  19      1 O(X−4), 1 N(X+2), 17 N(X+3)
                         β2            22  20      1 O(X−3), 19 N(X+3)
                         αL            0   0
Thr      Oγ1             α1            13  6       5 O(X−4), 1 O(X−3)
                         α2            6   5       2 O(X−4), 3 O(X−3)
                         β1            3   2       2 O(X−4)
                         β2            3   3       3 N(X+3)
                         αL            0   0
His      Nδ1, Nε2        α1            9   0
                         α2            12  5       5 O(X−4)
                         β1            18  6       5 O(X−4), 1 N(X+2)
                         β2            5   3       2 N(X+2), 1 N(X+3)
                         αL            2   0

(a) The number Nbonds of H-bonds was determined with HBPLUS as described in Materials and Methods for the indicated atoms of polar side chains at position X of HXH motifs and neighboring polar groups of the protein backbone. O(X−i) indicates an H-bond with the carbonyl group at position X−i. N(X+j) indicates an H-bond with the amide group at position X+j. N represents the number of amino acids a at position X in the considered conformation.

Gln and Glu residues located at position X are seldom involved in side chain–main chain H-bonds (3%), although these residues can be involved in such H-bonds.50 For Asn, Asp, Ser, Thr, and His, the H-bonding pattern depends upon the conformation of the linker (Table III). The percentage of polar side chains involved in H-bonds rises from less than 15% for αL and α1 motifs up to 80% for β2 motifs.

In α1 motifs, only Thr or Ser at position X can be involved in H-bond interactions. These H-bonds involve only carbonyl groups, mainly at position X−4 (9 cases out of 10). The H-bond linking the Ser/Thr side chain to the X−4 carbonyl is typically observed for Ser or Thr residues located within α-helices.50 Such an interaction involving a Thr side chain and the carbonyl group at position X−4 is shown in Figure 5(a). When they are present at position X of α1 motifs, Asp, Asn, and His are not involved in H-bonds.

Ser and Thr are not favored at position X of the α2 motifs (Table II). However, when present, most of them (80%) are involved in H-bonds with either the X−4 or X−3 carbonyl groups. The interaction between the Ser/Thr side chains and the carbonyl groups at position X−3 is typical of the helix C-terminus.50 In addition, Asn and His, which have high propensities for position X of α2 motifs, can interact with the carbonyl group of residue X−4. These latter interactions are also typical of the C-terminal end of α-helices.50 Two thirds of the Asn residues at position X of α2 motifs are involved in such H-bond interactions. An example of this interaction is given in Figure 5(b). Asp, which cannot form this H-bond, is seldom present in the α2 conformation (P = 0.65). No example of interaction of Asn or Asp side chains with the amide group at position X+3, typical of N-capping, is observed.

In β1 motifs, polar side chains present a different H-bonding pattern and can be involved in interactions with either the carbonyl group of residue X−4 (Asn, Ser, Thr, and His) or the amide groups of residues X+2 or X+3 (Asn, Asp, Gln, Ser, and His). When present, most Ser side chains (77%) are involved in H-bonds with the amide group at position X+3. An example of this interaction is given in Figure 5(c). Thr is seldom present at position X of the β1 conformation, but two out of the three cases are involved in H-bonds with the carbonyl group at X−4. In both β1+ and β1− motifs, about 80% of Asn and Asp side chains are involved in H-bonds. Interestingly, the presence of proline favors H-bonding of Asn with carbonyl groups (10 cases out of 16), whereas its absence favors H-bonding with amide groups (11 cases out of 15) (Table IV). Asp is involved in H-bonds with amide nitrogens, either at position X+2 or X+3. About 30% of His are involved in H-bonds with carbonyl groups at X−4. Four cases out of five for this interaction are observed for β1+ motifs (Table IV).

Either in the presence or absence of proline, polar residues Asn, Asp, Ser, Thr, and His are present in about 70% of the β2 motifs and more than 90% of their side chains are involved in H-bonds (Tables III and IV). These H-bonds involve almost exclusively the NH groups of residue X+2 or X+3. Only two out of 75 examples of H-bonds involve carbonyl groups. Most H-bonds involving the amide group at position X+2 are observed for β2+ motifs (7 cases out of 11), in spite of the limited number of these motifs (Table IV). This effect of proline is especially marked for Asp, which can interact with amide groups either at position X+2 or X+3 (4 and 7 cases, respectively) in β2+ motifs, whereas only H-bonds with amide groups at position X+3 are observed in β2− motifs (18 cases).

Table IV. Effect of Proline on H-Bonds Between the Linker Side Chain and the HXH Motif Main Chain (a)

Residue  Conformation  Pro  N   P    Nbonds  Acceptor/donor
Asn      β1            −    21  4.9  15      4 O(X−4), 1 N(X+2), 10 N(X+3)
                       +    20  3.8  16      10 O(X−4), 1 N(X+2), 5 N(X+3)
         β2            −    16  5.2  16      1 O(X−3), 2 N(X+2), 13 N(X+3)
                       +    3   2.1  3       1 N(X+2), 2 N(X+3)
Asp      β1            −    27  4.9  22      4 N(X+2), 18 N(X+3)
                       +    17  2.5  14      4 N(X+2), 10 N(X+3)
         β2            −    19  4.7  18      18 N(X+3)
                       +    11  5.8  11      4 N(X+2), 7 N(X+3)
Ser      β1            −    13  2.1  13      1 O(X−4), 12 N(X+3)
                       +    9   1.2  6       1 N(X+2), 5 N(X+3)
         β2            −    18  4.0  17      1 O(X−3), 16 N(X+3)
                       +    4   1.9  3       3 N(X+3)
Thr      β1            −    0   0.0  0
                       +    3   0.4  2       2 O(X−4)
         β2            −    1   0.2  1       1 N(X+3)
                       +    2   1.0  2       2 N(X+3)
His      β1            −    6   2.7  1       1 O(X−4)
                       +    12  4.4  5       4 O(X−4), 1 N(X+2)
         β2            −    2   1.2  0
                       +    3   4.0  3       2 N(X+2), 1 N(X+3)

(a) The number Nbonds of H-bonds was determined with HBPLUS as described in Materials and Methods for the indicated polar side chains at position X and neighboring polar groups of the protein backbone in β1 and β2 motifs, as a function of the presence (+) or absence (−) of proline at position X+1. Propensities P with Z > 2.0 are in bold font. N represents the number of amino acids a at position X in the considered conformation. The total numbers of β1+ and β1− motifs are 121 and 97, respectively. The total numbers of β2+ and β2− motifs are 33 and 71, respectively.
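The percentages quoted in this section follow directly from the counts in Table III. As a worked check, the sketch below pools the N and Nbonds values of Asn, Asp, Ser, Thr, and His for the β1 and β2 conformations and computes the fraction of side chains involved in H-bonds. The numbers are transcribed from Table III; the dictionary layout and function name are illustrative choices only.

```python
# (N, Nbonds) for residues at position X, transcribed from Table III.
TABLE_III = {
    "beta1": {"Asn": (41, 31), "Asp": (44, 36), "Ser": (22, 19),
              "Thr": (3, 2), "His": (18, 6)},
    "beta2": {"Asn": (19, 19), "Asp": (30, 29), "Ser": (22, 20),
              "Thr": (3, 3), "His": (5, 3)},
}


def bonded_fraction(conformation):
    """Fraction of the listed polar side chains at position X that form
    at least one side chain-main chain H-bond in this conformation."""
    rows = list(TABLE_III[conformation].values())
    n_residues = sum(n for n, _ in rows)
    n_bonded = sum(nb for _, nb in rows)
    return n_bonded / n_residues


for name in ("beta1", "beta2"):
    print(name, round(100 * bonded_fraction(name), 1))
```

For the β2 conformation this gives 74 bonded side chains out of 79, in line with the statement that more than 90% of these side chains are involved in H-bonds.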

DISCUSSION

The helix-X-helix motif

One of the difficulties in analyzing the secondary structure of proteins lies in the definition of SS elements, in particular when attention is focused on the limits of these elements. Different algorithms, based on H-bond pattern, Cα geometry, backbone dihedral angles, or a combination of different criteria, have been developed.20,51–55 The analysis of the HXH motif is dependent upon the definition of the SS elements. For example, in their analysis of α-α linking motifs, Engel and DeGrado17 used a broad phi/psi-based definition of the α-helix, with psi ranging up to 45°, and thus could not observe the α1 and α2 conformations that we have determined.

In this article, we relied on the DSSP definition of SS elements.20 DSSP is based on the detection of H-bonds and was developed to define SS elements in which not all possible H-bonds are formed, for example in bent or curved helices.20 In addition, we added a phi/psi filter to ensure that the five residues surrounding the linker were in a helical conformation and to remove false positive motifs in which one of the helices was not correctly defined. This filter also suppressed potential problems with the definition of the helix termini. As a matter of fact, in 3% of the HXH motifs initially found with DSSP, the C-terminus of helix 1 or the N-terminus of helix 2 was out of the α region. These structures corresponded to helices linked by two residues and were erroneously assigned as HXH motifs by DSSP.

Analysis of the dihedral angles of residues located in the middle of α-helices (see Fig. 6) indicates that the βM conformation is the only one for which a significant number of reverse false positives can be found. Although the reasons are not clear, DSSP does not cope well with the extreme distortion of the protein chain observed in these motifs. This is not the case for the other conformations. Only a few cases of reverse false positives are observed for the β1 conformation and none for the αL and β2 conformations. The dihedral angles of the α1 and α2+ motifs, although included in the additional allowed α-helical region, can only marginally be accessed by residues located in the middle of contiguous helices (see Fig. 6). Finally, detailed analysis of the α2− motifs indicates that, although the dihedral angles of the linker overlap the α-helix area, they are clearly distinct from kinks included in contiguous α-helices (see Fig. 7) and are correctly assigned as HXH motifs by DSSP.

The precise determination of the H-bond pattern of the HXH motif was carried out with HBPLUS (Table I). This motif is characterized by the lack of NH(i) to CO(i−4) hydrogen bonds between the three residues downstream and upstream of the linker residue. The linker residue X is usually involved in H-bond interactions with both residue X−4 and residue X+4, but there is complete disruption of the helical pattern between residues N1–N3 of helix 2 and residues C3–C1 of helix 1.

The limited number of motifs found in PDB_25 led us to develop an alternative strategy to build an HXH database (see Fig. 1). Although this may introduce some bias in the quantitative results, several lines of evidence strongly suggest that the procedure should not significantly affect the general conclusions of this study: (1) sequence diversity is high, with 94% of the sequence pairs having a sequence identity 27% (to be compared to 96% for the PDB_25 set); (2) the relative weights of the six conformations are very similar in both data sets; (3) sequence properties can be rationalized by energetic considerations, in relation to the specific structural properties of these motifs.

A striking property of the HXH motifs is their solvent exposure (see Fig. 9). Disruption of the helical H-bonding pattern frees several polar groups of the protein backbone (Table I). The high solvent exposure of most motifs may be related to this property and required for energetic reasons. The α1 and α2inf motifs, for which alternative H-bonds are the most frequent, are the most buried ones (see Fig. 9). Other motifs are solvent exposed. However, the linker residue, in spite of its usual polar character, is more buried than its neighbors in most conformations (Figs. 6 and 7). This is consistent with its H-bonding properties. First, the amide and carbonyl groups of this residue are involved in main chain–main chain H-bonds with residues X−4 and X+4 (Table I). This facilitates burying the backbone at this position. Second, polar side chains at position X are involved in H-bonds with neighboring polar groups of the protein backbone and may contribute to stabilizing the kink.

The propensities of Asn and Asp at position X are very consistent with the H-bonding patterns. Both Asn and Asp have high propensities in β1 and β2 motifs, where their side chains can interact with downstream amide groups, indicating that these N-capping interactions are stabilizing. Similarly, the high propensity of Asn in α2 motifs can be related to C-capping interactions with upstream carbonyl groups. His also has a high propensity at position X in the α2, β1, and β2 motifs, where it can form H-bonds. The possibility of H-bonds between Ser/Thr and carbonyl groups in α1 and α2 motifs is not related to an increased propensity of these residues (P 1). However, these H-bonds compete with the main chain–main chain H-bonds involving the X−4 and X−3 oxygens (Table I) and thus may not be stabilizing. On the other hand, a favorable propensity for Ser is observed when it can be involved in interactions with amide groups (P = 1.6 and 3.3 for the β1 and β2 conformations, respectively), indicating that these N-capping interactions stabilize these motifs.

Proline-induced kinks

Proline is rigorously excluded from the linker position X and the preceding positions in HXH motifs but is found at positions X+1 to X+3 in almost forty percent of these motifs, highlighting the role of this residue as a helix breaker. Proline has an average propensity of 1.4 at position X+2 and is observed in 6% of the motifs, mainly for the α1 and α2 conformations. Most proline-induced kinks, however, correspond either to an α1 motif with proline at position X+3 (12% of the 837 motifs) or to a β1 or β2 motif with proline located at position X+1 (20% of the 837 motifs).

Because of the bulkiness of the pyrrolidine ring, the backbone of the residue preceding a proline can adopt only a limited range of dihedral conformations. This includes a narrow range in the α region and two subsets of the β region corresponding to the β1 and β2 areas defined in this study.36,37,49 When proline residues are found in the middle of α-helices, steric constraints lead to a kink in the helix to avoid a clash between the Cδ atom of Pro at position i and the carbonyl oxygen at position i−4. These kinks have bend angles in the 20°–30° range, with an average value of 26°,56 similar to the average bend angle of the α1 motifs. Analysis of the dihedral angles of the residues located around the proline reveals that, when proline is included in a contiguous helix, the distortion of the backbone is shared by the residues located two positions (phi = −80° ± 12°, psi = −30° ± 11°) and three positions upstream of the proline (phi = −77° ± 13°, psi = −35° ± 11°). On the other hand, the α1 motifs correspond to a large distortion of the residue located three positions upstream of the proline (phi = −122° ± 9°, psi = −56° ± 7°) and, to a lesser extent, of the preceding residue (phi = −91° ± 11°; psi = −23° ± 11°), whereas the following residue is not affected (phi = −64° ± 6°, psi = −52° ± 7°). In spite of these

differences in the dihedral angles, the hydrophobicity patterns of the residues surrounding the proline are similar, with positions P+2 and P−2 having a marked hydrophobic character in both cases.

When proline is present in β1 or β2 motifs, the linker conformation corresponds to one of the two conformations accessible to pre-Pro residues in the β area36,37,49 and proline is located at the next position. The linker is mainly in the β1 conformation (78% of the motifs with proline at X+1), which allows a dramatic change in the helix orientation with bend angles larger than 90°. The β1 conformation corresponds to the well-described Pro C-capping motif.57 This conformation allows a stabilizing electrostatic interaction of residues X and X+1 with the helix dipole. The high propensity of His and aromatic amino acids at position X for this motif in the presence of Pro (4.4, 1.9, and 3.0 for His, Phe, and Tyr, respectively) is consistent with stabilization of the Pro C-capping motif by interaction of these rings with the carbonyl group located four residues upstream.57 The presence of H-bonds between His at the linker position and the carbonyl group at position X−4 (Table IV) provides additional evidence of such an interaction.

Analysis of the amino acid distributions and of the hydrophobicity patterns (Table II and Fig. 8) strongly suggests that position-specific scoring matrices could be used to predict the backbone conformation of the pre-Pro residues in proline-containing sequences (see Materials and Methods). The matrix for pre-Pro residues in the α conformation (α matrix) was built from the sequences of the α1 motifs and of Pro-containing helices in PDB_25 (89 and 85 sequences, respectively). The matrix for pre-Pro residues in the β conformation (β matrix) was built from the sequences of the β1 and β2 motifs (121 and 33 sequences, respectively). The limited number of β2 motifs did not allow them to be considered separately. In both cases, the window ranged from five residues upstream to three residues downstream of the proline. Predictions were based on a 10-fold cross-validation procedure. The Q-score matrix is shown in Table V and corresponds to an average accuracy of 0.81 ± 0.06. Indeed, up to 85% of sequences with the pre-Pro residue in the β region were predicted successfully, clearly indicating the usefulness of these position-specific scoring matrices for prediction purposes.
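The scoring scheme described in this paragraph can be illustrated by a minimal sketch. It assumes log-odds matrices with a uniform background and a pseudocount of 1, neither of which is taken from Materials and Methods, and the nine-residue windows (five residues upstream, the proline, three residues downstream) are invented toy sequences rather than entries of PDB_25. The function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: uniform background frequencies (not the authors' choice).
BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def build_pssm(windows, pseudocount=1.0):
    """Build a log-odds matrix: one {residue: score} dict per window position."""
    total = len(windows) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(len(windows[0])):
        counts = Counter(w[pos] for w in windows)
        pssm.append({
            aa: math.log((counts[aa] + pseudocount) / total / BACKGROUND[aa])
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, window):
    """Sum the position-specific scores of a window."""
    return sum(column[aa] for column, aa in zip(pssm, window))


def predict(alpha_pssm, beta_pssm, window):
    """Assign the pre-Pro conformation with the higher matrix score."""
    if score(alpha_pssm, window) >= score(beta_pssm, window):
        return "alpha"
    return "beta"


# Toy training windows (hypothetical); proline sits at index 5.
alpha_windows = ["LAELAPELL", "LAKLAPELL", "IAELAPEIL"]
beta_windows = ["EELKHPSLE", "KELRNPSAE", "EDLKHPTLE"]
alpha_pssm = build_pssm(alpha_windows)
beta_pssm = build_pssm(beta_windows)

print(predict(alpha_pssm, beta_pssm, "LAELAPELL"))
print(predict(alpha_pssm, beta_pssm, "EELKHPSLE"))
```

In a cross-validated setting, the matrices would be rebuilt on each training fold and applied only to the held-out windows.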

Glycine-induced kinks

Thirteen percent of the HXH motifs have a glycine at the linker position. This corresponds to an average propensity of 1.7. However, Gly is seldom observed in α1, α2, β1, and β2 motifs and has a propensity P < 1 at position X of these conformations (Table II). In most cases, the dihedral angles of glycine linkers are either in the βM or in the αL conformation, characterized by positive phi values. In addition to its high propensity for the Ccap position of α-helices, glycine is known to have a high propensity for the C′ position.43,45,47 However, a favourable propensity for Gly at position X+1, corresponding to the C′ position of helix 1, is not observed in any of the conformations (the value of 1.4 in αL motifs is not significant, due to the limited number of observations).
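As a worked illustration of the propensity values quoted here, the sketch below assumes the usual definition of a propensity as the ratio of the observed frequency of a residue at a position to its background frequency in the reference database. The background fraction of 0.0765 is not given in the text. It is back-calculated from the reported 13% occurrence and propensity of 1.7, and serves only as an example.

```python
def propensity(count_at_position, n_motifs, background_fraction):
    """Ratio of the observed fraction at a position to the background fraction."""
    observed_fraction = count_at_position / n_motifs
    return observed_fraction / background_fraction


# 13 glycine linkers per 100 motifs, with an assumed background of 7.65%.
gly_propensity = propensity(13, 100, 0.0765)
print(round(gly_propensity, 2))
```

A propensity above 1 indicates that the residue is enriched at the position relative to the database, and a value below 1 that it is depleted.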

Glycine is characterized by the absence of a side chain, which allows its backbone dihedral angles to span a much broader range than those of other residues.36,37,49 Its phi dihedral angle can be positive, and glycine can access either the αL or the βM regions of the Ramachandran plot, corresponding to mirror regions of the α-helix and of the β-strand, respectively. In particular, the βM region is very specific to Gly residues.36
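The mirror relationship between these regions can be made concrete with a coarse classifier. The region boundaries below are illustrative assumptions and not the limits used in this study. The αL and βM regions are obtained by inverting the α and β regions through the origin of the Ramachandran plot, that is, by mapping (phi, psi) to (−phi, −psi).

```python
def ramachandran_region(phi, psi):
    """Coarse region label for a (phi, psi) pair given in degrees.

    Boundaries are hypothetical: negative phi is split into alpha and beta
    by psi, and positive phi uses the point-inverted limits.
    """
    if phi < 0:
        return "alpha" if -120 <= psi <= 50 else "beta"
    return "alphaL" if -50 <= psi <= 120 else "betaM"


for angles in [(-64, -52), (-120, 130), (60, 45), (120, -130)]:
    print(angles, ramachandran_region(*angles))
```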

Termination of α-helices by a glycine residue in the αL conformation at the Ccap position is commonly observed in proteins.40,41,46 This conformation allows the formation of the Schellman motif,40,41 which involves two main chain–main chain hydrogen bonds joining NH(C′) to CO(C4) and NH(Ccap) to CO(C3). The propensity of Gly in αL motifs, 7.1, correlates well with the propensity of 8 observed for Gly at the C-cap position when the latter is in the αL conformation.46 However, αL motifs represent only 20% of the HXH motifs with a Gly at the linker position. This is in agreement with the limited number of αL-terminated helices followed by a second helix initiating at the next position (2%).46

In most cases (80%), when a second helix initiates after a glycine, the dihedral angles of the glycine linker are in the βM conformation. This dihedral conformation has recently been described as a glycine-specific conformation.37 To the best of our knowledge, this specific motif, in which two α-helices are linked by a Gly residue in the βM conformation, has not been described yet. This conformation allows large bend angles (124° ± 18°) and a clockwise wobble rotation. It is the only conformation allowing such a wobble rotation, reversed as compared to the other conformations. The weight of the βM motifs is underestimated by DSSP (see Fig. 6). Nevertheless, they represent 9% of all the HXH motifs in our database, highlighting their structural importance.
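For readers who wish to reproduce bend angles such as those quoted here, a minimal sketch follows. It assumes that the bend angle is the angle between the axis direction vectors of the two helices, both oriented from the N- to the C-terminus, so that 0° corresponds to collinear helices. The determination of the axes themselves and of the wobble angle is not covered, and the vectors used are arbitrary examples.

```python
import math


def bend_angle(axis1, axis2):
    """Angle in degrees between two helix axis direction vectors."""
    dot = sum(a * b for a, b in zip(axis1, axis2))
    norm1 = math.sqrt(sum(a * a for a in axis1))
    norm2 = math.sqrt(sum(b * b for b in axis2))
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    cosine = max(-1.0, min(1.0, dot / (norm1 * norm2)))
    return math.degrees(math.acos(cosine))


helix1_axis = (0.0, 0.0, 1.0)
helix2_axis = (math.sin(math.radians(124)), 0.0, math.cos(math.radians(124)))
print(round(bend_angle(helix1_axis, helix2_axis), 1))
```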

Because of the very limited number of Gly-containing αL₁ motifs (20 cases), position-specific scoring matrices could not be developed for quantitative prediction of the linker conformation. However, comparison of the amino acid propensities (Table II) and hydrophobicity profiles (see Fig. 8) of αL and βM motifs clearly shows the reversed hydrophobicity of the helix following the glycine linker, especially at positions X+2 and X+4, and suggests rules of thumb to differentiate these motifs. In particular, in βM motifs, the residue located at position G+4 is hydrophobic or Ala and stabilizes the motif by interaction with hydrophobic residues in helix 1 (residues G−5 and/or G−4). Alanine, with its small size, favors the folding back of the protein backbone [Fig. 5(f)] and is usually observed for high bend angles (θb = 134° ± 8°). This position is polar and solvent exposed in αL motifs. It is also interesting to note the reversed polarity of the position preceding the glycine. Although partly exposed on the protein surface in both cases, position G−1 has a preference for hydrophobic residues in βM motifs and for polar residues in αL₁ motifs. In particular, Ser is found at position G−1 in 30% of the αL₁ motifs (propensity of 4.6).
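The qualitative rules stated in this paragraph can be written as a simple two-position vote. This is only an illustrative heuristic and not a validated predictor: the residue classes are assumptions, the function name is hypothetical, and the example sequences are invented.

```python
# Assumed residue classes; the text does not list them explicitly.
HYDROPHOBIC = set("VLIMFWYC")
POLAR = set("STNQDEKRH")


def guess_gly_linker(sequence, gly_index):
    """Guess the linker conformation from positions G-1 and G+4."""
    if gly_index < 1 or gly_index + 4 >= len(sequence):
        raise ValueError("window G-1..G+4 does not fit in the sequence")
    if sequence[gly_index] != "G":
        raise ValueError("linker residue must be Gly")
    before = sequence[gly_index - 1]
    plus4 = sequence[gly_index + 4]
    # betaM: hydrophobic G-1, and hydrophobic or Ala at G+4.
    beta_votes = (before in HYDROPHOBIC) + (plus4 in HYDROPHOBIC or plus4 == "A")
    # alphaL: polar G-1 (often Ser) and polar G+4.
    alpha_votes = (before in POLAR) + (plus4 in POLAR)
    if beta_votes == 2:
        return "betaM"
    if alpha_votes == 2:
        return "alphaL"
    return "undetermined"


print(guess_gly_linker("EELLKLGDPEALKK", 6))
print(guess_gly_linker("EELLKSGDPEELKK", 6))
```

Requiring agreement of both positions keeps the rule conservative. Conflicting signals are reported as undetermined rather than forced into a class.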

Non-Gly, non-Pro motifs

In spite of the high number of proline- or glycine-induced kinks, about half of the HXH motifs involve neither a proline nor a glycine residue. Among these motifs, the most frequent one is α2 (40%) and the least frequent one is αL (5%). The α1, β1, and β2 motifs each have a weight in the 15–20% range. The α2 motif is the only one without a characteristic proline or glycine residue. It has, however, high propensities for Asn and His at position X (Table II). These propensities depend on the bend angle: these two residues represent 10 and 36% of linker residues when θb is lower and higher than 55°, respectively. These residues may be involved in H-bond interactions with the carbonyl group at position X−4 (Table III). These H-bonds are typical of C-capping stabilization.46 The increased propensity of His and Asn in α2sup motifs indicates that the C-capping H-bonds involving these residues stabilize the α2 motifs with high bend angles.

Either in the β1 or β2 conformation, the absence of proline at position X+1 is usually correlated with an increased propensity of Asn, Asp, and Ser. The only exception is Asp in the β2 conformation, whose propensity is not significantly altered by the presence of Pro (Table IV). These three residues represent 38 and 55% of the β1₁ and β2₁ linkers, respectively, and 63 and 75% of β1₂ and β2₂ linkers. The presence of proline does not alter the percentage of these residues involved in H-bonds (80 and 95% for the β1 and β2 motifs, respectively). However, it alters the H-bond patterns (Table IV). In particular, the preference for H-bonds involving the amide group at position X+3 is more marked in β1₂ and β2₂ motifs (80 and 92%, respectively) than in β1₁ and β2₁ motifs (56 and 71%, respectively) (Table IV). These H-bonds are typical of helix N-termini15,50,58–60 and appear to stabilize β1 and β2 motifs, especially in the absence of proline.

Table V. Conformational Predictions (a)

Motifs                       Conformation   Qobs           Qpred
Proline-induced kinks (b)    α              0.78 ± 0.07    0.86 ± 0.06
                             β              0.85 ± 0.07    0.77 ± 0.06
NonP, nonG motifs (c)        α2sup          0.87 ± 0.13    0.89 ± 0.07
                             β2             0.86 ± 0.10    0.85 ± 0.13

(a) The conformation of the pre-Pro residue in proline-induced kinks or of the linker residue in nonPro, nonGly HXH motifs was predicted as described in Materials and Methods. The Q-scores were determined by a 10-fold cross-validation procedure.

(b) The position-specific scoring matrices included the five positions preceding the proline residue and the three positions following it. In the α pre-Pro conformation, proline could be in α1 motifs or in contiguous helices. In the β pre-Pro conformation, proline could be either in β1 or β2 motifs.

(c) The position-specific scoring matrices included the four positions surrounding the linker residue.
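The Qobs and Qpred scores reported in Table V can be computed as sketched here, assuming the conventional definitions: Qobs is the fraction of motifs observed in a class that are predicted in that class, and Qpred is the fraction of motifs predicted in a class that are actually observed in it. The exact definitions used in this work are those given in Materials and Methods, and the labels below are toy data.

```python
def q_scores(observed, predicted, label):
    """Return (Qobs, Qpred) for one class label."""
    correct = sum(1 for o, p in zip(observed, predicted) if o == label and p == label)
    n_observed = sum(1 for o in observed if o == label)
    n_predicted = sum(1 for p in predicted if p == label)
    q_obs = correct / n_observed if n_observed else float("nan")
    q_pred = correct / n_predicted if n_predicted else float("nan")
    return q_obs, q_pred


# Toy example (hypothetical labels): four alpha and two beta motifs.
observed = ["a", "a", "a", "a", "b", "b"]
predicted = ["a", "a", "a", "b", "b", "b"]
print(q_scores(observed, predicted, "a"))
print(q_scores(observed, predicted, "b"))
```

Reporting both scores distinguishes a predictor that misses members of a class (low Qobs) from one that over-assigns motifs to it (low Qpred).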

Asn has unique properties as a helix breaker. Its propensity at the linker position is 5.6, 4.9, and 5.2 for the α2sup, β1₂, and β2₂ motifs, respectively. These high propensities can be related to the H-bond pattern of its side chain. It may be involved in C-capping stabilizing interactions in the α2 conformation or in N-capping stabilizing interactions in the β1 or β2 conformations. As the Asp side chain can be involved only in H-bond interactions with upstream amides, it is seldom observed in the α2 motifs but displays high propensity in the β1₂ and β2₂ motifs. On the other hand, although Ser can be involved in C-capping interactions in α2 motifs, it has a low propensity (0.75) in this motif. Its high propensity in the β1₂ and β2₂ motifs appears to be related to its capability to be involved in N-capping interactions.

Comparison of the hydrophobicity patterns of the α2sup, β1₂, and β2₂ motifs [Fig. 8(c)] highlights the general properties of HXH motifs when a large bend angle is observed between two α-helices in the absence of proline or glycine breakers. In the three cases, the linker has a marked polar character and is located within a polar stretch. Position X−3 is buried and helix 1 ends at its solvent-exposed side. In β1₂ and β2₂ motifs, helix 2 initiates from the solvent-exposed side. In α2sup motifs, the solvent-exposed side of helix 2 starts at position X+2. However, position X+1 displays a preference for either polar residues or alanine. This leads to a characteristic five-residue-long polar stretch (which may include Ala at position X+1), which appears as a hallmark of the α2sup motifs.

The α2sup and β2₂ motifs correspond to kinks with bend angles in the same 50–100° range, but with different wobble values [Fig. 4(b)]. A tool able to discriminate between these two conformations should be very useful both for protein modeling and protein design. We thus tested the capability of the position-specific scoring matrices to differentiate them. Accuracy of the prediction reached 85% (Table V), indicating that this method may be used to estimate the wobble motion of kinks with bend angles in the 50–100° range.

CONCLUSIONS

The present work provides the first systematic description of a structural motif characterized by two helices linked by a single residue. This motif is commonly found in soluble proteins, and about 10% of the proteins possess such an HXH structure. Most importantly, only a few backbone conformations are allowed at the linker position, leading to a classification of these kinks into six classes with characteristic amino acid distributions. The α1 conformation is mainly limited to small bend angles (θb < 30°) and displays a high propensity for proline at position N3 of helix 2. Larger amplitude kinks are usually located either at Gly, Ser, Asp, or Asn residues or at positions preceding proline residues. Bend angles larger than 90° can be obtained through the βM or the β1 conformations and are usually related either to the presence of a glycine at the linker position or to the presence of a proline at the following position. It is noteworthy that, when a glycine residue is located between two α-helices, the unconventional βM conformation is more frequent than the αL conformation.

The analysis of the HXH motifs developed here should provide useful information both for molecular modeling and for de novo design of protein structures. Among possible applications in the modeling field, it should help (1) to locate correctly the putative kink between two helices when SS predictions lead to an unrealistically long helix and (2) to determine the relative orientation of the two helices once the kink position is known. In the protein design field, it will be possible to test the compatibility of a sequence with specific conformations, especially for Pro-induced kinks. The position-specific scoring matrices that we developed should be particularly useful for this purpose.

ACKNOWLEDGMENTS

We thank NEC Computers Services SARL (Angers,

France) for the kind availability of a multiprocessor

server. We thank D. Thybert for the clustering algorithm.

J.D. was supported by fellowships from INSERM-Region

des Pays-de-la-Loire and from the Association pour la

Recherche sur le Cancer (ARC). J.R. is supported by a

fellowship from CNRS.

REFERENCES

1. Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 1995;247:536–540.

2. Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM. CATH – a hierarchic classification of protein domain structures. Structure 1997;5:1093–1108.

3. Chou PY, Fasman GD. Prediction of protein conformation. Biochemistry 1974;13:222–245.

4. Chou PY, Fasman GD. Conformational parameters for amino acids in helical, beta-sheet, and random coil regions calculated from proteins. Biochemistry 1974;13:211–222.

5. Chou PY, Fasman GD. Prediction of the secondary structure of proteins from their amino acid sequence. Adv Enzymol Relat Areas Mol Biol 1978;47:45–148.



6. Garnier J, Osguthorpe DJ, Robson B. Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J Mol Biol 1978;120:97–120.

7. Albrecht M, Tosatto SC, Lengauer T, Valle G. Simple consensus procedures are effective and sufficient in secondary structure prediction. Protein Eng 2003;16:459–462.

8. Ouali M, King RD. Cascaded multiple classifiers for secondary structure prediction. Protein Sci 2000;9:1162–1176.

9. Kabsch W, Sander C. How good are predictions of protein secondary structure? FEBS Lett 1983;155:179–182.

10. Cuff JA, Barton GJ. Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. Proteins 1999;34:508–519.

11. Cuff JA, Barton GJ. Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins 2000;40:502–511.

12. Frishman D, Argos P. Seventy-five percent accuracy in protein secondary structure prediction. Proteins 1997;27:329–335.

13. Rost B, Sander C. Improved prediction of protein secondary structure by use of sequence profiles and neural networks. Proc Natl Acad Sci USA 1993;90:7558–7562.

14. Salamov AA, Solovyev VV. Prediction of protein secondary structure by combining nearest-neighbor algorithms and multiple sequence alignments. J Mol Biol 1995;247:11–15.

15. Wilson CL, Hubbard SJ, Doig AJ. A critical assessment of the secondary structure alpha-helices and their termini in proteins. Protein Eng 2002;15:545–554.

16. Brazhnikov EV, Efimov AV. [Structure of alpha-spiral hairpins with short connections in globular proteins]. Mol Biol (Mosk) 2001;35:100–108 (in Russian).

17. Engel DE, DeGrado WF. Alpha-alpha linking motifs and interhelical orientations. Proteins 2005;61:325–337.

18. Lahr SJ, Engel DE, Stayrook SE, Maglio O, North B, Geremia S, Lombardi A, DeGrado WF. Analysis and design of turns in alpha-helical hairpins. J Mol Biol 2005;346:1441–1454.

19. Hobohm U, Sander C. Enlarged representative set of protein structures. Protein Sci 1994;3:522–524.

20. Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983;22:2577–2637.

21. Kahn PC. Defining the axis of a helix. Comput Chem 1989;13:185–189.

22. Eisenberg D, Schwarz E, Komaromy M, Wall R. Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J Mol Biol 1984;179:125–142.

23. Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J Mol Biol 1982;157:105–132.

24. Hopp TP, Woods KR. Prediction of protein antigenic determinants from amino acid sequences. Proc Natl Acad Sci USA 1981;78:3824–3828.

25. Hubbard SJ, Beynon RJ, Thornton JM. Assessment of conformational parameters as predictors of limited proteolytic sites in native protein structures. Protein Eng 1998;11:349–359.

26. Lee B, Richards FM. The interpretation of protein structures: estimation of static accessibility. J Mol Biol 1971;55:379–400.

27. McDonald IK, Thornton JM. Satisfying hydrogen bonding potential in proteins. J Mol Biol 1994;238:777–793.

28. Heine A, Canaves JM, von Delft F, Brinen LS, Dai X, Deacon AM, Elsliger MA, Eshaghi S, Floyd R, Godzik A, Grittini C, Grzechnik SK, Guda C, Jaroszewski L, Karlak C, Klock HE, Koesema E, Kovarik JS, Kreusch A, Kuhn P, Lesley SA, McMullan D, McPhillips TM, Miller MA, Miller MD, Morse A, Moy K, Ouyang J, Page R, Robb A, Rodrigues K, Schwarzenbacher R, Selby TL, Spraggon G, Stevens RC, van den Bedem H, Velasquez J, Vincent J, Wang X, West B, Wolf G, Hodgson KO, Wooley J, Wilson IA. Crystal structure of O-acetylserine sulfhydrylase (TM0665) from Thermotoga maritima at 1.8 Å resolution. Proteins 2004;56:387–391.

29. Johnson KA, Angelucci F, Bellelli A, Herve M, Fontaine J, Tsernoglou D, Capron A, Trottein F, Brunori M. Crystal structure of the 28 kDa glutathione S-transferase from Schistosoma haematobium. Biochemistry 2003;42:10084–10094.

30. Polekhina G, Board PG, Gali RR, Rossjohn J, Parker MW. Molecular basis of glutathione synthetase deficiency and a rare gene permutation event. EMBO J 1999;18:3204–3213.

31. Miller DJ, Ouellette N, Evdokimova E, Savchenko A, Edwards A, Anderson WF. Crystal complexes of a predicted S-adenosylmethionine-dependent methyltransferase reveal a typical AdoMet binding domain and a substrate recognition domain. Protein Sci 2003;12:1432–1442.

32. Zhong W, Alexeev D, Harvey I, Guo M, Hunter DJ, Zhu H, Campopiano DJ, Sadler PJ. Assembly of an oxo-zirconium(IV) cluster in a protein cleft. Angew Chem Int Ed Engl 2004;43:5914–5918.

33. Strater N, Schnappauf G, Braus G, Lipscomb WN. Mechanisms of catalysis and allosteric regulation of yeast chorismate mutase from crystal structures. Structure 1997;5:1437–1452.

34. Fodje MN, Al-Karadaghi S. Occurrence, conformational features and amino acid propensities for the pi-helix. Protein Eng 2002;15:353–358.

35. Karplus PA. Experimentally observed conformation-dependent geometry and hidden strain in proteins. Protein Sci 1996;5:1406–1420.

36. Ho BK, Brasseur R. The Ramachandran plots of glycine and pre-proline. BMC Struct Biol 2005;5:14.

37. Lovell SC, Davis IW, Arendall WB, III, de Bakker PI, Word JM, Prisant MG, Richardson JS, Richardson DC. Structure validation by Calpha geometry: phi, psi and Cbeta deviation. Proteins 2003;50:437–450.

38. Adzhubei AA, Sternberg MJ. Left-handed polyproline II helices commonly occur in globular proteins. J Mol Biol 1993;229:472–493.

39. Cubellis MV, Caillez F, Blundell TL, Lovell SC. Properties of polyproline II, a secondary structure element implicated in protein-protein interactions. Proteins 2005;58:880–892.

40. Schellman C. The αL-conformation at the ends of helices. In: Jaenicke R, editor. Protein folding. Amsterdam: Elsevier; 1980. pp 53–61.

41. Aurora R, Srinivasan R, Rose GD. Rules for alpha-helix termination by glycine. Science 1994;264:1126–1130.

42. Cubellis MV, Cailliez F, Lovell SC. Secondary structure assignment that accurately reflects physical and evolutionary characteristics. BMC Bioinformatics 2005;6 (Suppl 4):S8.

43. Richardson JS, Richardson DC. Amino acid preferences for specific locations at the ends of alpha helices. Science 1988;240:1648–1652.

44. Kumar S, Bansal M. Dissecting alpha-helices: position-specific analysis of alpha-helices in globular proteins. Proteins 1998;31:460–476.

45. Engel DE, DeGrado WF. Amino acid propensities are position-dependent throughout the length of alpha-helices. J Mol Biol 2004;337:1195–1205.

46. Gunasekaran K, Nagarajaram HA, Ramakrishnan C, Balaram P. Stereochemical punctuation marks in protein structures: glycine and proline containing helix stop signals. J Mol Biol 1998;275:917–932.

47. Kumar S, Bansal M. Geometrical and sequence characteristics of alpha-helices in globular proteins. Biophys J 1998;75:1935–1944.

48. Sharma V, Sharma S, Hoener zu Bentrup K, McKinney JD, Russell

DG, Jacobs WR, Jr, Sacchettini JC. Structure of isocitrate lyase
